Mesh interfaces will now respond to any broadcast (or
matching directed mesh) probe requests with a probe
response.
Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Previously, the entire mesh beacon would be generated each
time the beacon timer fired. Instead generate a beacon
head and tail (so the TIM can easily be inserted when mesh
power save is on) when starting a mesh or the MBSS
parameters change.
Also add a mutex for protecting beacon updates and
preventing leaks.
Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When drivers or regulatory have limitations on
40, 80 or 160 MHz channels, advertise these to
userspace via nl80211. Also add a new feature
flag to let userspace know this is supported.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Some drivers might support 80 or 160 MHz only on some
channels for whatever reason, so allow them to disable
these channel widths. Also maintain the new flags when
regulatory bandwidth limitations would disable these
wide channels.
Reviewed-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
A while ago, I made the mac80211 station code never change
the channel type after association. This solved a number of
issues but is ultimately wrong, we should react if the AP
changes the HT operation IE and switches bandwidth. One of
the issues is that we associate as HT40 capable, but if the
AP ever switches to 40 MHz we won't be able to receive such
frames because we never set our channel to 40 MHz.
This addresses this and VHT operation changes. If there's a
change that is incompatible with our setup, e.g. if the AP
decides to change the channel entirely (and for some reason
we still hear the beacon) we'll just disconnect.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
For HT and VHT the current bandwidth can change,
add the function ieee80211_vif_change_bandwidth()
to take care of this. It returns a failure if the
new bandwidth isn't compatible with the existing
channel context, the caller has to handle that.
When it happens, also inform the driver that the
bandwidth changed for this virtual interface (no
drivers would actually care today though.)
Changing to/from HT/VHT isn't allowed though.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The channel use is confusing, some uses the channel
context and some the bss_conf.chandef. The latter is
fine, so get rid of the channel context part.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Having HT/VHT operation IEs but not capability IEs
leads to a strange situation where we configure the
channel to an HT or VHT bandwidth and then can't
actually use it. Prevent this by checking that the
HT and VHT capability IEs are present as well as
the operation IEs; if not, disable HT and/or VHT.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In beacons and association response frames an AP may include an
operating mode notification element to advertise changes in the
number of spatial streams it can receive. Handle this using the
existing function that handles the action frame, but only handle
NSS changes, not bandwidth changes which aren't allowed here.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This should be called ieee80211_change_chanctx() since
it changes the channel context, not a chandef.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The code to disable HT and VHT if VHT was advertised
without VHT is wrong -- it accidentally uses the wrong
flags. Fix that.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In case of connection, the station data is initialised from
the beacon/probe response first and then updated from the
association response. If the latter is different we update
the rate control algorithm and driver. Instead of doing it
this way, set the station data properly with data from the
association response before initializing rate control.
Also simplify the code by passing the station pointer.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Handle the operating mode notification action frame.
When the supported streams or the bandwidth change
let the driver and rate control algorithm know.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
With VHT, a station can change the number of spatial
streams it can receive on the fly, not unlike spatial
multiplexing in HT. Prepare for that by tracking the
maximum number of spatial streams it can receive when
the connection is established.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
For VHT, many more bandwidth changes are possible. As a first
step, stop toggling the IEEE80211_HT_CAP_SUP_WIDTH_20_40 flag
in the HT capabilities and instead introduce a bandwidth field
indicating the currently usable bandwidth to transmit to the
station. Of course, make all drivers use it.
To achieve this, make ieee80211_ht_cap_ie_to_sta_ht_cap() get
the station as an argument, rather than the new capabilities,
so it can set up the new bandwidth field.
If the station is a VHT station and VHT bandwidth is in use,
also set the bandwidth accordingly.
Doing this allows us to get rid of the supports_40mhz flag as
the HT capabilities now reflect the true capability instead of
the current setting.
While at it, also fix ieee80211_ht_cap_ie_to_sta_ht_cap() to not
ignore HT cap overrides when MCS TX isn't supported (not that it
really happens...)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Like with HT, make things a bit simpler in future patches by
passing the station to ieee80211_vht_cap_ie_to_sta_vht_cap()
instead of the vht_cap pointer. Also disable VHT here if HT
isn't supported.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Since no driver calls the TKIP functions from interrupt
context, there's no need to use spin_lock_irqsave().
Just use spin_lock_bh() (and spin_lock() in the TX path
where we're in a BH or they're already disabled.)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There's no need to use _irqsave() as the lock
is never used in interrupt context.
This also fixes a problem in the iwlwifi MVM
driver that calls spin_unlock_bh() within its
set_tim() callback.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There's no use for it, WPA is entirely handled in
wpa_supplicant in userspace, so don't pick the IE.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In some cases when disconnecting after (or during?) CSA
the queues might not recover, and then the only way to
recover is reloading the module.
Fix this by always unblocking the queue CSA reason when
disconnecting.
Cc: stable@vger.kernel.org
Reported-by: Jan-Michael Brummer <jan.brummer@tabos.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Since the idle decision rework, mac80211 started calling
bss_info_changed() for the driver's monitor interface,
which causes a crash for iwlwifi, but drivers generally
don't expect this to happen. Therefore, avoid it.
While at it, also prevent calling it in such cases and
only print a warning. For the P2P Device interface the
idle will no longer be called (no channel context), so
also prevent that and warn on it.
Reported-by: Chaitanya <chaitanya.mgit@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In my commit 1672c0e319
("mac80211: start auth/assoc timeout on frame status")
I broke auth/assoc timeout handling: in case we wait
for the TX status, it now leaves the timeout field set
to 0, which is a valid time and can compare as being
before now ("jiffies"). Thus, if the work struct runs
for some other reason, the auth/assoc is treated as
having timed out.
Fix this by introducing a separate "timeout_started"
variable that tracks whether the timeout has started
and is checked before timing out.
Additionally, for proper TX status handling the change
requires that the skb->dev pointer is set up for all
the frames, so set it up for all frames in mac80211.
Reported-by: Wojciech Dubowik <Wojciech.Dubowik@neratec.com>
Tested-by: Wojciech Dubowik <Wojciech.Dubowik@neratec.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Function ieee80211_sta_reset_conn_monitor has been
resetting probe_send_count too early and nullfunc
check was never called after succesfull ack.
Reported-by: Magnus Cederlöf <mcider@gmail.com>
Tested-by: Magnus Cederlöf <mcider@gmail.com>
Signed-off-by: Wojciech Dubowik <Wojciech.Dubowik@neratec.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
A few mesh utility functions will call
ieee80211_bss_info_change_notify(), and then the caller
might notify the driver of the same change again. Avoid
this redundancy by propagating the BSS changes and
generally calling bss_info_change_notify() once per
change.
Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When sending a broadcast while at least on of the connected stations is
sleeping, it gets queued and send after a DTIM beacon is sent.
If the packet was to be sent on a vlan interface, the vif used for dequeing
from the per-bss queue does not hold the per-vlan sdata. The correct sdata is
required to use the correct per-vlan broadcast/multicast key.
This patch fixes this by restoring the per-vlan sdata using the skb->dev entry.
Signed-off-by: Michael Braun <michael-dev@fami-braun.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When the vlan device is removed, ps->bc_buf processing can no longer
send its frames.
Signed-off-by: Michael Braun <michael-dev@fami-braun.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add command to trigger radar detection in the driver/FW.
Once radar detection is started it should continuously
monitor for radars as long as the channel active.
If radar is detected usermode notified with 'radar
detected' event.
Scanning and remain on channel functionality must be disabled
while doing radar detection/scanning, and vice versa.
Based on original patch by Victor Goldenshtein <victorg@ti.com>
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add new NL80211_CMD_RADAR_DETECT, which starts the Channel
Availability Check (CAC). This command will also notify the
usermode about events (CAC finished, CAC aborted, radar
detected, NOP finished).
Once radar detection has started it should continuously
monitor for radars as long as the channel is active.
This patch enables DFS for AP mode in nl80211/cfg80211.
Based on original patch by Victor Goldenshtein <victorg@ti.com>
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
[remove WIPHY_FLAG_HAS_RADAR_DETECT again -- my mistake]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Even for non-pfmalloc SKBs, __netif_receive_skb() will do a
tsk_restore_flags() on current unconditionally.
Make __netif_receive_skb() a shim around the existing code, renamed to
__netif_receive_skb_core(). Let __netif_receive_skb() wrap the
__netif_receive_skb_core() call with the task flag modifications, if
necessary.
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
1) Remove a duplicated call to skb_orphan() in pf_key, from Cong Wang.
2) Prepare xfrm and pf_key for algorithms without pf_key support,
from Jussi Kivilinna.
3) Fix an unbalanced lock in xfrm_output_one(), from Li RongQing.
4) Add an IPsec state resolution packet queue to handle
packets that are send before the states are resolved.
5) xfrm4_policy_fini() is unused since 2.6.11, time to remove it.
From Michal Kubecek.
6) The xfrm gc threshold was configurable just in the initial
namespace, make it configurable in all namespaces. From
Michal Kubecek.
7) We currently can not insert policies with mark and mask
such that some flows would be matched from both policies.
Allow this if the priorities of these policies are different,
the one with the higher priority is used in this case.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
They are only used within this file.
Cc: Vlad Yasevich <vyasevic@redhat.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
The following patchset contains three Netfilter fixes, they are:
* Fix conntrack helper re-assignment after NAT mangling if only if
the same helper is attached to the conntrack again, from Florian
Westphal.
* Don't allow the creation of conntrack entries via ctnetlink if the
original and reply tuples are missing, from Florian Westphal.
* Fix broken sysctl interface in nf_ct_reasm while adding netns support
to it, from Michal Kubecek.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The iucv base layer is initialized during the registration of the
first iucv handler. If no handler is registered and the
iucv_reboot_event() notifier is called, a missing check can cause
a kernel panic in iucv_block_cpu(). To solve this issue, check the
IRQ masks invoke iucv_block_cpu() for enabled CPUs only.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
There is a check in the completion path for osd requests that
ensures the number of pages allocated is enough to hold the amount
of incoming data expected.
For bio requests coming from rbd the "number of pages" is not really
meaningful (although total length would be). So stop requiring that
nr_pages be supplied for bio requests. This is done by checking
whether the pages pointer is null before checking the value of
nr_pages.
Note that this value is passed on to the messenger, but there it's
only used for debugging--it's never used for validation.
While here, change another spot that used r_pages in a debug message
inappropriately, and also invalidate the r_con_filling_msg pointer
after dropping a reference to it.
This resolves:
http://tracker.ceph.com/issues/3875
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Currently, if the OSD client finds an osd request has had a bio list
attached to it, it drops a reference to it (or rather, to the first
entry on that list) when the request is released.
The code that added that reference (i.e., the rbd client) is
therefore required to take an extra reference to that first bio
structure.
The osd client doesn't really do anything with the bio pointer other
than transfer it from the osd request structure to outgoing (for
writes) and ingoing (for reads) messages. So it really isn't the
right place to be taking or dropping references.
Furthermore, the rbd client already holds references to all bio
structures it passes to the osd client, and holds them until the
request is completed. So there's no need for this extra reference
whatsoever.
So remove the bio_put() call in ceph_osdc_release_request(), as
well as its matching bio_get() call in rbd_osd_req_create().
This change could lead to a crash if old libceph.ko was used with
new rbd.ko. Add a compatibility check at rbd initialization time to
avoid this possibilty.
This resolves:
http://tracker.ceph.com/issues/3798 and
http://tracker.ceph.com/issues/3799
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
An upcoming change implements semantic change that could lead to
a crash if an old version of the libceph kernel module is used with
a new version of the rbd kernel module.
In order to preclude that possibility, this adds a compatibilty
check interface. If this interface doesn't exist, the modules are
obviously not compatible. But if it does exist, this provides a way
of letting the caller know whether it will operate properly with
this libceph module.
Perhaps confusingly, it returns false right now. The semantic
change mentioned above will make it return true.
This resolves:
http://tracker.ceph.com/issues/3800
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
The ceph messenger has a few spots that are only used when
bio messages are supported, and that's only when CONFIG_BLOCK
is defined. This surrounds a couple of spots with #ifdef's
that would cause a problem if CONFIG_BLOCK were not present
in the kernel configuration.
This resolves:
http://tracker.ceph.com/issues/3976
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Add an ability to configure a separate "untagged" egress
policy to the VLAN information of the bridge. This superseeds PVID
policy and makes PVID ingress-only. The policy is configured with a
new flag and is represented as a port bitmap per vlan. Egress frames
with a VLAN id in "untagged" policy bitmap would egress
the port without VLAN header.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When VLAN is added to the port, a local fdb entry for that port
(the entry with the mac address of the port) is added for that
VLAN. This way we can correctly determine if the traffic
is for the bridge itself. If the address of the port changes,
we try to change all the local fdb entries we have for that port.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a user adds bridge neighbors, allow him to specify VLAN id.
If the VLAN id is not specified, the neighbor will be added
for VLANs currently in the ports filter list. If no VLANs are
configured on the port, we use vlan 0 and only add 1 entry.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add vlan_id to multicasts groups so that we know which vlan
each group belongs to and can correctly forward to appropriate vlan.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds vlan to unicast fdb entries that are created for
learned addresses (not the manually configured ones). It adds
vlan id into the hash mix and uses vlan as an addditional parameter
for an entry match.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A user may designate a certain vlan as PVID. This means that
any ingress frame that does not contain a vlan tag is assigned to
this vlan and any forwarding decisions are made with this vlan in mind.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
At ingress, any untagged traffic is assigned to the PVID.
Any tagged traffic is filtered according to membership bitmap.
At egress, if the vlan matches the PVID, the frame is sent
untagged. Otherwise the frame is sent tagged.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using the RTM_GETLINK dump the vlan filter list of a given
bridge port. The information depends on setting the filter
flag similar to how nic VF info is dumped.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a netlink interface to add and remove vlan configuration on bridge port.
The interface uses the RTM_SETLINK message and encodes the vlan
configuration inside the IFLA_AF_SPEC. It is possble to include multiple
vlans to either add or remove in a single message.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When bridge forwards a frame, make sure that a frame is allowed
to egress on that port.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a frame arrives on a port or transmitted by the bridge,
if we have VLANs configured, validate that a given VLAN is allowed
to enter the bridge.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds an optional infrustructure component to bridge that would allow
native vlan filtering in the bridge. Each bridge port (as well
as the bridge device) now get a VLAN bitmap. Each bit in the bitmap
is associated with a vlan id. This way if the bit corresponding to
the vid is set in the bitmap that the packet with vid is allowed to
enter and exit the port.
Write access the bitmap is protected by RTNL and read access
protected by RCU.
Vlan functionality is disabled by default.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adjusting of data pointers in net/netfilter/nf_conntrack_frag6_*
sysctl table for other namespaces points to wrong netns_frags
structure and has reversed order of entries.
Problem introduced by commit c038a767cd in 3.7-rc1
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
In order to avoid any future surprises of kernel panics due to jprobes
function mismatches (as e.g. fixed in 4cb9d6eaf8: sctp: jsctp_sf_eat_sack:
fix jprobes function signature mismatch), we should check both function
types during build and scream loudly if they do not match. __same_type
resolves to __builtin_types_compatible_p, which is 1 in case both types
are the same and 0 otherwise, qualifiers are ignored. Tested by myself.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function jsctp_sf_eat_sack can be made static, no need to extend
its visibility.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This config item has not carried much meaning for a while now and is
almost always enabled by default. As agreed during the Linux kernel
summit, remove it.
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We walk through the bind address list and try to get the best source
address for a given destination. However, currently, we take the
'continue' path of the loop when an entry is invalid (!laddr->valid)
*and* the entry state does not equal SCTP_ADDR_SRC (laddr->state !=
SCTP_ADDR_SRC).
Thus, still, invalid entries with SCTP_ADDR_SRC might not 'continue'
as well as valid entries with SCTP_ADDR_{NEW, SRC, DEL}, with a possible
false baddr and matchlen as a result, causing in worst case dst route
to be false or possibly NULL.
This test should actually be a '||' instead of '&&'. But lets fix it
and make this a bit easier to read by having the condition the same way
as similarly done in sctp_v4_get_dst.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
An entry in DAT with the hashed position of 0 can cause a NULL pointer
dereference when the first entry is checked by batadv_choose_next_candidate.
This first candidate automatically has the max value of 0 and the max_orig_node
of NULL. Not checking max_orig_node for NULL in batadv_is_orig_node_eligible
will lead to a NULL pointer dereference when checking for the lowest address.
This problem was added in 785ea11441
("batman-adv: Distributed ARP Table - create DHT helper functions").
Signed-off-by: Pau Koning <paukoning@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patch cef401de7b (net: fix possible wrong checksum
generation) fixed wrong checksum calculation but it broke TSO by
defining new GSO type but not a netdev feature for that type.
net_gso_ok() would not allow hardware checksum/segmentation
offload of such packets without the feature.
Following patch fixes TSO and wrong checksum. This patch uses
same logic that Eric Dumazet used. Patch introduces new flag
SKBTX_SHARED_FRAG if at least one frag can be modified by
the user. but SKBTX_SHARED_FRAG flag is kept in skb shared
info tx_flags rather than gso_type.
tx_flags is better compared to gso_type since we can have skb with
shared frag without gso packet. It does not link SHARED_FRAG to
GSO, So there is no need to define netdev feature for this.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A socket timestamp is a sum of the global tcp_time_stamp and
a per-socket offset.
A socket offset is added in places where externally visible
tcp timestamp option is parsed/initialized.
Connections in the SYN_RECV state are not supported, global
tcp_time_stamp is used for them, because repair mode doesn't support
this state. In a future it can be implemented by the similar way
as for TIME_WAIT sockets.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
A timestamp can be set, only if a socket is in the repair mode.
This patch adds a new socket option TCP_TIMESTAMP, which allows to
get and set current tcp times stamp.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This functionality is used for restoring tcp sockets. A tcp timestamp
depends on how long a system has been running, so it's differ for each
host. The solution is to set a per-socket offset.
A per-socket offset for a TIME_WAIT socket is inherited from a proper
tcp socket.
tcp_request_sock doesn't have a timestamp offset, because the repair
mode for them are not implemented.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter contacted me with some notes regarding some smatch warnings in the
netpoll code, some of which I introduced with my recent netpoll locking fixes,
some which were there prior. Specifically they were:
net-next/net/core/netpoll.c:243 netpoll_poll_dev() warn: inconsistent
returns mutex:&ni->dev_lock: locked (213,217) unlocked (210,243)
net-next/net/core/netpoll.c:706 netpoll_neigh_reply() warn: potential
pointer math issue ('skb_transport_header(send_skb)' is a 128 bit pointer)
This patch corrects the locking imbalance (the first error), and adds some
parenthesis to correct the second error. Tested by myself. Applies to net-next
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Dan Carpenter <dan.carpenter@oracle.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
I get the following build error on next-20130213 due to the following
commit:
commit f05de73bf8 ("skbuff: create
skb_panic() function and its wrappers").
It adds an argument called panic to a function that uses the BUG() macro
which tries to call panic, but the argument masks the panic() function
declaration, resulting in the following error (gcc 4.2.4):
net/core/skbuff.c In function 'skb_panic':
net/core/skbuff.c +126 : error: called object 'panic' is not a function
This is fixed by renaming the argument to msg.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Jean Sacren <sakiwit@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When reading kuids from the wire map them into the initial user
namespace, and validate the mapping succeded.
When reading kgids from the wire map them into the initial user
namespace, and validate the mapping succeded.
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
When a new rpc connection is established with an in-kernel server, the
traffic passes through svc_process_common, and svc_set_client and down
into svcauth_unix_set_client if it is of type RPC_AUTH_NULL or
RPC_AUTH_UNIX.
svcauth_unix_set_client then looks at the uid of the credential we
have assigned to the incomming client and if we don't have the groups
already cached makes an upcall to get a list of groups that the client
can use.
The upcall encodes send a rpc message to user space encoding the uid
of the user whose groups we want to know. Encode the kuid of the user
in the initial user namespace as nfs mounts can only happen today in
the initial user namespace.
When a reply to an upcall comes in convert interpret the uid and gid values
from the rpc pipe as uids and gids in the initial user namespace and convert
them into kuids and kgids before processing them further.
When reading proc files listing the uid to gid list cache convert the
kuids and kgids from into uids and gids the initial user namespace. As we are
displaying server internal details it makes sense to display these values
from the servers perspective.
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
When writing kuids onto the wire first map them into the initial user
namespace.
When writing kgids onto the wire first map them into the initial user
namespace.
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
In svcauth_unix introduce a helper unix_gid_hash as otherwise the
expresion to generate the hash value is just too long.
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
For each received uid call make_kuid and validate the result.
For each received gid call make_kgid and validate the result.
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
- Use from_kuid when generating the on the wire uid values.
- Use make_kuid when reading on the wire values.
In gss_encode_v0_msg, since the uid in gss_upcall_msg is now a kuid_t
generate the necessary uid_t value on the stack copy it into
gss_msg->databuf where it can safely live until the message is no
longer needed.
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
In auth unix there are a couple of places INVALID_GID is used a
sentinel to mark the end of uc_gids array. Use gid_valid
as a type safe way to verify we have not hit the end of
valid data in the array.
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
When printing kuids and kgids for debugging purpropses convert them
to ordinary integers so their values can be fed to the oridnary
print functions.
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
In unx_create_cred directly assign gids from acred->group_info
to cred->uc_gids.
In unx_match directly compare uc_gids with group_info.
Now that both group_info and unx_cred gids are stored as kgids
this is valid and the extra layer of translation can be removed.
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
When comparing uids use uid_eq instead of ==.
When comparing gids use gid_eq instead of ==.
And unfortunate cost of type safety.
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Convert variables that store uids and gids to be of type
kuid_t and kgid_t instead of type uid_t and gid_t.
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Instead of (uid_t)0 use GLOBAL_ROOT_UID.
Instead of (gid_t)0 use GLOBAL_ROOT_GID.
Instead of (uid_t)-1 use INVALID_UID
Instead of (gid_t)-1 use INVALID_GID.
Instead of NOGROUP use INVALID_GID.
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Intel Wireless devices are able to make a TCP connection
after suspending, sending some data and waking up when
the connection receives wakeup data (or breaks). Add the
WoWLAN configuration and feature advertising API for it.
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When MCS rates start to get bad in 2.4 GHz because of long range or
strong interference, CCK rates can be a lot more robust.
This patch adds a pseudo MCS group containing CCK rates (long preamble
in the lower 4 slots, short preamble in the upper slots).
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
[make minstrel_ht_get_stats static]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
cfg80211_find_vendor_ie() was checking only that the vendor IE would
fit in the remaining IEs buffer. If a corrupt includes a vendor IE
that is too small, we could potentially overrun the IEs buffer.
Fix this by checking that the vendor IE fits in the reported IE length
field and skip it otherwise.
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Luciano Coelho <coelho@ti.com>
[change BUILD_BUG_ON to != 1 (from >= 2)]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If user knows the location of a wowlan pattern to be matched in
Rx packet, he can provide an offset with the pattern. This will
help drivers to ignore initial bytes and match the pattern
efficiently.
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
[refactor pattern sending]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Current act_police uses rate table computed by the "tc" userspace
program, which has the following issue:
The rate table has 256 entries to map packet lengths to token (time
units). With TSO sized packets, the 256 entry granularity leads to
loss/gain of rate, making the token bucket inaccurate.
Thus, instead of relying on rate table, this patch explicitly computes
the time and accounts for packet transmission times with nanosecond
granularity.
This is a followup to 56b765b79e
("htb: improved accuracy at high rates").
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's not used anywhere else, so move it.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current TBF uses rate table computed by the "tc" userspace program,
which has the following issue:
The rate table has 256 entries to map packet lengths to
token (time units). With TSO sized packets, the 256 entry granularity
leads to loss/gain of rate, making the token bucket inaccurate.
Thus, instead of relying on rate table, this patch explicitly computes
the time and accounts for packet transmission times with nanosecond
granularity.
This is a followup to 56b765b79e
("htb: improved accuracy at high rates").
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tbf will need to schedule watchdog in ns. No need to convert it twice.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As it is going to be used in tbf as well, push these to generic code.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These are in ns so convert from ticks to ns.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These are initialized correctly a couple of lines later in the
function.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
The bnx2x gso_type setting bug fix in 'net' conflicted with
changes in 'net-next' that broke the gso_* setting logic
out into a seperate function, which also fixes the bug in
question. Thus, use the 'net-next' version.
Signed-off-by: David S. Miller <davem@davemloft.net>
in htb_change_class() cl->buffer and cl->buffer are stored in ns.
So in dump, convert them back to psched ticks.
Note this was introduced by:
commit 56b765b79e
htb: improved accuracy at high rates
Please consider this for -net/-stable.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit (32f5376 netfilter: nf_ct_helper: disable automatic helper
re-assignment of different type) broke transparent proxy scenarios.
For example, initial helper lookup might yield "ftp" (dport 21),
while re-lookup after REDIRECT yields "ftp-2121".
This causes the autoassign code to toss the ftp helper, even
though these are just different instances of the same helper.
Change the test to check for the helper function address instead
of the helper address, as suggested by Pablo.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@gnumonks.org>
Userspace can cause kernel panic by not specifying orig/reply
tuple: kernel will create a tuple with random stack values.
Problem is that tuple.dst.dir will be random, too, which
causes nf_ct_tuplehash_to_ctrack() to return garbage.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@gnumonks.org>
John W. Linville says:
====================
Here is another handful of late-breaking fixes intended for the 3.8
stream... Hopefully the will still make it! :-)
There are three mac80211 fixes pulled from Johannes:
"Here are three fixes still for the 3.8 stream, the fix from Cong Ding
for the bad sizeof (Stephen Hemminger had pointed it out before but I'd
promptly forgotten), a mac80211 managed-mode channel context usage fix
where a downgrade would never stop until reaching non-HT and a bug in
the channel determination that could cause invalid channels like HT40+
on channel 11 to be used."
Also included is a mwl8k fix that avoids an oops when using mwl8k
devices that only support the 5 GHz band.
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tommi was fuzzing with trinity and reported the following problem :
commit 3f518bf745 (datagram: Add offset argument to __skb_recv_datagram)
missed that a raw socket receive queue can contain skbs with no payload.
We can loop in __skb_recv_datagram() with MSG_PEEK mode, because
wait_for_packet() is not prepared to skip these skbs.
[ 83.541011] INFO: rcu_sched detected stalls on CPUs/tasks: {}
(detected by 0, t=26002 jiffies, g=27673, c=27672, q=75)
[ 83.541011] INFO: Stall ended before state dump start
[ 108.067010] BUG: soft lockup - CPU#0 stuck for 22s! [trinity-child31:2847]
...
[ 108.067010] Call Trace:
[ 108.067010] [<ffffffff818cc103>] __skb_recv_datagram+0x1a3/0x3b0
[ 108.067010] [<ffffffff818cc33d>] skb_recv_datagram+0x2d/0x30
[ 108.067010] [<ffffffff819ed43d>] rawv6_recvmsg+0xad/0x240
[ 108.067010] [<ffffffff818c4b04>] sock_common_recvmsg+0x34/0x50
[ 108.067010] [<ffffffff818bc8ec>] sock_recvmsg+0xbc/0xf0
[ 108.067010] [<ffffffff818bf31e>] sys_recvfrom+0xde/0x150
[ 108.067010] [<ffffffff81ca4329>] system_call_fastpath+0x16/0x1b
Reported-by: Tommi Rantala <tt.rantala@gmail.com>
Tested-by: Tommi Rantala <tt.rantala@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad says: The whole multiple cookie keys code is completely unused
and has been all this time. Noone uses anything other then the
secret_key[0] since there is no changeover support anywhere.
Thus, for now clean up its left-over fragments.
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9p has thre strucrtures that can encode inode stat information. Modify
all of those structures to contain kuid_t and kgid_t values. Modify
he wire encoders and decoders of those structures to use 'u' and 'g' instead of
'd' in the format string where uids and gids are present.
This results in all kuid and kgid conversion to and from on the wire values
being performed by the same code in protocol.c where the client is known
at the time of the conversion.
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@gmail.com>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Modify the p9_client_rpc format specifiers of every function that
directly transmits a uid or a gid from 'd' to 'u' or 'g' as
appropriate.
Modify those same functions to take kuid_t and kgid_t parameters
instead of uid_t and gid_t parameters.
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@gmail.com>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
This allows concentrating all of the conversion to and from kuids and
kgids into the format needed by the 9p protocol into one location.
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@gmail.com>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Today ceph opens tcp sockets from a delayed work callback. Delayed
work happens from kernel threads which are always in the initial
network namespace. Therefore fail early if someone attempts
to mount a ceph filesystem from something other than the initial
network namespace.
Cc: Sage Weil <sage@inktank.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Bail out if no update is made to the SMPS state. This
allows the driver to avoid duplicating the state.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
With my recent commit I introduced two sparse warnings. Looking closer there
were a few more in the same file, so I fixed them all up. Basic rcu pointer
dereferencing suff.
I've validated these changes using CONFIG_PROVE_RCU while starting and stopping
netconsole repeatedly in bonded and non-bonded configurations
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: fengguang.wu@intel.com
CC: David Miller <davem@davemloft.net>
CC: eric.dumazet@gmail.com
Signed-off-by: David S. Miller <davem@davemloft.net>
__netpoll_rcu_free is used to free netpoll structures when the rtnl_lock is
already held. The mechanism is used to asynchronously call __netpoll_cleanup
outside of the holding of the rtnl_lock, so as to avoid deadlock.
Unfortunately, __netpoll_cleanup modifies pointers (dev->np), which means the
rtnl_lock must be held while calling it. Further, it cannot be held, because
rcu callbacks may be issued in softirq contexts, which cannot sleep.
Fix this by converting the rcu callback to a work queue that is guaranteed to
get scheduled in process context, so that we can hold the rtnl properly while
calling __netpoll_cleanup
Tested successfully by myself.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: Cong Wang <amwang@redhat.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Create skb_panic() function in lieu of both skb_over_panic() and
skb_under_panic() so that code duplication would be avoided. Update type
and variable name where necessary.
Jiri Pirko suggested using wrappers so that we would be able to keep the
fruits of the original code.
Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
We've got a couple of races when enabling powersave with an AP for
off-channel operation. The first is fairly simple. If we go off-channel
before the nullfunc frame to enable PS is transmitted then it may not be
received by the AP. Add a flush after enabling off-channel PS to prevent
this from happening.
The second race is a bit more subtle. If the driver supports QoS and has
frames queued when the nullfunc frame is queued, those frames may get
transmitted after the nullfunc frame. If PM is not set then the AP is
being told that we've exited PS before we go off-channel and may try to
deliver frames. To prevent this, add a flush after stopping the queues
but before passing the nullfunc frame to the driver.
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Scans currently work by stopping the netdev tx queues but leaving the
mac80211 queues active. This stops the flow of incoming packets while
still allowing mac80211 to transmit nullfunc and probe request frames to
facilitate scanning. However, the driver may try to wake the mac80211
queues while in this state, which will also wake the netdev queues.
To prevent this, add a new queue stop reason,
IEEE80211_QUEUE_STOP_REASON_OFFCHANNEL, to be used when stopping the tx
queues for off-channel operation. This prevents the netdev queues from
waking when a driver wakes the mac80211 queues.
This also stops all frames from being transmitted, even those meant to
be sent off-channel. Add a new tx control flag,
IEEE80211_TX_CTL_OFFCHAN_TX_OK, which allows frames to be transmitted
when the queues are stopped only for the off-channel stop reason. Update
all locations transmitting off-channel frames to use this flag.
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
It's more of an unnecessary micro-optimization and it prevents switching
from long-GI to short-GI in HT20/single-stream for the lowest rate
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Spanning Tree Protocol packets should have always been marked as
control packets, this causes them to get queued in the high prirority
FIFO. As Radia Perlman mentioned in her LCA talk, STP dies if bridge
gets overloaded and can't communicate. This is a long-standing bug back
to the first versions of Linux bridge.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sparse spotted local function that could be static.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes sparse complaints about dropping __user in casts.
warning: cast removes address space of expression
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
And remove no longer used br->flags.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Erik Hugne <erik.hugne@ericsson.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
v2:
a) moved before multicast source address check
b) changed comment to netdev style
Cc: Erik Hugne <erik.hugne@ericsson.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reported-by: Erik Hugne <erik.hugne@ericsson.com>
Cc: Erik Hugne <erik.hugne@ericsson.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we get to association, the AP station already exists and
is marked authenticated, so moving it into IEEE80211_STA_AUTH
again is a NOP, remove it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Now that we have channel contexts, idle is (pretty
much) equivalent to not having a channel context.
Change the code to use this relation so that there
no longer is a need for a lot of idle recalculate
calls everywhere.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There are only a few drivers that use HW scan, and
all of those don't need a non-idle transition before
starting the scan -- some don't even care about idle
at all. Remove the flag and code associated with it.
The only driver that really actually needed this is
wl1251 and it can just do it itself in the hw_scan
callback -- implement that.
Acked-by: Luciano Coelho <coelho@ti.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The functions were added for some sort of Bluetooth
coexistence, but aren't used, so remove them again.
Reviewed-by: Luciano Coelho <coelho@ti.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In order to be able to predict the next DTIM TBTT
in the driver, add the ability to use timing data
from beacons only with the new hardware flag
IEEE80211_HW_TIMING_BEACON_ONLY and the BSS info
value sync_dtim_count which is only valid if the
timing data came from a beacon. The data can only
come from a beacon, and if no beacon was received
before association it is updated later together
with the DTIM count notification.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
While technically the TSF isn't an IE, it can be
necessary to distinguish between the TSF from a
beacon and a probe response, in particular in
order to know the next DTIM TBTT, as not all APs
are spec compliant wrt. TSF==0 being a DTIM TBTT
and thus the DTIM count needs to be taken into
account as well.
To allow this, move the TSF into the IE struct
so it can be known whence it came.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There's no way scan BSS IEs can be NULL as even
if the allocation fails the frame is discarded.
Remove some code checking for this and document
that it is always non-NULL.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add debugfs driver callbacks so drivers can add
debugfs entries for interfaces. Note that they
_must_ remove the entries again as add/remove in
the driver doesn't correspond to add/remove in
debugfs; the former is up/down while the latter
is netdev create/destroy.
Signed-off-by: Alexander Bondar <alexander.bondar@intel.com>
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Currently, cfg80211 will copy beacon IEs from a previously
received hidden SSID beacon to a probe response entry, if
that entry is created after the beacon entry. However, if
it is the other way around, or if the beacon is updated,
such changes aren't propagated.
Fix this by tracking the relation between the probe
response and beacon BSS structs in this case.
In case drivers have private data stored in a BSS struct
and need access to such data from a beacon entry, cfg80211
now provides the hidden_beacon_bss pointer from the probe
response entry to the beacon entry.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Currently the code assigns channel contexts to VLANs
(for use by the TX/RX code) when the AP master gets
its channel context assigned. This works fine, but
in the upcoming radar detection work the VLANs don't
require a channel context (during radar detection)
and assigning one to them anyway causes issues with
locking and also inconsistencies -- a VLAN interface
that is added before radar detection would get the
channel context, while one added during it wouldn't.
Fix these issues moving the channel context copying
to a new explicit operation that will not be used
in the radar detection code.
Acked-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The chandef tracing writes center_freq1 twice, so
that it is always 0 (no driver supports 80+80 yet)
and leaves center_freq2 unset. Fix this mistake.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The messages currently refer to probe request probes,
but on some devices null data packets will be used
instead. Make the messages more generic.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This patch fixes the problem which was discussed in
"mac80211: Fix PN corruption in case of multiple
virtual interface" [1].
Amit Shakya reported a serious issue with my patch:
mac80211: serialize rx path workers" [2]:
In case, ieee80211_rx_handlers processing is going on
for skbs received on one vif and at the same time, rx
aggregation reorder timer expires on another vif then
sta_rx_agg_reorder_timer_expired is invoked and it will
push skbs into the single queue (local->rx_skb_queue).
ieee80211_rx_handlers in the while loop assumes that
the skbs are for the same sdata and sta. This assumption
doesn't hold good in this scenario and the PN gets
corrupted by PN received in other vif's skb, causing
traffic to stop due to PN mismatch."
[1] Message-Id: http://mid.gmane.org/201302041844.44436.chunkeey@googlemail.com
[2] Commit-Id: 24a8fdad35
Reported-by: Amit Shakya <amit.shakya@stericsson.com>
Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There seems to be no reason, why it has to be limited to 2.4 GHz.
Signed-off-by: Emanuel Taube <emanuel.taube@gmail.com>
[remove 'local' variable]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The patch "mac80211: clean up mesh sta allocation warning"
moved some mesh initialization into a path which is only
called when the kernel handles peering. This causes a hang
when mac80211 tries to clean up a userspace-allocated
station entry and delete a timer which has never been
initialized.
To avoid this, only do any mesh sta peering teardown if
the kernel is actually handling it.
The same is true when quiescing before suspend.
Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Fix most kernel-doc warnings, for some reason it
seems to have issues with __aligned, don't remove
the documentation entries it considers to be in
excess due to that.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This prepares for using the spinlock instead of krefs
which is needed in the next patch to track the refs
of combined BSSes correctly.
Acked-by: Bing Zhao <bzhao@marvell.com> [mwifiex]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When a driver requests a specific regulatory domain after cfg80211 already
has one, a struct ieee80211_regdomain is leaked.
Reported-by: Larry Finger <Larry.Finger@lwfinger.net>
Tested-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
We currently can not insert policies with mark and mask
such that some flows would be matched from both policies.
We make this possible when the priority of these policies
are different. If both policies match a flow, the one with
the higher priority is used.
Reported-by: Emmanuel Thierry <emmanuel.thierry@telecom-bretagne.eu>
Reported-by: Romain Kuntz <r.kuntz@ipflavors.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
When trying to connect to an AP that advertises HT but not
VHT, the mac80211 code erroneously uses the configuration
from the AP as is instead of checking it against regulatory
and local capabilities. This can lead to using an invalid
or even inexistent channel (like 11/HT40+).
Additionally, the return flags from downgrading must be
ORed together, to collect them from all of the downgrades.
Also clarify the message.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
To allow both of protocol-specific data and device-specific data
attached with neighbour entry, and to eliminate size calculation
cost when allocating entry, sizeof protocol-speicic data must be
multiple of NEIGH_PRIV_ALIGN. On 64bit archs,
sizeof(struct dn_neigh) is multiple of NEIGH_PRIV_ALIGN, but on
32bit archs, it was not.
Introduce NEIGH_ENTRY_SPACE() macro to ensure that protocol-specific
entry-size meets our requirement.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
RFC4291 (IPv6 addressing architecture) says that interface-Local scope
spans only a single interface on a node. We should not join L2 device
multicast list for addresses in interface-local (or smaller) scope.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
The following patchset contains Netfilter/IPVS fixes for 3.8-rc7, they are:
* Fix oops in IPVS state-sync due to releasing a random memory area due
to unitialized pointer, from Dan Carpenter.
* Fix SCTP flow establishment due to bad checksumming mangling in IPVS,
from Daniel Borkmann.
* Three fixes for the recently added IPv6 NPT, all from YOSHIFUJI Hideaki,
with an amendment collapsed into those patches from Ulrich Weber. They
fiix adjustment calculation, fix prefix mangling and ensure LSB of
prefixes are zeroes (as required by RFC).
Specifically, it took me a while to validate the 1's complement arithmetics/
checksumming approach in the IPv6 NPT code.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We should call skb_share_check() before pskb_may_pull(), or we
can crash in pskb_expand_head()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Initial implementation of the Multiple VLAN Registration Protocol
(MVRP) from IEEE 802.1Q-2011, based on the existing implementation
of the GARP VLAN Registration Protocol (GVRP).
Signed-off-by: David Ward <david.ward@ll.mit.edu>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Initial implementation of the Multiple Registration Protocol (MRP)
from IEEE 802.1Q-2011, based on the existing implementation of the
Generic Attribute Registration Protocol (GARP).
Signed-off-by: David Ward <david.ward@ll.mit.edu>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
VM Sockets allows communication between virtual machines and the hypervisor.
User level applications both in a virtual machine and on the host can use the
VM Sockets API, which facilitates fast and efficient communication between
guest virtual machines and their host. A socket address family, designed to be
compatible with UDP and TCP at the interface level, is provided.
Today, VM Sockets is used by various VMware Tools components inside the guest
for zero-config, network-less access to VMware host services. In addition to
this, VMware's users are using VM Sockets for various applications, where
network access of the virtual machine is restricted or non-existent. Examples
of this are VMs communicating with device proxies for proprietary hardware
running as host applications and automated testing of applications running
within virtual machines.
The VMware VM Sockets are similar to other socket types, like Berkeley UNIX
socket interface. The VM Sockets module supports both connection-oriented
stream sockets like TCP, and connectionless datagram sockets like UDP. The VM
Sockets protocol family is defined as "AF_VSOCK" and the socket operations
split for SOCK_DGRAM and SOCK_STREAM.
For additional information about the use of VM Sockets, please refer to the
VM Sockets Programming Guide available at:
https://www.vmware.com/support/developer/vmci-sdk/
Signed-off-by: George Zhang <georgezhang@vmware.com>
Signed-off-by: Dmitry Torokhov <dtor@vmware.com>
Signed-off-by: Andy king <acking@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Synchronize with 'net' in order to sort out some l2tp, wireless, and
ipv6 GRE fixes that will be built on top of in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
In sctp_auth_make_key_vector, we allocate a temporary sctp_auth_bytes
structure with kmalloc instead of the sctp_auth_create_key allocator.
Change this to sctp_auth_create_key as it is the case everywhere else,
so that we also can properly free it via sctp_auth_key_put. This makes
it easier for future code changes in the structure and allocator itself,
since a single API is consistently used for this purpose. Also, by
using sctp_auth_create_key we're doing sanity checks over the arguments.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the following RCU warning:
[ 51.680236] ===============================
[ 51.681914] [ INFO: suspicious RCU usage. ]
[ 51.683610] 3.8.0-rc6-next-20130206-sasha-00028-g83214f7-dirty #276 Tainted: G W
[ 51.686703] -------------------------------
[ 51.688281] net/ipv6/ip6_flowlabel.c:671 suspicious rcu_dereference_check() usage!
we should use rcu_dereference_bh() when we hold rcu_read_lock_bh().
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to address the fact that some devices cannot support the full 32K
frag size we need to have the value accessible somewhere so that we can use it
to do comparisons against what the device can support. As such I am moving
the values out of skbuff.c and into skbuff.h.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Revert iwlwifi reclaimed packet tracking, it causes problems for a
bunch of folks. From Emmanuel Grumbach.
2) Work limiting code in brcmsmac wifi driver can clear tx status
without processing the event. From Arend van Spriel.
3) rtlwifi USB driver processes wrong SKB, fix from Larry Finger.
4) l2tp tunnel delete can race with close, fix from Tom Parkin.
5) pktgen_add_device() failures are not checked at all, fix from Cong
Wang.
6) Fix unintentional removal of carrier off from tun_detach(),
otherwise we confuse userspace, from Michael S. Tsirkin.
7) Don't leak socket reference counts and ubufs in vhost-net driver,
from Jason Wang.
8) vmxnet3 driver gets it's initial carrier state wrong, fix from Neil
Horman.
9) Protect against USB networking devices which spam the host with 0
length frames, from Bjørn Mork.
10) Prevent neighbour overflows in ipv6 for locally destined routes,
from Marcelo Ricardo. This is the best short-term fix for this, a
longer term fix has been implemented in net-next.
11) L2TP uses ipv4 datagram routines in it's ipv6 code, whoops. This
mistake is largely because the ipv6 functions don't even have some
kind of prefix in their names to suggest they are ipv6 specific.
From Tom Parkin.
12) Check SYN packet drops properly in tcp_rcv_fastopen_synack(), from
Yuchung Cheng.
13) Fix races and TX skb freeing bugs in via-rhine's NAPI support, from
Francois Romieu and your's truly.
14) Fix infinite loops and divides by zero in TCP congestion window
handling, from Eric Dumazet, Neal Cardwell, and Ilpo Järvinen.
15) AF_PACKET tx ring handling can leak kernel memory to userspace, fix
from Phil Sutter.
16) Fix error handling in ipv6 GRE tunnel transmit, from Tommi Rantala.
17) Protect XEN netback driver against hostile frontend putting garbage
into the rings, don't leak pages in TX GOP checking, and add proper
resource releasing in error path of xen_netbk_get_requests(). From
Ian Campbell.
18) SCTP authentication keys should be cleared out and released with
kzfree(), from Daniel Borkmann.
19) L2TP is a bit too clever trying to maintain skb->truesize, and ends
up corrupting socket memory accounting to the point where packet
sending is halted indefinitely. Just remove the adjustments
entirely, they aren't really needed. From Eric Dumazet.
20) ATM Iphase driver uses a data type with the same name as the S390
headers, rename to fix the build. From Heiko Carstens.
21) Fix a typo in copying the inner network header offset from one SKB
to another, from Pravin B Shelar.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (56 commits)
net: sctp: sctp_endpoint_free: zero out secret key data
net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfree
atm/iphase: rename fregt_t -> ffreg_t
net: usb: fix regression from FLAG_NOARP code
l2tp: dont play with skb->truesize
net: sctp: sctp_auth_key_put: use kzfree instead of kfree
netback: correct netbk_tx_err to handle wrap around.
xen/netback: free already allocated memory on failure in xen_netbk_get_requests
xen/netback: don't leak pages on failure in xen_netbk_tx_check_gop.
xen/netback: shutdown the ring if it contains garbage.
net: qmi_wwan: add more Huawei devices, including E320
net: cdc_ncm: add another Huawei vendor specific device
ipv6/ip6_gre: fix error case handling in ip6gre_tunnel_xmit()
tcp: fix for zero packets_in_flight was too broad
brcmsmac: rework of mac80211 .flush() callback operation
ssb: unregister gpios before unloading ssb
bcma: unregister gpios before unloading bcma
rtlwifi: Fix scheduling while atomic bug
net: usbnet: fix tx_dropped statistics
tcp: ipv6: Update MIB counters for drops
...
When GSSAPI integrity signatures are in use, or when we're using GSSAPI
privacy with the v2 token format, there is a trailing checksum on the
xdr_buf that is returned.
It's checked during the authentication stage, and afterward nothing
cares about it. Ordinarily, it's not a problem since the XDR code
generally ignores it, but it will be when we try to compute a checksum
over the buffer to help prevent XID collisions in the duplicate reply
cache.
Fix the code to trim off the checksums after verifying them. Note that
in unwrap_integ_data, we must avoid trying to reverify the checksum if
the request was deferred since it will no longer be present when it's
revisited.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
On sctp_endpoint_destroy, previously used sensitive keying material
should be zeroed out before the memory is returned, as we already do
with e.g. auth keys when released.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In sctp_setsockopt_auth_key, we create a temporary copy of the user
passed shared auth key for the endpoint or association and after
internal setup, we free it right away. Since it's sensitive data, we
should zero out the key before returning the memory back to the
allocator. Thus, use kzfree instead of kfree, just as we do in
sctp_auth_key_put().
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If gb_len is less than 3 it would cause an integer underflow and
possibly memory corruption in nfc_llcp_parse_gb_tlv().
I removed the old test for gb_len == 0. I also removed the test for
->remote_gb == NULL. It's not possible for ->remote_gb to be NULL and
we have already dereferenced ->remote_gb_len so it's too late to test.
The old test return -ENODEV but my test returns -EINVAL.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Fixed-up drivers/net/wireless/iwlwifi/mvm/mac80211.c to change change
IEEE80211_HW_NEED_DTIM_PERIOD to IEEE80211_HW_NEED_DTIM_BEFORE_ASSOC
as requested by Johannes Berg. -- JWL
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Andrew Savchenko reported a DNS failure and we diagnosed that
some UDP sockets were unable to send more packets because their
sk_wmem_alloc was corrupted after a while (tx_queue column in
following trace)
$ cat /proc/net/udp
sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode ref pointer drops
...
459: 00000000:0270 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 4507 2 ffff88003d612380 0
466: 00000000:0277 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 4802 2 ffff88003d613180 0
470: 076A070A:007B 00000000:0000 07 FFFF4600:00000000 00:00000000 00000000 123 0 5552 2 ffff880039974380 0
470: 010213AC:007B 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 4986 2 ffff88003dbd3180 0
470: 010013AC:007B 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 4985 2 ffff88003dbd2e00 0
470: 00FCA8C0:007B 00000000:0000 07 FFFFFB00:00000000 00:00000000 00000000 0 0 4984 2 ffff88003dbd2a80 0
...
Playing with skb->truesize is tricky, especially when
skb is attached to a socket, as we can fool memory charging.
Just remove this code, its not worth trying to be ultra
precise in xmit path.
Reported-by: Andrew Savchenko <bircoph@gmail.com>
Tested-by: Andrew Savchenko <bircoph@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of calling 3 times ntohs(random->param_hdr.length), 2 times
ntohs(hmacs->param_hdr.length), and 3 times ntohs(chunks->param_hdr.length)
within the same function, we only call each once and store it in a
variable.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For sensitive data like keying material, it is common practice to zero
out keys before returning the memory back to the allocator. Thus, use
kzfree instead of kfree.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Gross says:
====================
One bug fix for net/3.8 for a long standing problem that was reported a few
times recently.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
My commit f2d9d270c1
("mac80211: support VHT association") introduced a
very stupid bug: the loop to downgrade the channel
width never attempted to actually use it again so
it would downgrade all the way to 20_NOHT. Fix it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
RFC 6296 points that address bits that are not part of the prefix
has to be zeroed.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Make sure only the bits that are part of the prefix are mangled.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Cast __wsum from/to __sum16 is wrong. Instead, apply appropriate
conversion function: csum_unfold() or csum_fold().
[ The original patch has been modified to undo the final ~ that
csum_fold returns. We only need to fold the 32-bit word that
results from the checksum calculation into a 16-bit to ensure
that the original subnet is restored appropriately. Spotted by
Ulrich Weber. ]
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
ip6gre_tunnel_xmit() is leaking the skb when we hit this error branch,
and the -1 return value from this function is bogus. Use the error
handling we already have in place in ip6gre_tunnel_xmit() for this error
case to fix this.
Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On 64 bit arches :
There is a off-by-one error in qdisc_pkt_len_init() because
mac_header is not set in xmit path.
skb_mac_header() returns an out of bound value that was
harmless because hdr_len is an 'unsigned int'
On 32bit arches, the error is abysmal.
This patch is also a prereq for "macvlan: add multicast filter"
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb_gso_segment() is almost always called in tx path,
except for openvswitch. It calls this function when
it receives the packet and tries to queue it to user-space.
In this special case, the ->ip_summed check inside
skb_gso_segment() is no longer true, as ->ip_summed value
has different meanings on rx path.
This patch adjusts skb_gso_segment() so that we can at least
avoid such warnings on checksum.
Cc: Jesse Gross <jesse@nicira.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
head buffer is only temporary available in mac802154_header_create.
So it's not necessary to put it on the heap.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
head buffer is only temporary available in lowpan_header_create.
So it's not necessary to put it on the heap.
Also fixed a comment codestyle issue.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are transients during normal FRTO procedure during which
the packets_in_flight can go to zero between write_queue state
updates and firing the resulting segments out. As FRTO processing
occurs during that window the check must be more precise to
not match "spuriously" :-). More specificly, e.g., when
packets_in_flight is zero but FLAG_DATA_ACKED is true the problematic
branch that set cwnd into zero would not be taken and new segments
might be sent out later.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Tested-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Vercera was recently backporting commit
9c13cb8bb4 to a RHEL kernel, and I noticed that,
while this patch protects the tg3 driver from having its ndo_poll_controller
routine called during device initalization, it does nothing for the driver
during shutdown. I.e. it would be entirely possible to have the
ndo_poll_controller method (or subsequently the ndo_poll) routine called for a
driver in the netpoll path on CPU A while in parallel on CPU B, the ndo_close or
ndo_open routine could be called. Given that the two latter routines tend to
initizlize and free many data structures that the former two rely on, the result
can easily be data corruption or various other crashes. Furthermore, it seems
that this is potentially a problem with all net drivers that support netpoll,
and so this should ideally be fixed in a common path.
As Ben H Pointed out to me, we can't preform dev_open/dev_close in atomic
context, so I've come up with this solution. We can use a mutex to sleep in
open/close paths and just do a mutex_trylock in the napi poll path and abandon
the poll attempt if we're locked, as we'll just retry the poll on the next send
anyway.
I've tested this here by flooding netconsole with messages on a system whos nic
driver I modfied to periodically return NETDEV_TX_BUSY, so that the netpoll tx
workqueue would be forced to send frames and poll the device. While this was
going on I rapidly ifdown/up'ed the interface and watched for any problems.
I've not found any.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Ivan Vecera <ivecera@redhat.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: Ben Hutchings <bhutchings@solarflare.com>
CC: Francois Romieu <romieu@fr.zoreil.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All in-kernel users of class_find_device() don't really need mutable
data for match callback.
In two places (kernel/power/suspend_test.c, drivers/scsi/osd/osd_uld.c)
this patch changes match callbacks to use const search data.
The const is propagated to rtc_class_open() and power_supply_get_by_name()
parameters.
Note that there's a dev reference leak in suspend_test.c that's not
touched in this patch.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Calling icmpv6_send() on a local message size error leads to an
incorrect update of the path mtu in the case when IPsec is used.
So use ipv6_local_error() instead to notify the socket about the
error.
Reported-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
alloc failures already get standardized OOM
messages and a dump_stack.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using 'sizeof' on array given as function argument returns
size of a pointer rather than the size of array.
Cc: stable@vger.kernel.org
Signed-off-by: Cong Ding <dinggnu@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The xfrm gc threshold can be configured via xfrm{4,6}_gc_thresh
sysctl but currently only in init_net, other namespaces always
use the default value. This can substantially limit the number
of IPsec tunnels that can be effectively used.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Function xfrm4_policy_fini() is unused since xfrm4_fini() was
removed in 2.6.11.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
As the default, we blackhole packets until the key manager resolves
the states. This patch implements a packet queue where IPsec packets
are queued until the states are resolved. We generate a dummy xfrm
bundle, the output routine of the returned route enqueues the packet
to a per policy queue and arms a timer that checks for state resolution
when dst_output() is called. Once the states are resolved, the packets
are sent out of the queue. If the states are not resolved after some
time, the queue is flushed.
This patch keeps the defaut behaviour to blackhole packets as long
as we have no states. To enable the packet queue the sysctl
xfrm_larval_drop must be switched off.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
In our test lab, we have a simple SCTP client connecting to a SCTP
server via an IPVS load balancer. On some machines, load balancing
works, but on others the initial handshake just fails, thus no
SCTP connection whatsoever can be established!
We observed that the SCTP INIT-ACK handshake reply from the IPVS
machine to the client had a correct IP checksum, but corrupt SCTP
checksum when forwarded, thus on the client-side the packet was
dropped and an intial handshake retriggered until all attempts
run into the void.
To fix this issue, this patch i) adds a missing CHECKSUM_UNNECESSARY
after the full checksum (re-)calculation (as done in IPVS TCP and UDP
code as well), ii) calculates the checksum in little-endian format
(as fixed with the SCTP code in commit 4458f04c: sctp: Clean up sctp
checksumming code) and iii) refactors duplicate checksum code into a
common function. Tested by myself.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
TCP Appropriate Byte Count was added by me, but later disabled.
There is no point in maintaining it since it is a potential source
of bugs and Linux already implements other better window protection
heuristics.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
All in-tree ipv4 protocol implementations are now namespace
aware. Therefore all the run-time checks are superfluous.
Reject registry of any non-namespace aware ipv4 protocol.
Eventually we'll remove prot->netns_ok and this registry
time check as well.
Signed-off-by: David S. Miller <davem@davemloft.net>
The infrastructure is already pretty much entirely there
to allow this conversion.
The tunnel and session lookups have per-namespace tables,
and the ipv4 bind lookup includes the namespace in the
lookup key.
Set netns_ok in l2tp_ip_protocol.
Signed-off-by: David S. Miller <davem@davemloft.net>
When creating unmanaged tunnel sockets we should honour the network namespace
passed to l2tp_tunnel_create. Furthermore, unmanaged tunnel sockets should
not hold a reference to the network namespace lest they accidentally keep
alive a namespace which should otherwise have been released.
Unmanaged tunnel sockets now drop their namespace reference via sk_change_net,
and are released in a new pernet exit callback, l2tp_exit_net.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
l2tp_tunnel_create is passed a pointer to the network namespace for the
tunnel, along with an optional file descriptor for the tunnel which may
be passed in from userspace via. netlink.
In the case where the file descriptor is defined, ensure that the namespace
associated with that socket matches the namespace explicitly passed to
l2tp_tunnel_create.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The L2TP netlink code can run in namespaces. Set the netnsok flag in
genl_family to true to reflect that fact.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To allow l2tp_tunnel_delete to be called from an atomic context, place the
tunnel socket release calls on a workqueue for asynchronous execution.
Tunnel memory is eventually freed in the tunnel socket destructor.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/intel/e1000e/ethtool.c
drivers/net/vmxnet3/vmxnet3_drv.c
drivers/net/wireless/iwlwifi/dvm/tx.c
net/ipv6/route.c
The ipv6 route.c conflict is simple, just ignore the 'net' side change
as we fixed the same problem in 'net-next' by eliminating cached
neighbours from ipv6 routes.
The e1000e conflict is an addition of a new statistic in the ethtool
code, trivial.
The vmxnet3 conflict is about one change in 'net' removing a guarding
conditional, whilst in 'net-next' we had a netdev_info() conversion.
The iwlwifi conflict is dealing with a WARN_ON() conversion in
'net-next' vs. a revert happening in 'net'.
Signed-off-by: David S. Miller <davem@davemloft.net>
These routines are used by server and client code, so having them in a
separate header would be best.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Since mesh_plink_quiesce() would unconditionally delete
the plink timer, and the timer initialization was recently
moved into the mesh code path, suspending with a non-mesh
interface now causes a crash. Fix this by only deleting
the plink timer for mesh interfaces.
Reported-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Tested-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The header of this file cites to "RFFC2673" which is "Binary Labels in the
Domain Name System". It should refer to "RFC 2637" which is "Point-to-Point
Tunneling Protocol (PPTP)". This patch also corrects the typo RFFC.
Signed-off-by: Reese Moore <ram@vt.edu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch replaces the global lock to one lock per subsystem.
The per-subsystem lock avoids that processes operating
with different subsystems are synchronized.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch adds the alias flag to support full NOTRACK target
aliasing.
Based on initial patch from Jozsef Kadlecsik.
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hi>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
It was possible to set NF_CONNTRACK=n and NF_CONNTRACK_LABELS=y via
NETFILTER_XT_MATCH_CONNLABEL=y.
warning: (NETFILTER_XT_MATCH_CONNLABEL) selects NF_CONNTRACK_LABELS which has
unmet direct dependencies (NET && INET && NETFILTER && NF_CONNTRACK)
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>