As the member .cmp_addr of sctp_af_inet6, sctp_v6_cmp_addr should also check
the port of addresses, just like sctp_v4_cmp_addr, cause it's invoked by
sctp_cmp_addr_exact().
Now sctp_v6_cmp_addr just check the port when two addresses have different
family, and lack the port check for two ipv6 addresses. that will make
sctp_hash_cmp() cannot work well.
so fix it by adding ports comparison in sctp_v6_cmp_addr().
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit f1705ec197 allows IPv6 addresses to be retained on a link down.
The address can have a cached host route which can point to the wrong
FIB table if the L3 enslavement is changed (e.g., route can point to local
table instead of VRF table if device is added to an L3 domain).
On link up check the table of the cached host route against the FIB
table associated with the device and correct if needed.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SCTP probe log timestamps use struct timespec which is
not y2038 safe.
Use struct timespec64 which is 2038 safe instead.
Use monotonic time instead of real time as only time
differences are logged.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-sctp@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
TCP probe log timestamps use struct timespec which is
not y2038 safe. Even though timespec might be good enough here
as it is used to represent delta time, the plan is to get rid
of all uses of timespec in the kernel.
Replace with struct timespec64 which is y2038 safe.
Prints still use unsigned long format and type.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
ICMP timestamp messages and IP source route options require
timestamps to be in milliseconds modulo 24 hours from
midnight UT format.
Add inet_current_timestamp() function to support this. The function
returns the required timestamp in network byte order.
Timestamp calculation is also changed to call ktime_get_real_ts64()
which uses struct timespec64. struct timespec64 is y2038 safe.
Previously it called getnstimeofday() which uses struct timespec.
struct timespec is not y2038 safe.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: James Morris <jmorris@namei.org>
Cc: Patrick McHardy <kaber@trash.net>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Example usage:
Set the skb priority using skbedit then allow it to be encoded
sudo tc qdisc add dev $ETH root handle 1: prio
sudo tc filter add dev $ETH parent 1: protocol ip prio 10 \
u32 match ip protocol 1 0xff flowid 1:2 \
action skbedit prio 17 \
action ife encode \
allow prio \
dst 02:15:15:15:15:15
Note: You dont need the skbedit action if you are already encoding the
skb priority earlier. A zero skb priority will not be sent
Alternative hard code static priority of decimal 33 (unlike skbedit)
then mark of 0x12 every time the filter matches
sudo $TC filter add dev $ETH parent 1: protocol ip prio 10 \
u32 match ip protocol 1 0xff flowid 1:2 \
action ife encode \
type 0xDEAD \
use prio 33 \
use mark 0x12 \
dst 02:15:15:15:15:15
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Example usage:
Set the skb using skbedit then allow it to be encoded
sudo tc qdisc add dev $ETH root handle 1: prio
sudo tc filter add dev $ETH parent 1: protocol ip prio 10 \
u32 match ip protocol 1 0xff flowid 1:2 \
action skbedit mark 17 \
action ife encode \
allow mark \
dst 02:15:15:15:15:15
Note: You dont need the skbedit action if you are already encoding the
skb mark earlier. A zero skb mark, when seen, will not be encoded.
Alternative hard code static mark of 0x12 every time the filter matches
sudo $TC filter add dev $ETH parent 1: protocol ip prio 10 \
u32 match ip protocol 1 0xff flowid 1:2 \
action ife encode \
type 0xDEAD \
use mark 0x12 \
dst 02:15:15:15:15:15
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This action allows for a sending side to encapsulate arbitrary metadata
which is decapsulated by the receiving end.
The sender runs in encoding mode and the receiver in decode mode.
Both sender and receiver must specify the same ethertype.
At some point we hope to have a registered ethertype and we'll
then provide a default so the user doesnt have to specify it.
For now we enforce the user specify it.
Lets show example usage where we encode icmp from a sender towards
a receiver with an skbmark of 17; both sender and receiver use
ethertype of 0xdead to interop.
YYYY: Lets start with Receiver-side policy config:
xxx: add an ingress qdisc
sudo tc qdisc add dev $ETH ingress
xxx: any packets with ethertype 0xdead will be subjected to ife decoding
xxx: we then restart the classification so we can match on icmp at prio 3
sudo $TC filter add dev $ETH parent ffff: prio 2 protocol 0xdead \
u32 match u32 0 0 flowid 1:1 \
action ife decode reclassify
xxx: on restarting the classification from above if it was an icmp
xxx: packet, then match it here and continue to the next rule at prio 4
xxx: which will match based on skb mark of 17
sudo tc filter add dev $ETH parent ffff: prio 3 protocol ip \
u32 match ip protocol 1 0xff flowid 1:1 \
action continue
xxx: match on skbmark of 0x11 (decimal 17) and accept
sudo tc filter add dev $ETH parent ffff: prio 4 protocol ip \
handle 0x11 fw flowid 1:1 \
action ok
xxx: Lets show the decoding policy
sudo tc -s filter ls dev $ETH parent ffff: protocol 0xdead
xxx:
filter pref 2 u32
filter pref 2 u32 fh 800: ht divisor 1
filter pref 2 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1 (rule hit 0 success 0)
match 00000000/00000000 at 0 (success 0 )
action order 1: ife decode action reclassify
index 1 ref 1 bind 1 installed 14 sec used 14 sec
type: 0x0
Metadata: allow mark allow hash allow prio allow qmap
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
xxx:
Observe that above lists all metadatum it can decode. Typically these
submodules will already be compiled into a monolithic kernel or
loaded as modules
YYYY: Lets show the sender side now ..
xxx: Add an egress qdisc on the sender netdev
sudo tc qdisc add dev $ETH root handle 1: prio
xxx:
xxx: Match all icmp packets to 192.168.122.237/24, then
xxx: tag the packet with skb mark of decimal 17, then
xxx: Encode it with:
xxx: ethertype 0xdead
xxx: add skb->mark to whitelist of metadatum to send
xxx: rewrite target dst MAC address to 02:15:15:15:15:15
xxx:
sudo $TC filter add dev $ETH parent 1: protocol ip prio 10 u32 \
match ip dst 192.168.122.237/24 \
match ip protocol 1 0xff \
flowid 1:2 \
action skbedit mark 17 \
action ife encode \
type 0xDEAD \
allow mark \
dst 02:15:15:15:15:15
xxx: Lets show the encoding policy
sudo tc -s filter ls dev $ETH parent 1: protocol ip
xxx:
filter pref 10 u32
filter pref 10 u32 fh 800: ht divisor 1
filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:2 (rule hit 0 success 0)
match c0a87aed/ffffffff at 16 (success 0 )
match 00010000/00ff0000 at 8 (success 0 )
action order 1: skbedit mark 17
index 6 ref 1 bind 1
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
action order 2: ife encode action pipe
index 3 ref 1 bind 1
dst MAC: 02:15:15:15:15:15 type: 0xDEAD
Metadata: allow mark
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
xxx:
test by sending ping from sender to destination
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow for more multicast router port information to be dumped such as
timer and type attributes. For that that purpose we need to extend the
MDBA_ROUTER_PORT attribute similar to how it was done for the mdb entries
recently. The new format is thus:
[MDBA_ROUTER_PORT] = { <- nested attribute
u32 ifindex <- router port ifindex for user-space compatibility
[MDBA_ROUTER_PATTR attributes]
}
This way it remains compatible with older users (they'll simply retrieve
the u32 in the beginning) and new users can parse the remaining
attributes. It would also allow to add future extensions to the router
port without breaking compatibility.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for a temporary router port which doesn't depend only on the
incoming query. It can be refreshed if set to the same value, which is
a no-op for the rest.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is needed for the upcoming temporary port router. There's no point
to go through the logic if the value is the same.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using raw values makes it difficult to extend and also understand the
code, give them names and do explicit per-option manipulation in
br_multicast_set_port_router.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a user explicitly requests VLAN filtering with something like:
# echo 1 > /sys/class/net/<bridge>/bridge/vlan_filtering
Switchdev propagates a SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING port
attribute.
Add support for it in the DSA layer with a new port_vlan_filtering
function to let drivers toggle 802.1Q filtering on user demand.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce devlink infrastructure for drivers to register and expose to
userspace via generic Netlink interface.
There are two basic objects defined:
devlink - one instance for every "parent device", for example switch ASIC
devlink port - one instance for every physical port of the device.
This initial portion implements basic get/dump of objects to userspace.
Also, port splitter and port type setting is implemented.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the initial implementation the only way to stop a rule from being
inserted into the hardware table was via the device feature flag.
However this doesn't work well when working on an end host system
where packets are expect to hit both the hardware and software
datapaths.
For example we can imagine a rule that will match an IP address and
increment a field. If we install this rule in both hardware and
software we may increment the field twice. To date we have only
added support for the drop action so we have been able to ignore
these cases. But as we extend the action support we will hit this
example plus more such cases. Arguably these are not even corner
cases in many working systems these cases will be common.
To avoid forcing the driver to always abort (i.e. the above example)
this patch adds a flag to add a rule in software only. A careful
user can use this flag to build software and hardware datapaths
that work together. One example we have found particularly useful
is to use hardware resources to set the skb->mark on the skb when
the match may be expensive to run in software but a mark lookup
in a hash table is cheap. The idea here is hardware can do in one
lookup what the u32 classifier may need to traverse multiple lists
and hash tables to compute. The flag is only passed down on inserts.
On deletion to avoid stale references in hardware we always try
to remove a rule if it exists.
The flags field is part of the classifier specific options. Although
it is tempting to lift this into the generic structure doing this
proves difficult do to how the tc netlink attributes are implemented
along with how the dump/change routines are called. There is also
precedence for putting seemingly generic pieces in the specific
classifier options such as TCA_U32_POLICE, TCA_U32_ACT, etc. So
although not ideal I've left FLAGS in the u32 options as well as it
simplifies the code greatly and user space has already learned how
to manage these bits ala 'tc' tool.
Another thing if trying to update a rule we require the flags to
be unchanged. This is to force user space, software u32 and
the hardware u32 to keep in sync. Thanks to Simon Horman for
catching this case.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The offload decision was originally very basic and tied to if the dev
implemented the appropriate ndo op hook. The next step is to allow
the user to more flexibly define if any paticular rule should be
offloaded or not. In order to have this logic in one function lift
the current check into a helper routine tc_should_offload().
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements bookkeeping support to compute the maximum
headroom for all the devices in each datapath. When said value
changes, the underlying devs are notified via the
ndo_set_rx_headroom method.
This also increases the internal vports xmit performance.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On bridge needed_headroom changes, the enslaved devices are
notified via the ndo_set_rx_headroom method
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RTS/CTS needs to be enabled if the rate is a fallback rate *or* if it's
a dual-stream rate and the sta is in dynamic SMPS mode.
Cc: stable@vger.kernel.org
Fixes: a3ebb4e1b7 ("mac80211: minstrel_ht: handle peers in dynamic SMPS")
Reported-by: Matías Richart <mrichart@fing.edu.uy>
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Public Action frames use special rules for how the BSSID field (Address
3) is set. A wildcard BSSID is used in cases where the transmitter and
recipient are not members of the same BSS. As such, we need to accept
Public Action frames with wildcard BSSID.
Commit db8e173245 ("mac80211: ignore frames between TDLS peers when
operating as AP") added a rule that drops Action frames to TDLS-peers
based on an Action frame having different DA (Address 1) and BSSID
(Address 3) values. This is not correct since it misses the possibility
of BSSID being a wildcard BSSID in which case the Address 1 would not
necessarily match.
Fix this by allowing mac80211 to accept wildcard BSSID in an Action
frame when in AP mode.
Fixes: db8e173245 ("mac80211: ignore frames between TDLS peers when operating as AP")
Cc: stable@vger.kernel.org
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Just like for CCMP we need to check that for GCMP the fragments
have PNs that increment by one; the spec was updated to fix this
security issue and now has the following text:
The receiver shall discard MSDUs and MMPDUs whose constituent
MPDU PN values are not incrementing in steps of 1.
Adapt the code for CCMP to work for GCMP as well, luckily the
relevant fields already alias each other so no code duplication
is needed (just check the aliasing with BUILD_BUG_ON.)
Cc: stable@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Similarly, we need to update backlog too when we update qlen.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We saw qlen!=0 but backlog==0 on our production machine:
qdisc htb 1: dev eth0 root refcnt 2 r2q 10 default 1 direct_packets_stat 0 ver 3.17
Sent 172680457356 bytes 222469449 pkt (dropped 0, overlimits 123575834 requeues 0)
backlog 0b 72p requeues 0
The problem is we only count qlen for HTB qdisc but not backlog.
We need to update backlog too when we update qlen, so that we
can at least know the average packet length.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the bottom qdisc decides to, for example, drop some packet,
it calls qdisc_tree_decrease_qlen() to update the queue length
for all its ancestors, we need to update the backlog too to
keep the stats on root qdisc accurate.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove nearly duplicated code and prepare for the following patch.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While building with W=1 we got the warning:
net/netfilter/xt_osf.c:265:9: warning: variable 'loop_cont' set but not used
The local variable loop_cont was only initialized and then assigned a
value but was never used or checked after that.
While removing the variable, the case of OSFOPT_TS was not removed so
that it will serve as a reminder to us that we can do something in that
particular case.
Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Can be used to randomly match packets e.g. for statistic traffic sampling.
See commit 3ad0040573
("bpf: split state from prandom_u32() and consolidate {c, e}BPF prngs")
for more info why this doesn't use prandom_u32 directly.
Unlike bpf nft_meta can be built as a module, so add an EXPORT_SYMBOL
for prandom_seed_full_state too.
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
nfacct_filter_alloc doesn't validate the NFACCT_FILTER_MASK and
NFACCT_FILTER_VALUE parameters which can trigger a NULL pointer
dereference. CAP_NET_ADMIN is required to trigger the bug.
Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Lists all neighbours detected by the Echo Locating Protocol
(ELP) and their throughput metric.
Initially Developed by Linus during a 6 months trainee study
period in Ascom (Switzerland) AG.
Signed-off-by: Linus Luessing <linus.luessing@web.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
In case of an unused wireless link, the mac80211 throughput estimation
won't get updated further. Consequently, the reported throughput metric
will become obsolete.
With this patch unicast sampling is introduced by periodically sending
unicast ELP packets to each neighbor on idle WiFi links. These sampling
packets will fill an entire frame, so that the measurement is as
reliable as possible
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
In case of wireless interface retrieve the throughput by
querying cfg80211. To perform this call a separate work
must be scheduled because the function may sleep and this
is not allowed within an RCU protected context (RCU in this
case is used to iterate over all the neighbours).
Use ethtool to retrieve information about an Ethernet link
like HALF/FULL_DUPLEX and advertised bandwidth (e.g.
100/10Mbps).
The metric is updated each time a new ELP packet is sent,
this way it is possible to timely react to a metric
variation which can imply (for example) a neighbour
disconnection.
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
To enable ELP to send probing packets over wireless links
only if needed, batman-adv must keep track of the last time
it sent a unicast packet towards every neighbour.
For this purpose a 2 main changes are introduced:
1) a new member of the elp_neigh_node structure stores the
last time a unicast packet was sent towards this neighbour;
2) a wrapper function for sending unicast packets is
implemented. This function will simply update the member
describe din point 1) and then forward the packet to the
real sending routine.
Point 2) implies that any code-path leading to a unicast
sending now has to use the new wrapper.
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
This attribute is exported to user space to disable the link
throughput auto-detection by setting a fixed value.
The throughput override value is used when batman-adv is
computing the link throughput towards a neighbour.
If the value is set to 0 then batman-adv will try to detect
the throughput by itself.
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Add the support for recognising new originators in the
network and rebroadcast their OGMs.
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
This is the initial implementation of the new OGM protocol
(version 2). It has been designed to work on top of the
newly added ELP.
In the previous version the OGM protocol was used to both
measure link qualities and flood the network with the metric
information. In this version the protocol is in charge of
the latter task only, leaving the former to ELP.
This means being able to decouple the interval used by the
neighbor discovery from the OGM broadcasting, which revealed
to be costly in dense networks and needed to be relaxed so
leading to a less responsive routing protocol.
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
This parameter can be set individually on each interface and
allows the configuration of the elp interval for the link
quality measurements during runtime. Usually it is desirable
to set it to a higher (= slower) value on interfaces which
have a more static characteristic (e.g. wired interfaces)
or very dense neighbourhoods to reduce overhead.
Developed by Linus during a 6 months trainee study period in
Ascom (Switzerland) AG.
Signed-off-by: Linus Luessing <linus.luessing@web.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
[antonio@open-mesh.com: respin on top of the latest master]
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Initially developed by Linus during a 6 months trainee study
period in Ascom (Switzerland) AG.
Signed-off-by: Linus Luessing <linus.luessing@web.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
The B.A.T.M.A.N. protocol originally only used a single
message type (called OGM) to determine the link qualities to
the direct neighbors and spreading these link quality
information through the whole mesh. This procedure is
summarized on the BATMAN concept page and explained in
details in the RFC draft published in 2008.
This approach was chosen for its simplicity during the
protocol design phase and the implementation. However, it
also bears some drawbacks:
* Wireless interfaces usually come with some packet loss,
therefore a higher broadcast rate is desirable to allow
a fast reaction on flaky connections.
Other interfaces of the same host might be connected to
Ethernet LANs / VPNs / etc which rarely exhibit packet
loss would benefit from a lower broadcast rate to reduce
overhead.
* It generally is more desirable to detect local link
quality changes at a faster rate than propagating all
these changes through the entire mesh (the far end of
the mesh does not need to care about local link quality
changes that much). Other optimizations strategies, like
reducing overhead, might be possible if OGMs weren't
used for all tasks in the mesh at the same time.
As a result detecting local link qualities shall be handled
by an independent message type, ELP, whereas the OGM message
type remains responsible for flooding the mesh with these
link quality information and determining the overall path
transmit qualities.
Developed by Linus during a 6 months trainee study period in
Ascom (Switzerland) AG.
Signed-off-by: Linus Luessing <linus.luessing@web.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
This allows us to easily add a sysfs parameter for an
unsigned int later, which is not for a batman mesh interface
(e.g. bat0), but for a common interface instead. It allows
reading and writing an atomic_t in hard_iface (instead of
bat_priv compared to the mesh variant).
Developed by Linus during a 6 months trainee study period in
Ascom (Switzerland) AG.
Signed-off-by: Linus Luessing <linus.luessing@web.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
[antonio@open-mesh.com: rename functions and move macros]
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
When the send skbuff reaches the end, nlmsg_put and friends returns
-EMSGSIZE but it is silently thrown away in ndo_fdb_dump. It is called
within a for_each_netdev loop and the first fdb entry of a following
netdev could fit in the remaining skbuff. This breaks the mechanism
of cb->args[0] and idx to keep track of the entries that are already
dumped, which results missing entries in bridge fdb show command.
Signed-off-by: Minoura Makoto <minoura@valinux.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
On reviewing the code I realized that GRE and UDP tunnels could cause a
kernel panic if we used GSO to segment a large UDP frame that was sent
through the tunnel with an outer checksum and hardware offloads were not
available.
In order to correct this we need to update the feature flags that are
passed to the skb_segment function so that in the event of UDP
fragmentation being requested for the inner header the segmentation
function will correctly generate the checksum for the payload if we cannot
segment the outer header.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When selecting an address in context of a VRF, the vrf master should be
preferred for address selection. If it isn't, the user has a hard time
getting the system to select to their preference - the code will pick
the address off the first in-VRF interface it can find, which on a
router could well be a non-routable address.
Signed-off-by: David Lamparter <equinox@diac24.net>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
[dsa: Fixed comment style and removed extra blank link ]
Signed-off-by: David S. Miller <davem@davemloft.net>
David Lamparter noted a use case where the source address selection fails
to pick an address from a VRF interface - unnumbered interfaces.
Relevant commands from his script:
ip addr add 9.9.9.9/32 dev lo
ip link set lo up
ip link add name vrf0 type vrf table 101
ip rule add oif vrf0 table 101
ip rule add iif vrf0 table 101
ip link set vrf0 up
ip addr add 10.0.0.3/32 dev vrf0
ip link add name dummy2 type dummy
ip link set dummy2 master vrf0 up
--> note dummy2 has no address - unnumbered device
ip route add 10.2.2.2/32 dev dummy2 table 101
ip neigh add 10.2.2.2 dev dummy2 lladdr 02:00:00:00:00:02
tcpdump -ni dummy2 &
And using ping instead of his socat example:
$ ping -I vrf0 -c1 10.2.2.2
ping: Warning: source address might be selected on device other than vrf0.
PING 10.2.2.2 (10.2.2.2) from 9.9.9.9 vrf0: 56(84) bytes of data.
>From tcpdump:
12:57:29.449128 IP 9.9.9.9 > 10.2.2.2: ICMP echo request, id 2491, seq 1, length 64
Note the source address is from lo and is not a VRF local address. With
this patch:
$ ping -I vrf0 -c1 10.2.2.2
PING 10.2.2.2 (10.2.2.2) from 10.0.0.3 vrf0: 56(84) bytes of data.
>From tcpdump:
12:59:25.096426 IP 10.0.0.3 > 10.2.2.2: ICMP echo request, id 2113, seq 1, length 64
Now the source address comes from vrf0.
The ipv4 function for selecting source address takes a const argument.
Removing the const requires touching a lot of places, so instead
l3mdev_master_ifindex_rcu is changed to take a const argument and then
do the typecast to non-const as required by netdev_master_upper_dev_get_rcu.
This is similar to what l3mdev_fib_table_rcu does.
IPv6 for unnumbered interfaces appears to be selecting the addresses
properly.
Cc: David Lamparter <david@opensourcerouting.org>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull Ceph fixes from Sage Weil:
"There are two small messenger bug fixes and a log spam regression fix"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
libceph: don't spam dmesg with stray reply warnings
libceph: use the right footer size when skipping a message
libceph: don't bail early from try_read() when skipping a message
This patch fixes the return value in a case which should never occur.
Instead returning "-EINVAL" we return LOWPAN_IPHC_DAM_00 which is
invalid on context based addresses. Also change the WARN_ON_ONCE to
WARN_ONCE which was suggested by Dan Carpenter.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alexander Aring <aar@pengutronix.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
in case of unusually long writes to some system interfaces used by
mountd and other nfs support utilities.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWzlVJAAoJECebzXlCjuG+fVYQAMHQX2fx33juQF03r7R4A+cI
8QITgkAEDyQLcxYu/NdXqOXYc1Nv83uJRXljS2d2Y/D25spRB8ZaIZtpVFN/RlBc
8ss3nnI/iHFd6fqPECZfgHcORwalY+eYMICc2GdXFADYjyMQDD2/xIxF0GqW2RGC
xLKPPxnrcaIcgUGhFgbptpykKeYXHTK4vb/xwjkgI6m+IISFULAG4Z1jfsbIX+4N
NcNtTumgVQ1lFmMrl36K1JXj/0PY9vPPyDG38nSbD0s6zsS1mm6ALCt9N+mPJdX8
yJAq1FL70HtsBD9/oqm/qg9qpJ1dSB3r867HTXlECp6+mYwNi56u3I0jTJKbqwmJ
4eYlgQr1TPVUoa2vzbt/PE4yuKRsbo52TP1cX8VF+/bEtg0R5sz+uSTQZE9hqJE/
Cfk1eXYJxwVY8BnBY5tgmiGwApN13l/1fnhdZltWWsMEZiYvU46JIhVWv/8zMaA7
XOKP1r7JkWXcbPPvI7/KOxi5O4kzs9Mr4kT6xULrAHEbElPak6MIoODwJOLOMh1X
z82HHvfEd3mwRhrDDg2v0MrdikRx+mtaB3zfNFFbXcHAChRCqCQ0OHxOCE9EJhCg
GVsRpRjzafcSW+NwiQrn8OpjKJ7cSaa4gFXFaikQMFHYaYgavpJVWanH3fX6iKWV
RRPufAXYsyRNFmTi4AOb
=ezck
-----END PGP SIGNATURE-----
Merge tag 'nfsd-4.5-1' of git://linux-nfs.org/~bfields/linux
Pull nfsd bugfix from Bruce Fields:
"One fix for a bug that could cause a NULL write past the end of a
buffer in case of unusually long writes to some system interfaces used
by mountd and other nfs support utilities"
* tag 'nfsd-4.5-1' of git://linux-nfs.org/~bfields/linux:
sunrpc/cache: fix off-by-one in qword_get()
This patch defines a new ETHTOOL_GLINKSETTINGS/SLINKSETTINGS API,
handled by the new get_link_ksettings/set_link_ksettings callbacks.
This API provides support for most legacy ethtool_cmd fields, adds
support for larger link mode masks (up to 4064 bits, variable length),
and removes ethtool_cmd deprecated
fields (transceiver/maxrxpkt/maxtxpkt).
This API is deprecating the legacy ETHTOOL_GSET/SSET API and provides
the following backward compatibility properties:
- legacy ethtool with legacy drivers: no change, still using the
get_settings/set_settings callbacks.
- legacy ethtool with new get/set_link_ksettings drivers: the new
driver callbacks are used, data internally converted to legacy
ethtool_cmd. ETHTOOL_GSET will return only the 1st 32b of each link
mode mask. ETHTOOL_SSET will fail if user tries to set the
ethtool_cmd deprecated fields to
non-0 (transceiver/maxrxpkt/maxtxpkt). A kernel warning is logged if
driver sets higher bits.
- future ethtool with legacy drivers: no change, still using the
get_settings/set_settings callbacks, internally converted to new data
structure. Deprecated fields (transceiver/maxrxpkt/maxtxpkt) will be
ignored and seen as 0 from user space. Note that that "future"
ethtool tool will not allow changes to these deprecated fields.
- future ethtool with new drivers: direct call to the new callbacks.
By "future" ethtool, what is meant is:
- query: first try ETHTOOL_GLINKSETTINGS, and revert to ETHTOOL_GSET if
fails
- set: query first and remember which of ETHTOOL_GLINKSETTINGS or
ETHTOOL_GSET was successful
+ if ETHTOOL_GLINKSETTINGS was successful, then change config with
ETHTOOL_SLINKSETTINGS. A failure there is final (do not try
ETHTOOL_SSET).
+ otherwise ETHTOOL_GSET was successful, change config with
ETHTOOL_SSET. A failure there is final (do not try
ETHTOOL_SLINKSETTINGS).
The interaction user/kernel via the new API requires a small
ETHTOOL_GLINKSETTINGS handshake first to agree on the length of the link
mode bitmaps. If kernel doesn't agree with user, it returns the bitmap
length it is expecting from user as a negative length (and cmd field is
0). When kernel and user agree, kernel returns valid info in all
fields (ie. link mode length > 0 and cmd is ETHTOOL_GLINKSETTINGS).
Data structure crossing user/kernel boundary is 32/64-bit
agnostic. Converted internally to a legal kernel bitmap.
The internal __ethtool_get_settings kernel helper will gradually be
replaced by __ethtool_get_link_ksettings by the time the first
"link_settings" drivers start to appear. So this patch doesn't change
it, it will be removed before it needs to be changed.
Signed-off-by: David Decotigny <decot@googlers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch add the SO_CNX_ADVICE socket option (setsockopt only). The
purpose is to allow an application to give feedback to the kernel about
the quality of the network path for a connected socket. The value
argument indicates the type of quality report. For this initial patch
the only supported advice is a value of 1 which indicates "bad path,
please reroute"-- the action taken by the kernel is to call
dst_negative_advice which will attempt to choose a different ECMP route,
reset the TX hash for flow label and UDP source port in encapsulation,
etc.
This facility should be useful for connected UDP sockets where only the
application can provide any feedback about path quality. It could also
be useful for TCP applications that have additional knowledge about the
path outside of the normal TCP control loop.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, all ipv6 addresses are flushed when the interface is configured
down, including global, static addresses:
$ ip -6 addr show dev eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
inet6 2100:1::2/120 scope global
valid_lft forever preferred_lft forever
inet6 fe80::e0:f9ff:fe79:34bd/64 scope link
valid_lft forever preferred_lft forever
$ ip link set dev eth1 down
$ ip -6 addr show dev eth1
<< nothing; all addresses have been flushed>>
Add a new sysctl to make this behavior optional. The new setting defaults to
flush all addresses to maintain backwards compatibility. When the set global
addresses with no expire times are not flushed on an admin down. The sysctl
is per-interface or system-wide for all interfaces
$ sysctl -w net.ipv6.conf.eth1.keep_addr_on_down=1
or
$ sysctl -w net.ipv6.conf.all.keep_addr_on_down=1
Will keep addresses on eth1 on an admin down.
$ ip -6 addr show dev eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
inet6 2100:1::2/120 scope global
valid_lft forever preferred_lft forever
inet6 fe80::e0:f9ff:fe79:34bd/64 scope link
valid_lft forever preferred_lft forever
$ ip link set dev eth1 down
$ ip -6 addr show dev eth1
3: eth1: <BROADCAST,MULTICAST> mtu 1500 state DOWN qlen 1000
inet6 2100:1::2/120 scope global tentative
valid_lft forever preferred_lft forever
inet6 fe80::e0:f9ff:fe79:34bd/64 scope link tentative
valid_lft forever preferred_lft forever
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
msg.dst_sk needs to be set up with a valid socket because some callbacks
later derive the netns from it.
Fixes: 263ea09084d172d ("Revert "genl: Add genlmsg_new_unicast() for unicast message allocation")
Reported-by: Jon Maloy <maloy@donjonn.com>
Bisected-by: Jon Maloy <maloy@donjonn.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the TIPC module is unloaded, we have identified a race condition
that allows a node reference counter to go to zero and the node instance
being freed before the node timer is finished with accessing it. This
leads to occasional crashes, especially in multi-namespace environments.
The scenario goes as follows:
CPU0:(node_stop) CPU1:(node_timeout) // ref == 2
1: if(!mod_timer())
2: if (del_timer())
3: tipc_node_put() // ref -> 1
4: tipc_node_put() // ref -> 0
5: kfree_rcu(node);
6: tipc_node_get(node)
7: // BOOM!
We now clean up this functionality as follows:
1) We remove the node pointer from the node lookup table before we
attempt deactivating the timer. This way, we reduce the risk that
tipc_node_find() may obtain a valid pointer to an instance marked
for deletion; a harmless but undesirable situation.
2) We use del_timer_sync() instead of del_timer() to safely deactivate
the node timer without any risk that it might be reactivated by the
timeout handler. There is no risk of deadlock here, since the two
functions never touch the same spinlocks.
3: We remove a pointless tipc_node_get() + tipc_node_put() from the
timeout handler.
Reported-by: Zhijiang Hu <huzhijiang@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Although we have never seen it happen, we have identified the
following problematic scenario when nodes are stopped and deleted:
CPU0: CPU1:
tipc_node_xxx() //ref == 1
tipc_node_put() //ref -> 0
tipc_node_find() // node still in table
tipc_node_delete()
list_del_rcu(n. list)
tipc_node_get() //ref -> 1, bad
kfree_rcu()
tipc_node_put() //ref to 0 again.
kfree_rcu() // BOOM!
We fix this by introducing use of the conditional kref_get_if_not_zero()
instead of kref_get() in the function tipc_node_find(). This eliminates
any risk of post-mortem access.
Reported-by: Zhijiang Hu <huzhijiang@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to update the skb->csum after pulling the skb, otherwise
an unnecessary checksum (re)computation can ocure for IGMP/MLD packets
in the bridge code. Additionally this fixes the following splats for
network devices / bridge ports with support for and enabled RX checksum
offloading:
[...]
[ 43.986968] eth0: hw csum failure
[ 43.990344] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.4.0 #2
[ 43.996193] Hardware name: BCM2709
[ 43.999647] [<800204e0>] (unwind_backtrace) from [<8001cf14>] (show_stack+0x10/0x14)
[ 44.007432] [<8001cf14>] (show_stack) from [<801ab614>] (dump_stack+0x80/0x90)
[ 44.014695] [<801ab614>] (dump_stack) from [<802e4548>] (__skb_checksum_complete+0x6c/0xac)
[ 44.023090] [<802e4548>] (__skb_checksum_complete) from [<803a055c>] (ipv6_mc_validate_checksum+0x104/0x178)
[ 44.032959] [<803a055c>] (ipv6_mc_validate_checksum) from [<802e111c>] (skb_checksum_trimmed+0x130/0x188)
[ 44.042565] [<802e111c>] (skb_checksum_trimmed) from [<803a06e8>] (ipv6_mc_check_mld+0x118/0x338)
[ 44.051501] [<803a06e8>] (ipv6_mc_check_mld) from [<803b2c98>] (br_multicast_rcv+0x5dc/0xd00)
[ 44.060077] [<803b2c98>] (br_multicast_rcv) from [<803aa510>] (br_handle_frame_finish+0xac/0x51c)
[...]
Fixes: 9afd85c9e4 ("net: Export IGMP/MLD message validation code")
Reported-by: Álvaro Fernández Rojas <noltari@gmail.com>
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
The VLAN GetNext operation is specific to some switches, and thus can be
complicated to implement for some drivers.
Remove the support for the vlan_getnext/port_pvid_get approach in favor
of the generic and simpler port_vlan_dump function.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to port_fdb_dump, add a port_vlan_dump function to DSA drivers
which gets passed the switchdev VLAN object and callback.
This function, if implemented, takes precedence over the soon legacy
vlan_getnext/port_pvid_get approach.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently tc actions are stored in a per-module hashtable,
therefore are visible to all network namespaces. This is
probably the last part of the tc subsystem which is not
aware of netns now. This patch makes them per-netns,
several tc action API's need to be adjusted for this.
The tc action API code is ugly due to historical reasons,
we need to refactor that code in the future.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We only release the memory of the hashtable itself, not its
entries inside. This is not a problem yet since we only call
it in module release path, and module is refcount'ed by
actions. This would be a problem after we move the per module
hinfo into per netns in the latter patch.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* stop critical protocol session on disconnect to avoid
it getting stuck
* wext: fix two RTNL message ordering issues
* fix an uninitialized value (found by KASAN)
* fix an out-of-bounds access (also found by KASAN)
* clear connection keys when freeing them in all cases
(IBSS, all other places already did so)
* fix expected throughput unit to get consistent values
* set default TX aggregation timeout to 0 in minstrel
to avoid (really just hide) issues and perform better
-----BEGIN PGP SIGNATURE-----
iQIcBAABCgAGBQJWzCthAAoJEGt7eEactAAdLjEP/38ujypJkVlwZzm9L8MbSVPk
qPMmRBshVX89xOXtwJLJHOGCRY6/pfe1LKThodOp+hj+NH7OLLuAGFDScjr7eEjG
bAQrXZ8dDIo54uri4EF4jBfkqUsePs+d4VpLgdJ2Gdh2UgIdDlUcggiEyOSicc27
ZP3gRQzad8tP0O+MuqiAjwJE4C3AUzzj+8TXJJXjrT3RvDpvZfA3MClfiQVhRM+4
3qtFhqFFGfIjTSHCoGr+0RYhLJwfjalIyoENR/kYizjHXU4YtZcu7ihdixKmG3Qb
SUIlNl60WSyoh2+PXx3wp0gfTc0BrOVIpB+TQPouRu5D+tLRiPwlieVrlfJyIUJ5
ccU1RWoTZn+3OVSjNF+9IGbpL43cjaJpZUMU/mQES7+z+uEwlzdcnwDScG7sMMC+
KIyKZ4v+2sBPKYYKAzGxHb2lfTMyp9sCs25U30yqLX3RSNtuxJbDwuKrwbKa1CMx
WEjImI+iSApjTch0THAxV8/sO/ZneuRZP76PXmSDm7Mzf9qob8fnpCjFsHTypDpx
eWy8AHPhIhb3ni6Cm3TfsGLjGE4KGwD/lEwmAles2Cj48N4b2UwRjMrJ5GRxuV00
9kh2MvFEmSYBU4dw6LQmemjxNNIa792+JiS4AKJXBqkPK1H1YFvq7i8Hl9a1tQ0P
SDgbH5x6AYvmTRcD97Ph
=01n9
-----END PGP SIGNATURE-----
Merge tag 'mac80211-for-davem-2016-02-23' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
Another small set of fixes:
* stop critical protocol session on disconnect to avoid
it getting stuck
* wext: fix two RTNL message ordering issues
* fix an uninitialized value (found by KASAN)
* fix an out-of-bounds access (also found by KASAN)
* clear connection keys when freeing them in all cases
(IBSS, all other places already did so)
* fix expected throughput unit to get consistent values
* set default TX aggregation timeout to 0 in minstrel
to avoid (really just hide) issues and perform better
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The fix in 35e2d1152b ("tunnels: Allow IPv6 UDP checksums to be correctly
controlled.") changed behavior for bpf_set_tunnel_key() when in use with
IPv6 and thus uncovered a bug that TUNNEL_CSUM needed to be set but wasn't.
As a result, the stack dropped ingress vxlan IPv6 packets, that have been
sent via eBPF through collect meta data mode due to checksum now being zero.
Since after LCO, we enable IPv4 checksum by default, so make that analogous
and only provide a flag BPF_F_ZERO_CSUM_TX for the user to turn it off in
IPv4 case.
Fixes: 35e2d1152b ("tunnels: Allow IPv6 UDP checksums to be correctly controlled.")
Fixes: c6c3345407 ("bpf: support ipv6 for bpf_skb_{set,get}_tunnel_key")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Flushing/listing entries was not RCU safe, so parallel flush/dump
could lead to kernel crash. Bug reported by Deniz Eren.
Fixes netfilter bugzilla id #1050.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Commit d15f9d694b ("libceph: check data_len in ->alloc_msg()")
mistakenly bumped the log level on the "tid %llu unknown, skipping"
message. Turn it back into a dout() - stray replies are perfectly
normal when OSDs flap, crash, get killed for testing purposes, etc.
Cc: stable@vger.kernel.org # 4.3+
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
ceph_msg_footer is 21 bytes long, while ceph_msg_footer_old is only 13.
Don't skip too much when CEPH_FEATURE_MSG_AUTH isn't negotiated.
Cc: stable@vger.kernel.org # 3.19+
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
The contract between try_read() and try_write() is that when called
each processes as much data as possible. When instructed by osd_client
to skip a message, try_read() is violating this contract by returning
after receiving and discarding a single message instead of checking for
more. try_write() then gets a chance to write out more requests,
generating more replies/skips for try_read() to handle, forcing the
messenger into a starvation loop.
Cc: stable@vger.kernel.org # 3.10+
Reported-by: Varada Kari <Varada.Kari@sandisk.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Tested-by: Varada Kari <Varada.Kari@sandisk.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Otherwise we break the contract with GSO to only pass CHECKSUM_PARTIAL
skbs down. This can easily happen with UDP+IPv4 sockets with the first
MSG_MORE write smaller than the MTU, second write is a sendfile.
Returning -EOPNOTSUPP lets the callers fall back into normal sendmsg path,
were we calculate the checksum manually during copying.
Commit d749c9cbff ("ipv4: no CHECKSUM_PARTIAL on MSG_MORE corked
sockets") started to exposes this bug.
Fixes: d749c9cbff ("ipv4: no CHECKSUM_PARTIAL on MSG_MORE corked sockets")
Reported-by: Jiri Benc <jbenc@redhat.com>
Cc: Jiri Benc <jbenc@redhat.com>
Reported-by: Wakko Warner <wakko@animx.eu.org>
Cc: Wakko Warner <wakko@animx.eu.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We want to try and pull the L4 header in if it is available in the first
fragment. As such add the flag to indicate we want to pull the headers on
the first fragment in.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The IPv6 parsing was using a local pointer when it could use the same
pointer as the IPv4 portion of the code since the key_addrs can support
both IPv4 and IPv6 as it is just a pointer.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The flow dissector bits handling FCoE didn't bother to actually validate
that the space there was enough for the FCoE header. So we need to update
things so that if there is room we add the header and report a good result,
otherwise we do not add the header, and report the bad result.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It turns out that for IPv4 we were reporting the ip_proto of the fragment,
and for IPv6 we were not. This patch updates that behavior so that we
always report the IP protocol of the fragment. In addition it takes the
steps of updating the payload offset code so that we will determine the
start of the payload not including the L4 header for any fragment after the
first.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch corrects the logic for the IPv4 parsing so that it is consistent
with how we handle IPv6. Specifically if we do not have the flow key
indicating we want the addresses we still may need to take a look at the IP
fragmentation bits and to see if we should stop after we have recognized
the L3 header.
Fixes: 807e165dc4 ("flow_dissector: Add control/reporting of fragmentation")
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One of the validation checks for the new array-based TCP SO_REUSEPORT
validation was unintentionally dropped in ea8add2b19. This adds it back.
Lack of this check allows the user to allocate multiple sock_reuseport
structures (leaking all but the first).
Fixes: ea8add2b19 ("tcp/dccp: better use of ephemeral ports in bind()")
Signed-off-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bool variable 'fail' is always non-negative, it indicates an error if it
is true.
The problem has been detected using coccinelle script
scripts/coccinelle/tests/unsigned_lesser_than_zero.cocci
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Acked-by: Alexander Aring <aar@pengutronix.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
In case of multicast address we need to set always the LOWPAN_IPHC_M bit
and if a destination context identifier was found for a multicast
address then we need to set the LOWPAN_IPHC_DAC as well.
Signed-off-by: Alexander Aring <aar@pengutronix.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Factor all assignments to rfkill_global_states[].cur into a single
function rfkill_update_global_state().
Signed-off-by: João Paulo Rechi Vita <jprvita@endlessm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This prepares the driver for removal of platform data.
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Helper for finding the type based on name. Useful if the
type needs to be determined based on device property.
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
[modify rfkill_types array and BUILD_BUG_ON to not cause errors]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add IEEE80211_RADIOTAP_VHT entry to rtap_namespace_sizes array in order to
define alignment and size of VHT info in tx radiotap
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Today, the supplicant will add the RRM capabilities
Information Element in the association request only if
Quiet period is supported (NL80211_FEATURE_QUIET).
Quiet is one of many RRM features, and there are other RRM
features that are not related to Quiet (e.g. neighbor
report). Therefore, requiring Quiet to enable RRM is too
restrictive.
Some of the features, like neighbor report, can be
supported by user space without any help from the kernel.
Hence adding the RRM capabilities IE to association request
should be the sole user space's decision.
Removing the RRM dependency on Quiet in the driver solves
this problem, but using an old driver with a user space
tool that would not require Quiet feature would be
problematic: the user space would add NL80211_ATTR_USE_RRM
in the association request even if the kernel doesn't
advertize NL80211_FEATURE_QUIET and the association would
be denied by the kernel.
This solution adds a global RRM capability, that tells user
space that it can request RRM capabilities IE publishment
without any specific feature support in the kernel.
Signed-off-by: Beni Lev <beni.lev@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Drivers may need to track which vif is using VHT MU-MIMO.
Move the flag indicationg the ownership of MU_MIMO to
ieee80211_vif.
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Provide an interface to the lower level driver to set the VHT
MU-MIMO data. This is needed for example when there is an update
of the group data during low power state, where the management
frame will not be passed to the host at all.
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Since the PNs of all the tx keys are now tracked in the public
part of the key struct (with atomic counter), we no longer
need these functions.
dvm and vt665{5,6} are currently the only users of these functions,
so update them accordingly.
Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Some drivers/devices might want to set the IVs by
themselves (and still let mac80211 generate MMIC).
Specifically, this is needed when the device does
offloading at certain times, and the driver has
to make sure that the IVs of new tx frames (from
the host) are synchronized with IVs that were
potentially used during the offloading.
Similarly to CCMP, move the TX IVs of TKIP keys to the
public part of the key struct, and export a function
to add the IV right into the crypto header.
The public tx_pn field is defined as atomic64, so define
TKIP_PN_TO_IV16/32 helper macros to convert it to iv16/32
when needed.
Since the iv32 used for the p1k cache is taken
directly from the frame, we can safely remove
iv16/32 from being protected by tkip.txlock.
Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Fix wiphy supported_band access in tx radiotap parsing introduced
in commit 5ec3aed9ba4c ("mac80211: Parse legacy and HT rate in
injected frames"). In particular, info->band is always set to 0
(IEEE80211_BAND_2GHZ) since it has not assigned yet.
This cause a kernel crash on 5GHz only devices.
Move ieee80211_parse_tx_radiotap() after info->band assignment
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This massively reduces data copying and thus improves rx performance
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
reuse_skb is set to true if the code decides to use the last segment.
Fixes a memory leak
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The source and destintation addresses in the memcpy arguments
are flipped. Fix that.
Fixes: 23a1f8d44c0b("mac80211: process and save VHT MU-MIMO group frame")
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
mpp_path_del() and mesh_path_del() are mostly the same function.
Move common code into a new static function.
Acked-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Henning Rogge <henning.rogge@fkie.fraunhofer.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Remember the last time when a mpp table entry is used for
rx or tx and remove them after MESH_PATH_EXPIRE time.
Acked-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Henning Rogge <henning.rogge@fkie.fraunhofer.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Make the mesh_path_del() function remove all mpp table entries
that are proxied by the removed mesh path.
Acked-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Henning Rogge <henning.rogge@fkie.fraunhofer.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
PBSS (Personal Basic Service Set) is a new BSS type for DMG
networks. It is similar to infrastructure BSS, having an AP-like
entity called PCP (PBSS Control Point), but it has few differences.
PBSS support is mandatory for 11ad devices.
Add support for PBSS by introducing a new PBSS flag attribute.
The PBSS flag is used in the START_AP command to request starting
a PCP instead of an AP, and in the CONNECT command to request
connecting to a PCP instead of an AP.
Signed-off-by: Lior David <liord@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Use skb_copy_bits in preparation for allowing fragmented skbs
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Prepararation for zero-copy A-MSDU support with page fragment SKBs
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
On error we jumped to the error label and returned the error code but we
missed releasing sinfo.
Fixes: 5fe74014172d ("mac80211: avoid excessive stack usage in sta_info")
Reviewed-by: Julian Calaby <julian.calaby@gmail.com>
Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Number of deployed 80 MHz capable VHT stations that do not support 80+80
and 160 MHz bandwidths seem to misbehave when trying to connect to an AP
that advertises 80+80 or 160 MHz channel bandwidth in the VHT Operation
element. To avoid such issues with deployed devices, modify the design
based on recently accepted IEEE 802.11 standard changes (*).
This allows poorly implemented VHT 80 MHz stations to connect with the
AP in 80 MHz mode. 80+80 and 160 MHz capable stations need to support
the new workaround mechanism to allow full bandwidth to be used.
However, there are more or less no impacted station with 80+80/160
capability deployed.
The rebased version of this patch is based on the updated version from
Johannes Berg to take the HT/VHT chandef refactoring into account.
(*) Changes in
https://mentor.ieee.org/802.11/dcn/15/11-15-1530-04-000m-vht160-operation-signaling-through-non-zero-ccfs1.docx
were accepted during the IEEE 802.11 January 2016 meeting.
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If any frames are dropped that are part of a BA session, the reorder
buffer will "indefinitely" (until the timeout) wait for them to come
in (or a BAR moving the window) and won't release frames after them.
This means it isn't possible to filter frames within a BA session in
firmware.
Introduce an API function that allows such filtering. Calling this
function will move the BA window forward to the new SSN, and allows
marking frames after the SSN as having been filtered, so any future
reordering activity will release frames while skipping the holes.
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Clean up ieee80211_rx_reorder_ready() callers by passing the RX
TID struct and the index, instead of the frames list. This will
make it more extensible as well.
While at it, move the inline to rx.c as it's only used there.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This will allow drivers to make more educated
decisions whether to defer transmission or not.
Relying on wake_tx_queue() call count implicitly
was not possible because it could be called
without queued frame count actually changing on
software tx aggregation start/stop code paths.
It was also not possible to know how long
byte-wise queue was without dequeueing.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Drivers/devices without their own rate control algorithm can get the
information what rates they should use from either the radiotap header of
injected frames or from the rate control algorithm. But the parsing of the
legacy rate information from the radiotap header was removed in commit
e6a9854b05 ("mac80211/drivers: rewrite the rate control API").
The removal of this feature heavily reduced the usefulness of frame
injection when wanting to simulate specific transmission behavior. Having
rate parsing together with MCS rates and retry support allows a fine
grained selection of the tx behavior of injected frames for these kind of
tests.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Cc: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The mean_chain_len field in struct mesh_table is copied whenever a
new mesh table is allocated, but only ever has the value 2 and is
never otherwise updated, so just remove it and use the related
define instead.
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When CONFIG_OPTIMIZE_INLINING is set, the sta_info_insert_finish
function consumes more stack than normally, exceeding the
1024 byte limit on ARM:
net/mac80211/sta_info.c: In function 'sta_info_insert_finish':
net/mac80211/sta_info.c:561:1: error: the frame size of 1080 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
It turns out that there are two functions that put a 'struct station_info'
on the stack: __sta_info_destroy_part2 and sta_info_insert_finish, and
this structure alone requires up to 792 bytes.
Hoping that both are called rarely enough, this replaces the
on-stack structure with a dynamic allocation, which unfortunately
requires some suboptimal error handling for out-of-memory.
The __sta_info_destroy_part2 function is actually affected by the
stack usage twice because it calls cfg80211_del_sta_sinfo(), which
has another instance of struct station_info on its stack.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 98b6218388 ("mac80211/cfg80211: add station events")
Fixes: 6f7a8d26e2 ("mac80211: send statistics with delete station event")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Make sure there's at least a debug message whenever the
connection to the AP is terminated.
Also change one message from wiphy_debug() to the common
mlme_dbg().
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When joining IBSS a full scan should be initiated in order to search
for existing cell, unless the fixed_channel parameter was set.
A default channel to create the IBSS on if no cell was found is
provided as well.
However - a scan is initiated only on the default channel provided
regardless of whether ifibss->fixed_channel is set or not, with the
obvious result of the cell not joining existing IBSS cell that is
on another channel.
Fixes: 76bed0f43b ("mac80211: IBSS fix scan request")
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The timestamp given by iwlwifi is at the beginning of the
frame over the air, at (or during) the SYNC field. Allow
such timestamps to be given to mac80211, at least (for now)
for frames with non-HT/VHT preambles.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The ieee80211_queue_stopped() expects hw queue
number but it was given raw WMM AC number instead.
This could cause frame drops and problems with
traffic in some cases - most notably if driver
doesn't map AC numbers to queue numbers 1:1 and
uses ieee80211_stop_queues() and
ieee80211_wake_queue() only without ever calling
ieee80211_wake_queues().
On ath10k it was possible to hit this problem in
the following case:
1. wlan0 uses queue 0
(ath10k maps queues per vif)
2. offchannel uses queue 15
3. queues 1-14 are unused
4. ieee80211_stop_queues()
5. ieee80211_wake_queue(q=0)
6. ieee80211_wake_queue(q=15)
(other queues are not woken up because both
driver and mac80211 know other queues are
unused)
7. ieee80211_rx_h_mesh_fwding()
8. ieee80211_select_queue_80211() returns 2
9. ieee80211_queue_stopped(q=2) returns true
10. frame is dropped (oops!)
Fixes: d3c1597b8d ("mac80211: fix forwarded mesh frame queue mapping")
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The driver can access the queue simultanously
while mac80211 tears down the interface. Without
spinlock protection this could lead to corrupting
sk_buff_head and subsequently to an invalid
pointer dereference.
Fixes: ba8c3d6f16 ("mac80211: add an intermediate software queue implementation")
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Remove unnecessory "if" statement and club it with previos "if" block.
Signed-off-by: Sunil Shahu <shshahu@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This was scheduled to be removed in 2012 by:
commit 69c86373c6
Author: florian@mickler.org <florian@mickler.org>
Date: Wed Feb 24 12:05:16 2010 +0100
Document the rfkill sysfs ABI
This moves sysfs ABI info from Documentation/rfkill.txt to the
ABI subfolder and reformats it.
This also schedules the deprecated sysfs parts to be removed in
2012 (claim file) and 2014 (state file).
Signed-off-by: Florian Mickler <florian@mickler.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: João Paulo Rechi Vita <jprvita@endlessm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
__rfkill_set_hw_state() is only one used in rfkill_set_hw_state(), and
none of them are long or complicated, so merging the two makes the code
easier to read.
Signed-off-by: João Paulo Rechi Vita <jprvita@endlessm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
RFKILL_BLOCK_SW value have just been saved to prev, no need to check it
again in the if expression. This makes code a little bit easier to read.
Signed-off-by: João Paulo Rechi Vita <jprvita@endlessm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When suspended while polling is paused, polling will erroneously
resume at resume time. Fix this by tracking pause and suspend in
separate state variable and adding the necessary checks.
Clarify the documentation on this as well.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The station MLME and IBSS/mesh ones use entirely different
code for interpreting HT and VHT operation elements. Change
the code that interprets them a bit - it now modifies an
existing chandef - and use it also in the MLME code.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Print a warning whenever an expected callback function
lacks implementation.
Signed-off-by: Ola Olsson <ola.olsson@sonymobile.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In VHT, the specification allows to limit the number of
MSDUs in an A-MSDU in the Extended Capabilities IE. There
is also a limitation on the byte size in the VHT IE.
In HT, the only limitation is on the byte size.
Parse the capabilities from the peer and make them
available to the driver.
In HT, there is another limitation when a BA agreement
is active: the byte size can't be greater than 4095.
This is not enforced here.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The minimum chandef bandwidth calculation was done only in case
a new station was inserted (or when an existing station was removed).
However, it is possible that stations are inserted before they are
associated, e.g., when FULL_AP_CLIENT_STATE is supported and user
space adds stations unassociated.
Fix this by calling ieee80211_recalc_min_chandef() whenever
a station transitions in/out the associated state, and only
consider station marked as associated.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Some drivers offload some frames internally (e.g.
AddBa). Reporting such frames to mac80211 would
only confuse MLME. However it would be useful to
be able to pass such frames to monitor interfaces
for sniffing purposes, e.g. when running AP +
monitor.
To do that allow drivers to tell mac80211 whether
a given frame should be:
- processed but not delivered to any monitor vif
- not processed but delievered to monitor vifs
only
Signed-off-by: Grzegorz Bajorski <grzegorz.bajorski@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Enable driver to manage the reordering logic itself.
This is needed for example for the iwlwifi driver that
will support hardware assisted reordering.
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Stable bugfixes:
- Fix nfs_size_to_loff_t
- NFSv4: Fix a dentry leak on alias use
Other bugfixes:
- Don't schedule a layoutreturn if the layout segment can be freed immediately.
- Always set NFS_LAYOUT_RETURN_REQUESTED with lo->plh_return_iomode
- rpcrdma_bc_receive_call() should init rq_private_buf.len
- fix stateid handling for the NFS v4.2 operations
- pnfs/blocklayout: fix a memeory leak when using,vmalloc_to_page
- fix panic in gss_pipe_downcall() in fips mode
- Fix a race between layoutget and pnfs_destroy_layout
- Fix a race between layoutget and bulk recalls
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWzKShAAoJEGcL54qWCgDyN0QQALiX8v2wvn07vE5ZeXB5uONq
+mfx8avhEoc3NVrpG6F4Kj+yJmHeAbkgIygnhZn4tcM/2YRxGDwlVLHb++yUTHO9
8zEi+tiKx9f5pK2PxRQ0PjavVxO/xOyO0/QNrUdnj8hSNR9ow+YOVjEYUulbuhIg
VAI3oSy5qIKgtDyW7w5PuPpTXLo74hPmyqHaa+ZIr2et//nJMSsw++vAmSg3oqXq
6QkLWPHt/8yvDRRn2hKkbD9gOrFCVfaZIGLM6Q0zRWAcGTzJi94ELzPdm8cVpD1o
eXKcufgLXPt3GOeAmxZ9kwQeebR6IFcvkYom5dsPhtMBuzXu1wpanU8PGgYIQ0VA
88b2YNl+TZpiVbRzxSEellZq5b+zapH/VVVnYptZiq9wUTACc7jK6W2heqe5PzaT
iepTGCAE21tV5JewcITMQHDZiOjRNdtbBzgixI7pNfMN8whU6e5NHYj6psZqT7cf
xEEZzL+RBJuCFKhXSPbBefccA4HCRkDEpT+2QgrMbS4KKfWOg36UNbJ2kgbvcRVi
HTqoRONR6zMzYBhyMlLaUuJ1co8nSHgEsL81Q3MwWSY6gucSW7jeJ2stR20KJIo1
7qgod9Ac/BAIozjzywi0LtmxouPyPU8cqaboMhSRVPDKfFlqZBNBkFLNWwgoYXMa
r1afZQwNeRRbZUR3RulE
=/WDS
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-4.5-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
"Stable bugfixes:
- Fix nfs_size_to_loff_t
- NFSv4: Fix a dentry leak on alias use
Other bugfixes:
- Don't schedule a layoutreturn if the layout segment can be freed
immediately.
- Always set NFS_LAYOUT_RETURN_REQUESTED with lo->plh_return_iomode
- rpcrdma_bc_receive_call() should init rq_private_buf.len
- fix stateid handling for the NFS v4.2 operations
- pnfs/blocklayout: fix a memeory leak when using,vmalloc_to_page
- fix panic in gss_pipe_downcall() in fips mode
- Fix a race between layoutget and pnfs_destroy_layout
- Fix a race between layoutget and bulk recalls"
* tag 'nfs-for-4.5-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFSv4.x/pnfs: Fix a race between layoutget and bulk recalls
NFSv4.x/pnfs: Fix a race between layoutget and pnfs_destroy_layout
auth_gss: fix panic in gss_pipe_downcall() in fips mode
pnfs/blocklayout: fix a memeory leak when using,vmalloc_to_page
nfs4: fix stateid handling for the NFS v4.2 operations
NFSv4: Fix a dentry leak on alias use
xprtrdma: rpcrdma_bc_receive_call() should init rq_private_buf.len
pNFS: Always set NFS_LAYOUT_RETURN_REQUESTED with lo->plh_return_iomode
pNFS: Fix pnfs_mark_matching_lsegs_return()
nfs: fix nfs_size_to_loff_t
IPCB may contain data from previous layers (in the observed case the
qdisc layer). In the observed scenario, the data was misinterpreted as
ip header options, which later caused the ihl to be set to an invalid
value (<5). This resulted in an infinite loop in the mips implementation
of ip_fast_csum.
This patch clears IPCB(skb)->opt before dst_link_failure can be called for
various types of tunnels. This change only applies to encapsulated ipv4
packets.
The code introduced in 11c21a30 which clears all of IPCB has been removed
to be consistent with these changes, and instead the opt field is cleared
unconditionally in ip_tunnel_xmit. The change in ip_tunnel_xmit applies to
SIT, GRE, and IPIP tunnels.
The relevant vti, l2tp, and pptp functions already contain similar code for
clearing the IPCB.
Signed-off-by: Bernie Harris <bernie.harris@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently it's converted into msecs, thus HZ=1000 intact.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Fixes: 740b0f1841 ("tcp: switch rtt estimations to usec resolution")
Signed-off-by: David S. Miller <davem@davemloft.net>
DSA drivers may support multiple bridge groups with the same hardware
VLAN. The mv88e6xxx driver which cannot yet, already has its own check
for overlapping bridges. Thus remove the check from the DSA layer.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some DSA drivers may or may not support multiple software bridges on top
of an hardware switch.
It is more convenient for them to access the bridge's net_device for
finer configuration.
Removing the need to craft and access a bitmask also simplifies the
code.
This patch changes the signature of bridge related functions, update DSA
drivers, and removes dsa_slave_br_port_mask.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixed the return value of netdev notifier. If the command is
a don't care a NOTIFY_DONE should be returned. If the command matched a
NOTIFY_OK should be returned.
Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com>
Signed-off-by: Alexander Aring <aar@pengutronix.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch introduce support for IPHC stateful address compression. It
will offer the context table via one debugfs entry.
This debugfs has and directory for each cid entry for the context table.
Inside each cid directory there exists the following files:
- "active": If the entry is added or deleted. The context table is
original a list implementation, this flag will indicate if the
context is part of list or not.
- "prefix": The ipv6 prefix.
- "prefix_length": The prefix length for the prefix.
- "compression": The compression flag according RFC6775.
This part should be moved into sysfs after some testing time.
Also the debugfs entry contains a "show" file which is a pretty-printout
for the current context table information.
Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com>
Signed-off-by: Alexander Aring <aar@pengutronix.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Destroying the workqueue before unregistering the net device caused a
kernel oops
Signed-off-by: Koen Zandberg <koen@bergzand.net>
Acked-by: Alexander Aring <aar@pengutronix.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When the HCI_AUTO_OFF flag is cleared, the power_off delayed work need
to be cancel or HCI will be powered off even if it's managed.
Signed-off-by: Wei-Ning Huang <wnhuang@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Recently a managed version of led_trigger_register was introduced.
Using devm_led_trigger_register allows to simplify the LED trigger code.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Add support for LED triggers to the Bluetooth subsystem and add kernel
config symbol BT_LEDS for it.
For now one trigger for indicating "HCI is powered up" is supported.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The qword_get() function NUL-terminates its output buffer. If the input
string is in hex format \xXXXX... and the same length as the output
buffer, there is an off-by-one:
int qword_get(char **bpp, char *dest, int bufsize)
{
...
while (len < bufsize) {
...
*dest++ = (h << 4) | l;
len++;
}
...
*dest = '\0';
return len;
}
This patch ensures the NUL terminator doesn't fall outside the output
buffer.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
When user-space has started a critical protocol session and a disconnect
event occurs, the rdev::crit_prot_nlportid remains set. This caused a
subsequent NL80211_CMD_CRIT_PROTO_START to fail (-EBUSY). Fix this by
clearing the rdev attribute and call .crit_proto_stop() callback upon
disconnect event.
Reviewed-by: Hante Meuleman <meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieterpg@broadcom.com>
Signed-off-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The connection keys are zeroed out in all other cases except this
one. Let's fix the last one as well.
Signed-off-by: Ola Olsson <ola.olsson@sonymobile.com>
Reviewed-by: Julian Calaby <julian.calaby@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The value 5000 was put here with the addition of the timeout field to
ieee80211_start_tx_ba_session. It was originally added in mac80211 to
save resources for drivers like iwlwifi, which only supports a limited
number of concurrent aggregation sessions.
Since iwlwifi does not use minstrel_ht and other drivers don't need
this, 0 is a better default - especially since there have been
recent reports of aggregation setup related issues reproduced with
ath9k. This should improve stability without causing any adverse
effects.
Cc: stable@vger.kernel.org
Acked-by: Avery Pennarun <apenwarr@gmail.com>
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The batman-adv source code is the only place in the kernel which uses the
*_free_ref naming scheme for the *_put functions. Changing it to *_put
makes it more consistent and makes it easier to understand the connection
to the *_get functions.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <a@unstable.cc>
The batman-adv source code is the only place in the kernel which uses the
*_free_ref naming scheme for the *_put functions. Changing it to *_put
makes it more consistent and makes it easier to understand the connection
to the *_get functions.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <a@unstable.cc>
The batman-adv source code is the only place in the kernel which uses the
*_free_ref naming scheme for the *_put functions. Changing it to *_put
makes it more consistent and makes it easier to understand the connection
to the *_get functions.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <a@unstable.cc>
The batman-adv source code is the only place in the kernel which uses the
*_free_ref naming scheme for the *_put functions. Changing it to *_put
makes it more consistent and makes it easier to understand the connection
to the *_get functions.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <a@unstable.cc>
The batman-adv source code is the only place in the kernel which uses the
*_free_ref naming scheme for the *_put functions. Changing it to *_put
makes it more consistent and makes it easier to understand the connection
to the *_get functions.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <a@unstable.cc>
The batman-adv source code is the only place in the kernel which uses the
*_free_ref naming scheme for the *_put functions. Changing it to *_put
makes it more consistent and makes it easier to understand the connection
to the *_get functions.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <a@unstable.cc>
The batman-adv source code is the only place in the kernel which uses the
*_free_ref naming scheme for the *_put functions. Changing it to *_put
makes it more consistent and makes it easier to understand the connection
to the *_get functions.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <a@unstable.cc>