Provide a method for read-only access to the vlan device egress mapping.
Do this by refactoring vlan_dev_get_egress_qos_mask() such that now it
receives as an argument the skb priority instead of pointer to the skb.
Such an access is needed for the IBoE stack where the control plane
goes through the network stack. This is an add-on step on top of commit
d4a968658c "net/route: export symbol ip_tos2prio" which allowed the RDMA-CM
to use ip_tos2prio.
Signed-off-by: Eyal Perry <eyalpe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If appending a received fragment to the pending fragment chain
in a unicast link fails, the current code tries to force a retransmission
of the fragment by decrementing the 'next received sequence number'
field in the link. This is done under the assumption that the failure
is caused by an out-of-memory situation, an assumption that does
not hold true after the previous patch in this series.
A failure to append a fragment can now only be caused by a protocol
violation by the sending peer, and it must hence be assumed that it
is either malicious or buggy. Either way, the correct behavior is now
to reset the link instead of trying to revert its sequence number.
So, this is what we do in this commit.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the first fragment of a long data data message is received on a link, a
reassembly buffer large enough to hold the data from this and all subsequent
fragments of the message is allocated. The payload of each new fragment is
copied into this buffer upon arrival. When the last fragment is received, the
reassembled message is delivered upwards to the port/socket layer.
Not only is this an inefficient approach, but it may also cause bursts of
reassembly failures in low memory situations. since we may fail to allocate
the necessary large buffer in the first place. Furthermore, after 100 subsequent
such failures the link will be reset, something that in reality aggravates the
situation.
To remedy this problem, this patch introduces a different approach. Instead of
allocating a big reassembly buffer, we now append the arriving fragments
to a reassembly chain on the link, and deliver the whole chain up to the
socket layer once the last fragment has been received. This is safe because
the retransmission layer of a TIPC link always delivers packets in strict
uninterrupted order, to the reassembly layer as to all other upper layers.
Hence there can never be more than one fragment chain pending reassembly at
any given time in a link, and we can trust (but still verify) that the
fragments will be chained up in the correct order.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a message fragment is received in a broadcast or unicast link,
the reception code will append the fragment payload to a big reassembly
buffer through a call to the function tipc_recv_fragm(). However, after
the return of that call, the logics goes on and passes the fragment
buffer to the function tipc_net_route_msg(), which will simply drop it.
This behavior is a remnant from the now obsolete multi-cluster
functionality, and has no relevance in the current code base.
Although currently harmless, this unnecessary call would be fatal
after applying the next patch in this series, which introduces
a completely new reassembly algorithm. So we change the code to
eliminate the redundant call.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Highlights include:
- Changes to the RPC socket code to allow NFSv4 to turn off timeout+retry
- Detect TCP connection breakage through the "keepalive" mechanism
- Add client side support for NFSv4.x migration (Chuck Lever)
- Add support for multiple security flavour arguments to the "sec=" mount
option (Dros Adamson)
- fs-cache bugfixes from David Howells:
- Fix an issue whereby caching can be enabled on a file that is open for
writing
- More NFSv4 open code stable bugfixes
- Various Labeled NFS (selinux) bugfixes, including one stable fix
- Fix buffer overflow checking in the RPCSEC_GSS upcall encoding
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.15 (GNU/Linux)
iQIcBAABAgAGBQJSe8TEAAoJEGcL54qWCgDydu0QAJVtVhfwlUKm/HZ4oAy0Q5T8
rJOWupqGnwyqTNLIRTlNegFSwMY+bABbkihXzSoj641o5zRb200KePlNxknzzlu1
Q715035LDeEC1jrrHHeztTa9uWxAZ9B6gstMzilJYbV72VRYuWA6Q5LstXwQy/jN
ViSldrGJ4sRZUe6wpNLPBRDBfOMWOtZdyRqqqjm71ZHJJnaqQWLBvThTG4MsLlpg
j/khi5189MxJWePTKI9zGZdnXZAZ0ar1tAi1QWDNv044EwsS3LZZIko+YdBh6LZx
9IBwk6TqOXFY0jxPDsIZtTfWPf4pjewRrPINMkjlZl3TJEf97sIlavZ7gWqvVIz5
eXzFGy7D2XBgub8TGcmZM/7keHY/sqghz7lXZ8FulXlVem52r/95NiQ9tu8l8hq3
Ab0FUnjtXeuaDFPBCHlKb3zmCMGFF89VqtpCj2plCPvfcGgJvXJqddWBRisQw9St
UgD1PQWRFGtkrHv5EcQkd5boVdRNjAVAC9PaCWNpOpSVDjJyuUE+v/k75+ZwDcG8
afAFMJSbCwRxW+cFlLAsQTfQztzuWTTOOVQvJDxfyYulcWshyIruhiYItRDfJqRp
RynuVzrBERzUs5wsefnBbC218C/WSlOrodPbsZvdhKolvRx1RNtWT29ilZ6+p2tH
4378ZRLtQvm9RXBnAkRc
=gflJ
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-3.13-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Highlights include:
- Changes to the RPC socket code to allow NFSv4 to turn off
timeout+retry:
* Detect TCP connection breakage through the "keepalive" mechanism
- Add client side support for NFSv4.x migration (Chuck Lever)
- Add support for multiple security flavour arguments to the "sec="
mount option (Dros Adamson)
- fs-cache bugfixes from David Howells:
* Fix an issue whereby caching can be enabled on a file that is
open for writing
- More NFSv4 open code stable bugfixes
- Various Labeled NFS (selinux) bugfixes, including one stable fix
- Fix buffer overflow checking in the RPCSEC_GSS upcall encoding"
* tag 'nfs-for-3.13-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (68 commits)
NFSv4.2: Remove redundant checks in nfs_setsecurity+nfs4_label_init_security
NFSv4: Sanity check the server reply in _nfs4_server_capabilities
NFSv4.2: encode_readdir - only ask for labels when doing readdirplus
nfs: set security label when revalidating inode
NFSv4.2: Fix a mismatch between Linux labeled NFS and the NFSv4.2 spec
NFS: Fix a missing initialisation when reading the SELinux label
nfs: fix oops when trying to set SELinux label
nfs: fix inverted test for delegation in nfs4_reclaim_open_state
SUNRPC: Cleanup xs_destroy()
SUNRPC: close a rare race in xs_tcp_setup_socket.
SUNRPC: remove duplicated include from clnt.c
nfs: use IS_ROOT not DCACHE_DISCONNECTED
SUNRPC: Fix buffer overflow checking in gss_encode_v0_msg/gss_encode_v1_msg
SUNRPC: gss_alloc_msg - choose _either_ a v0 message or a v1 message
SUNRPC: remove an unnecessary if statement
nfs: Use PTR_ERR_OR_ZERO in 'nfs/nfs4super.c'
nfs: Use PTR_ERR_OR_ZERO in 'nfs41_callback_up' function
nfs: Remove useless 'error' assignment
sunrpc: comment typo fix
SUNRPC: Add correct rcu_dereference annotation in rpc_clnt_set_transport
...
Here's the big driver core / sysfs update for 3.13-rc1.
There's lots of dev_groups updates for different subsystems, as they all
get slowly migrated over to the safe versions of the attribute groups
(removing userspace races with the creation of the sysfs files.) Also
in here are some kobject updates, devres expansions, and the first round
of Tejun's sysfs reworking to enable it to be used by other subsystems
as a backend for an in-kernel filesystem.
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iEYEABECAAYFAlJ6xAMACgkQMUfUDdst+yk1kQCfcHXhfnrvFZ5J/mDP509IzhNS
ddEAoLEWoivtBppNsgrWqXpD1vi4UMsE
=JmVW
-----END PGP SIGNATURE-----
Merge tag 'driver-core-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core / sysfs patches from Greg KH:
"Here's the big driver core / sysfs update for 3.13-rc1.
There's lots of dev_groups updates for different subsystems, as they
all get slowly migrated over to the safe versions of the attribute
groups (removing userspace races with the creation of the sysfs
files.) Also in here are some kobject updates, devres expansions, and
the first round of Tejun's sysfs reworking to enable it to be used by
other subsystems as a backend for an in-kernel filesystem.
All of these have been in linux-next for a while with no reported
issues"
* tag 'driver-core-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (83 commits)
sysfs: rename sysfs_assoc_lock and explain what it's about
sysfs: use generic_file_llseek() for sysfs_file_operations
sysfs: return correct error code on unimplemented mmap()
mdio_bus: convert bus code to use dev_groups
device: Make dev_WARN/dev_WARN_ONCE print device as well as driver name
sysfs: separate out dup filename warning into a separate function
sysfs: move sysfs_hash_and_remove() to fs/sysfs/dir.c
sysfs: remove unused sysfs_get_dentry() prototype
sysfs: honor bin_attr.attr.ignore_lockdep
sysfs: merge sysfs_elem_bin_attr into sysfs_elem_attr
devres: restore zeroing behavior of devres_alloc()
sysfs: fix sysfs_write_file for bin file
input: gameport: convert bus code to use dev_groups
input: serio: remove bus usage of dev_attrs
input: serio: use DEVICE_ATTR_RO()
i2o: convert bus code to use dev_groups
memstick: convert bus code to use dev_groups
tifm: convert bus code to use dev_groups
virtio: convert bus code to use dev_groups
ipack: convert bus code to use dev_groups
...
Now rt6_alloc_cow() is only called by ip6_pol_route() when
rt->rt6i_flags doesn't contain both RTF_NONEXTHOP and RTF_GATEWAY,
and rt->rt6i_flags hasn't been changed in ip6_rt_copy().
So there is no neccessary to judge whether rt->rt6i_flags contains
RTF_GATEWAY or not.
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 1e2bd517c1 ("udp6: Fix udp
fragmentation for tunnel traffic.") changed the calculation if
there is enough space to include a fragment header in the skb from a
skb->mac_header dervived one to skb_headroom. Because we already peeled
off the skb to transport_header this is wrong. Change this back to check
if we have enough room before the mac_header.
This fixes a panic Saran Neti reported. He used the tbf scheduler which
skb_gso_segments the skb. The offsets get negative and we panic in memcpy
because the skb was erroneously not expanded at the head.
Reported-by: Saran Neti <Saran.Neti@telus.com>
Cc: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sockets marked with IP_PMTUDISC_INTERFACE won't do path mtu discovery,
their sockets won't accept and install new path mtu information and they
will always use the interface mtu for outgoing packets. It is guaranteed
that the packet is not fragmented locally. But we won't set the DF-Flag
on the outgoing frames.
Florian Weimer had the idea to use this flag to ensure DNS servers are
never generating outgoing fragments. They may well be fragmented on the
path, but the server never stores or usees path mtu values, which could
well be forged in an attack.
(The root of the problem with path MTU discovery is that there is
no reliable way to authenticate ICMP Fragmentation Needed But DF Set
messages because they are sent from intermediate routers with their
source addresses, and the IMCP payload will not always contain sufficient
information to identify a flow.)
Recent research in the DNS community showed that it is possible to
implement an attack where DNS cache poisoning is feasible by spoofing
fragments. This work was done by Amir Herzberg and Haya Shulman:
<https://sites.google.com/site/hayashulman/files/fragmentation-poisoning.pdf>
This issue was previously discussed among the DNS community, e.g.
<http://www.ietf.org/mail-archive/web/dnsext/current/msg01204.html>,
without leading to fixes.
This patch depends on the patch "ipv4: fix DO and PROBE pmtu mode
regarding local fragmentation with UFO/CORK" for the enforcement of the
non-fragmentable checks. If other users than ip_append_page/data should
use this semantic too, we have to add a new flag to IPCB(skb)->flags to
suppress local fragmentation and check for this in ip_finish_output.
Many thanks to Florian Weimer for the idea and feedback while implementing
this patch.
Cc: David S. Miller <davem@davemloft.net>
Suggested-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The code of flow label in Linux Kernel follows
the rules of RFC 1809 (an informational one) for
conditions on flow label sharing. There rules are
not in the last proposed standard for flow label
(RFC 6437), or in the previous one (RFC 3697).
Since this code does not follow any current or
old standard, we can remove it.
With this removal, the ipv6_opt_cmp function is
now a dead code and it can be removed too.
Changelog to v1:
* add justification for the change
* remove the condition on IPv6 options
[ Remove ipv6_hdr_cmp and it is now unused as well. -DaveM ]
Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
Please accept the following pull request intended for the 3.13 tree...
I had intended to pass most of these to you as much as two weeks ago.
Unfortunately, I failed to account for the effects of bad Internet
connections and my own fatique/laziness while traveling. On the bright
side, at least these have been baking in linux-next for some time!
For the mac80211 bits, Johannes says:
"This time I have two fixes for P2P (which requires not using CCK rates)
and a workaround for APs with broken WMM information."
For the iwlwifi bits, Johannes says:
"I have a few fixes for warnings/issues: one from Alex, fixing scan
timings, one from Emmanuel fixing a WARN_ON in the DVM driver, one from
Stanislaw removing a trigger-happy WARN_ON in the MVM driver and a
change from myself to try to recover when the device isn't processing
commands quickly."
And:
"For this round, I have a lot of changes:
* power management improvements
* BT coexistence improvements/updates
* new device support
* VHT support
* IBSS support (though due to a small bug it requires new firmware)
* various other fixes/improvements."
For the Bluetooth bits, Gustavo says:
"More patches for 3.12, busy times for Bluetooth. More than a 100 commits since
the last pull. The bulk of work comes from Johan and Marcel, they are doing
fixes and improvements all over the Bluetooth subsystem, as the diffstat can
show."
For the ath10k and ath6kl bits, Kalle says:
"Bartosz added support to ath10k for our 10.x AP firmware branch, which
gives us AP specific features and fixes. We still support the main
firmware branch as well just like before, ath10k detects runtime what
firmware is used. Unfortunately the firmware interface in 10.x branch is
somewhat different so there was quite a lot of changes in ath10k for
this.
Michal and Sujith did some performance improvements in ath10k. Vladimir
fixed a compiler warning and Fengguang removed an extra semicolon."
For the NFC bits, Samuel says:
"It's a fairly big one, with the following highlights:
- NFC digital layer implementation: Most NFC chipsets implement the NFC
digital layer in firmware, but others have more basic functionalities
and expect the host to implement the digital layer. This layer sits
below the NFC core.
- Sony's port100 support: This is "soft" NFC USB dongle that expects the
digital layer to be implemented on the host. This is the first user of
our NFC digital stack implementation.
- Secure element API: We now provide a netlink API for enabling,
disabling and discovering NFC attached (embedded or UICC ones) secure
elements. With some userspace help, this allows us to support NFC
payments.
Only the pn544 driver currently supports that API.
- NCI SPI fixes and improvements: In order to support NCI devices over
SPI, we fixed and improved our NCI/SPI implementation. The currently
most deployed NFC NCI chipset, Broadcom's bcm2079x, supports that mode
and we're planning to use our NCI/SPI framework to implement a
driver for it.
- pn533 fragmentation support in target mode: This was the only missing
feature from our pn533 impementation. We now support fragmentation in
both Tx and Rx modes, in target mode."
On top of all that, brcmfmac and rt2x00 both get the usual flurry
of updates. A few other drivers get hit here or there as well.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sometimes we need to coalesce the rx frags to avoid frag list. One example is
virtio-net driver which tries to use small frags for both MTU sized packet and
GSO packet. So this patch introduce skb_coalesce_rx_frag() to do this.
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Michael Dalton <mwdalton@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Slow start now increases cwnd by 1 if an ACK acknowledges some packets,
regardless the number of packets. Consequently slow start performance
is highly dependent on the degree of the stretch ACKs caused by
receiver or network ACK compression mechanisms (e.g., delayed-ACK,
GRO, etc). But slow start algorithm is to send twice the amount of
packets of packets left so it should process a stretch ACK of degree
N as if N ACKs of degree 1, then exits when cwnd exceeds ssthresh. A
follow up patch will use the remainder of the N (if greater than 1)
to adjust cwnd in the congestion avoidance phase.
In addition this patch retires the experimental limited slow start
(LSS) feature. LSS has multiple drawbacks but questionable benefit. The
fractional cwnd increase in LSS requires a loop in slow start even
though it's rarely used. Configuring such an increase step via a global
sysctl on different BDPS seems hard. Finally and most importantly the
slow start overshoot concern is now better covered by the Hybrid slow
start (hystart) enabled by default.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Applications have started to use Fast Open (e.g., Chrome browser has
such an optional flag) and the feature has gone through several
generations of kernels since 3.7 with many real network tests. It's
time to enable this flag by default for applications to test more
conveniently and extensively.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
This batch contains fives nf_tables patches for your net-next tree,
they are:
* Fix possible use after free in the module removal path of the
x_tables compatibility layer, from Dan Carpenter.
* Add filter chain type for the bridge family, from myself.
* Fix Kconfig dependencies of the nf_tables bridge family with
the core, from myself.
* Fix sparse warnings in nft_nat, from Tomasz Bursztyka.
* Remove duplicated include in the IPv4 family support for nf_tables,
from Wei Yongjun.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
This is another batch containing Netfilter/IPVS updates for your net-next
tree, they are:
* Six patches to make the ipt_CLUSTERIP target support netnamespace,
from Gao feng.
* Two cleanups for the nf_conntrack_acct infrastructure, introducing
a new structure to encapsulate conntrack counters, from Holger
Eitzenberger.
* Fix missing verdict in SCTP support for IPVS, from Daniel Borkmann.
* Skip checksum recalculation in SCTP support for IPVS, also from
Daniel Borkmann.
* Fix behavioural change in xt_socket after IP early demux, from
Florian Westphal.
* Fix bogus large memory allocation in the bitmap port set type in ipset,
from Jozsef Kadlecsik.
* Fix possible compilation issues in the hash netnet set type in ipset,
also from Jozsef Kadlecsik.
* Define constants to identify netlink callback data in ipset dumps,
again from Jozsef Kadlecsik.
* Use sock_gen_put() in xt_socket to replace xt_socket_put_sk,
from Eric Dumazet.
* Improvements for the SH scheduler in IPVS, from Alexander Frolkin.
* Remove extra delay due to unneeded rcu barrier in IPVS net namespace
cleanup path, from Julian Anastasov.
* Save some cycles in ip6t_REJECT by skipping checksum validation in
packets leaving from our stack, from Stanislav Fomichev.
* Fix IPVS_CMD_ATTR_MAX definition in IPVS, larger that required, from
Julian Anastasov.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to use the _safe version of list_for_each_entry() here otherwise
we have a use after free bug.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jesse Gross says:
====================
Open vSwitch
A set of updates for net-next/3.13. Major changes are:
* Restructure flow handling code to be more logically organized and
easier to read.
* Rehashing of the flow table is moved from a workqueue to flow
installation time. Before, heavy load could block the workqueue for
excessive periods of time.
* Additional debugging information is provided to help diagnose megaflows.
* It's now possible to match on TCP flags.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a build warning in skb_checksum() by wrapping the
csum_partial() usage in skb_checksum(). The problem is that on a few
architectures, csum_partial is used with prefix asmlinkage whereas
on most architectures it's not. So fix this up generically as we did
with csum_block_add_ext() to match the signature. Introduced by
2817a336d4 ("net: skb_checksum: allow custom update/combine for
walking skb").
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/emulex/benet/be.h
drivers/net/netconsole.c
net/bridge/br_private.h
Three mostly trivial conflicts.
The net/bridge/br_private.h conflict was a function signature (argument
addition) change overlapping with the extern removals from Joe Perches.
In drivers/net/netconsole.c we had one change adjusting a printk message
whilst another changed "printk(KERN_INFO" into "pr_info(".
Lastly, the emulex change was a new inline function addition overlapping
with Joe Perches's extern removals.
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduced in f9e42b8535 ("net: sctp: sideeffect: throw BUG if
primary_path is NULL"), we intended to find a buggy assoc that's
part of the assoc hash table with a primary_path that is NULL.
However, we better remove the BUG_ON for now and find a more
suitable place to assert for these things as Mark reports that
this also triggers the bug when duplication cookie processing
happens, and the assoc is not part of the hash table (so all
good in this case). Such a situation can for example easily be
reproduced by:
tc qdisc add dev eth0 root handle 1: prio bands 2 priomap 1 1 1 1 1 1
tc qdisc add dev eth0 parent 1:2 handle 20: netem loss 20%
tc filter add dev eth0 protocol ip parent 1: prio 2 u32 match ip \
protocol 132 0xff match u8 0x0b 0xff at 32 flowid 1:2
This drops 20% of COOKIE-ACK packets. After some follow-up
discussion with Vlad we came to the conclusion that for now we
should still better remove this BUG_ON() assertion, and come up
with two follow-ups later on, that is, i) find a more suitable
place for this assertion, and possibly ii) have a special
allocator/initializer for such kind of temporary assocs.
Reported-by: Mark Thomas <Mark.Thomas@metaswitch.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
High-availability Seamless Redundancy ("HSR") provides instant failover
redundancy for Ethernet networks. It requires a special network topology where
all nodes are connected in a ring (each node having two physical network
interfaces). It is suited for applications that demand high availability and
very short reaction time.
HSR acts on the Ethernet layer, using a registered Ethernet protocol type to
send special HSR frames in both directions over the ring. The driver creates
virtual network interfaces that can be used just like any ordinary Linux
network interface, for IP/TCP/UDP traffic etc. All nodes in the network ring
must be HSR capable.
This code is a "best effort" to comply with the HSR standard as described in
IEC 62439-3:2010 (HSRv0).
Signed-off-by: Arvid Brodin <arvid.brodin@xdin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Joby Poriyath provided a xen-netback patch to reduce the size of
xenvif structure as some netdev allocation could fail under
memory pressure/fragmentation.
This patch is handling the problem at the core level, allowing
any netdev structures to use vmalloc() if kmalloc() failed.
As vmalloc() adds overhead on a critical network path, add __GFP_REPEAT
to kzalloc() flags to do this fallback only when really needed.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Joby Poriyath <joby.poriyath@citrix.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes an outstanding bug found through IPVS, where SCTP packets
with skb->data_len > 0 (non-linearized) and empty frag_list, but data
accumulated in frags[] member, are forwarded with incorrect checksum
letting SCTP initial handshake fail on some systems. Linearizing each
SCTP skb in IPVS to prevent that would not be a good solution as
this leads to an additional and unnecessary performance penalty on
the load-balancer itself for no good reason (as we actually only want
to update the checksum, and can do that in a different/better way
presented here).
The actual problem is elsewhere, namely, that SCTP's checksumming
in sctp_compute_cksum() does not take frags[] into account like
skb_checksum() does. So while we are fixing this up, we better reuse
the existing code that we have anyway in __skb_checksum() and use it
for walking through the data doing checksumming. This will not only
fix this issue, but also consolidates some SCTP code with core
sk_buff code, bringing it closer together and removing respectively
avoiding reimplementation of skb_checksum() for no good reason.
As crc32c() can use hardware implementation within the crypto layer,
we leave that intact (it wraps around / falls back to e.g. slice-by-8
algorithm in __crc32c_le() otherwise); plus use the __crc32c_le_combine()
combinator for crc32c blocks.
Also, we remove all other SCTP checksumming code, so that we only
have to use sctp_compute_cksum() from now on; for doing that, we need
to transform SCTP checkumming in output path slightly, and can leave
the rest intact.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, skb_checksum walks over 1) linearized, 2) frags[], and
3) frag_list data and calculats the one's complement, a 32 bit
result suitable for feeding into itself or csum_tcpudp_magic(),
but unsuitable for SCTP as we're calculating CRC32c there.
Hence, in order to not re-implement the very same function in
SCTP (and maybe other protocols) over and over again, use an
update() + combine() callback internally to allow for walking
over the skb with different algorithms.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the intent to dump other accounting data later.
This patch is a cleanup.
Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Encapsulate counters for both directions into nf_conn_acct. During
that process also consistently name pointers to the extend 'acct',
not 'counters'. This patch is a cleanup.
Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We don't validate iph->ihl which may lead a dead loop if we meet a IPIP
skb whose iph->ihl is zero. Fix this by failing immediately when iph->ihl
is evil (less than 5).
This issue were introduced by commit ec5efe7946
(rps: support IPIP encapsulation).
Cc: Eric Dumazet <edumazet@google.com>
Cc: Petr Matousek <pmatouse@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
net/xfrm/xfrm_policy.c
Minor merge conflict in xfrm_policy.c, consisting of overlapping
changes which were trivial to resolve.
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
1) Fix a possible race on ipcomp scratch buffers because
of too early enabled siftirqs. From Michal Kubecek.
2) The current xfrm garbage collector threshold is too small
for some workloads, resulting in bad performance on these
workloads. Increase the threshold from 1024 to 32768.
3) Some codepaths might not have a dst_entry attached to the
skb when calling xfrm_decode_session(). So add a check
to prevent a null pointer dereference in this case.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Flow->hash can be used to detect hash collisions and avoid flow key
compare in flow lookup.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
tcp_flags=flags/mask
Bitwise match on TCP flags. The flags and mask are 16-bit num‐
bers written in decimal or in hexadecimal prefixed by 0x. Each
1-bit in mask requires that the corresponding bit in port must
match. Each 0-bit in mask causes the corresponding bit to be
ignored.
TCP protocol currently defines 9 flag bits, and additional 3
bits are reserved (must be transmitted as zero), see RFCs 793,
3168, and 3540. The flag bits are, numbering from the least
significant bit:
0: FIN No more data from sender.
1: SYN Synchronize sequence numbers.
2: RST Reset the connection.
3: PSH Push function.
4: ACK Acknowledgement field significant.
5: URG Urgent pointer field significant.
6: ECE ECN Echo.
7: CWR Congestion Windows Reduced.
8: NS Nonce Sum.
9-11: Reserved.
12-15: Not matchable, must be zero.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Widen TCP flags handling from 7 bits (uint8_t) to 12 bits (uint16_t).
The kernel interface remains at 8 bits, which makes no functional
difference now, as none of the higher bits is currently of interest
to the userspace.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
OVS already can handle all types of segmentation offloads that
are supported by the kernel.
Following patch specifically enables UDP and IPV6 segmentation
offloads.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Resolve cherry-picking conflicts:
Conflicts:
mm/huge_memory.c
mm/memory.c
mm/mprotect.c
See this upstream merge commit for more details:
52469b4fcd Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On some codepaths the skb does not have a dst entry
when xfrm_decode_session() is called. So check for
a valid skb_dst() before dereferencing the device
interface index. We use 0 as the device index if
there is no valid skb_dst(), or at reverse decoding
we use skb_iif as device interface index.
Bug was introduced with git commit bafd4bd4dc
("xfrm: Decode sessions with output interface.").
Reported-by: Meelis Roos <mroos@linux.ee>
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
We have one report of a crash in xs_tcp_setup_socket.
The call path to the crash is:
xs_tcp_setup_socket -> inet_stream_connect -> lock_sock_nested.
The 'sock' passed to that last function is NULL.
The only way I can see this happening is a concurrent call to
xs_close:
xs_close -> xs_reset_transport -> sock_release -> inet_release
inet_release sets:
sock->sk = NULL;
inet_stream_connect calls
lock_sock(sock->sk);
which gets NULL.
All calls to xs_close are protected by XPRT_LOCKED as are most
activations of the workqueue which runs xs_tcp_setup_socket.
The exception is xs_tcp_schedule_linger_timeout.
So presumably the timeout queued by the later fires exactly when some
other code runs xs_close().
To protect against this we can move the cancel_delayed_work_sync()
call from xs_destory() to xs_close().
As xs_close is never called from the worker scheduled on
->connect_worker, this can never deadlock.
Signed-off-by: NeilBrown <neilb@suse.de>
[Trond: Make it safe to call cancel_delayed_work_sync() on AF_LOCAL sockets]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This patch drops the direct memcpy on skb and uses the right skb
memcpy functions. Also remove an unnecessary check if plen is non zero.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is necessary to access network header with the skb_network_header
function instead of calculate the position with mac_len, etc.
Do the same for the transport header, when we replace the IPv6 header
with the 6LoWPAN header.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Acked-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set the mac header length while creating the 802.15.4 mac header.
Drop the function for recalculate mac header length in upper layers
which was static and works for intra pan communication only.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
On receiving side we don't need to set any headers in skb because the
6LoWPAN layer do not access it. Currently these values will set twice
after calling netif_rx.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
After reading the function rt6_check_neigh(), we can
know that the RT6_NUD_FAIL_SOFT can be returned only
when the IS_ENABLE(CONFIG_IPV6_ROUTER_PREF) is false.
so in function find_match(), there is no need to execute
the statement !IS_ENABLED(CONFIG_IPV6_ROUTER_PREF).
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The message dispatching part of tipc_recv_msg() is wrapped layers of
while/if/if/switch, causing out-of-control indentation and does not
look very good. We reduce two indentation levels by separating the
message dispatching from the blocks that checks link state and
sequence numbers, allowing longer function and arg names to be
consistently indented without wrapping. Additionally we also rename
"cont" label to "discard" and add one new label called "unlock_discard"
to make code clearer. In all, these are cosmetic changes that do not
alter the operation of TIPC in any way.
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Cc: David Laight <david.laight@aculab.com>
Cc: Andreas Bofjäll <andreas.bofjall@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fast Open currently has a fall back feature to address SYN-data being
dropped but it requires the middle-box to pass on regular SYN retry
after SYN-data. This is implemented in commit aab487435 ("net-tcp:
Fast Open client - detecting SYN-data drops")
However some NAT boxes will drop all subsequent packets after first
SYN-data and blackholes the entire connections. An example is in
commit 356d7d8 "netfilter: nf_conntrack: fix tcp_in_window for Fast
Open".
The sender should note such incidents and fall back to use the regular
TCP handshake on subsequent attempts temporarily as well: after the
second SYN timeouts the original Fast Open SYN is most likely lost.
When such an event recurs Fast Open is disabled based on the number of
recurrences exponentially.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Unlike UDP or TCP, we do not take the pseudo-header into
account in SCTP checksums. So in case port mapping is the
very same, we do not need to recalculate the whole SCTP
checksum in software, which is very expensive.
Also, similarly as in TCP, take into account when a private
helper mangled the packet. In that case, we also need to
recalculate the checksum even if ports might be same.
Thanks for feedback regarding skb->ip_summed checks from
Julian Anastasov; here's a discussion on these checks for
snat and dnat:
* For snat_handler(), we can see CHECKSUM_PARTIAL from
virtual devices, and from LOCAL_OUT, otherwise it
should be CHECKSUM_UNNECESSARY. In general, in snat it
is more complex. skb contains the original route and
ip_vs_route_me_harder() can change the route after
snat_handler. So, for locally generated replies from
local server we can not preserve the CHECKSUM_PARTIAL
mode. It is an chicken or egg dilemma: snat_handler
needs the device after rerouting (to check for
NETIF_F_SCTP_CSUM), while ip_route_me_harder() wants
the snat_handler() to put the new saddr for proper
rerouting.
* For dnat_handler(), we should not see CHECKSUM_COMPLETE
for SCTP, in fact the small set of drivers that support
SCTP offloading return CHECKSUM_UNNECESSARY on correctly
received SCTP csum. We can see CHECKSUM_PARTIAL from
local stack or received from virtual drivers. The idea is
that SCTP decides to avoid csum calculation if hardware
supports offloading. IPVS can change the device after
rerouting to real server but we can preserve the
CHECKSUM_PARTIAL mode if the new device supports
offloading too. This works because skb dst is changed
before dnat_handler and we see the new device. So, checks
in the 'if' part will decide whether it is ok to keep
CHECKSUM_PARTIAL for the output. If the packet was with
CHECKSUM_NONE, hence we deal with unknown checksum. As we
recalculate the sum for IP header in all cases, it should
be safe to use CHECKSUM_UNNECESSARY. We can forward wrong
checksum in this case (without cp->app). In case of
CHECKSUM_UNNECESSARY, the csum was valid on receive.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Currently multicast code attempts to extrace the vlan id from
the skb even when vlan filtering is disabled. This can lead
to mdb entries being created with the wrong vlan id.
Pass the already extracted vlan id to the multicast
filtering code to make the correct id is used in
creation as well as lookup.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Gross says:
====================
One patch for net/3.12 fixing an issue where devices could be in an
invalid state they are removed while still attached to OVS.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Update the URLs in the Kconfig file to the new pages at sangoma.com and cisco.com
Signed-off-by: Michael Drüing <michael@drueing.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This work contains a lightweight BPF-based traffic classifier that can
serve as a flexible alternative to ematch-based tree classification, i.e.
now that BPF filter engine can also be JITed in the kernel. Naturally, tc
actions and policies are supported as well with cls_bpf. Multiple BPF
programs/filter can be attached for a class, or they can just as well be
written within a single BPF program, that's really up to the user how he
wishes to run/optimize the code, e.g. also for inversion of verdicts etc.
The notion of a BPF program's return/exit codes is being kept as follows:
0: No match
-1: Select classid given in "tc filter ..." command
else: flowid, overwrite the default one
As a minimal usage example with iproute2, we use a 3 band prio root qdisc
on a router with sfq each as leave, and assign ssh and icmp bpf-based
filters to band 1, http traffic to band 2 and the rest to band 3. For the
first two bands we load the bytecode from a file, in the 2nd we load it
inline as an example:
echo 1 > /proc/sys/net/core/bpf_jit_enable
tc qdisc del dev em1 root
tc qdisc add dev em1 root handle 1: prio bands 3 priomap 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
tc qdisc add dev em1 parent 1:1 sfq perturb 16
tc qdisc add dev em1 parent 1:2 sfq perturb 16
tc qdisc add dev em1 parent 1:3 sfq perturb 16
tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/ssh.bpf flowid 1:1
tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/icmp.bpf flowid 1:1
tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/http.bpf flowid 1:2
tc filter add dev em1 parent 1: bpf run bytecode "`bpfc -f tc -i misc.ops`" flowid 1:3
BPF programs can be easily created and passed to tc, either as inline
'bytecode' or 'bytecode-file'. There are a couple of front-ends that can
compile opcodes, for example:
1) People familiar with tcpdump-like filters:
tcpdump -iem1 -ddd port 22 | tr '\n' ',' > /etc/tc/ssh.bpf
2) People that want to low-level program their filters or use BPF
extensions that lack support by libpcap's compiler:
bpfc -f tc -i ssh.ops > /etc/tc/ssh.bpf
ssh.ops example code:
ldh [12]
jne #0x800, drop
ldb [23]
jneq #6, drop
ldh [20]
jset #0x1fff, drop
ldxb 4 * ([14] & 0xf)
ldh [%x + 14]
jeq #0x16, pass
ldh [%x + 16]
jne #0x16, drop
pass: ret #-1
drop: ret #0
It was chosen to load bytecode into tc, since the reverse operation,
tc filter list dev em1, is then able to show the exact commands again.
Possible follow-up work could also include a small expression compiler
for iproute2. Tested with the help of bmon. This idea came up during
the Netfilter Workshop 2013 in Copenhagen. Also thanks to feedback from
Eric Dumazet!
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
This pull request contains the following netfilter fix:
* fix --queue-bypass in xt_NFQUEUE revision 3. While adding the
revision 3 of this target, the bypass flags were not correctly
handled anymore, thus, breaking packet bypassing if no application
is listening from userspace, patch from Holger Eitzenberger,
reported by Florian Westphal.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
V3 of the NFQUEUE target ignores the --queue-bypass flag,
causing packets to be dropped when the userspace listener
isn't running.
Regression is in since 8746ddcf12 ("netfilter: xt_NFQUEUE:
introduce CPU fanout").
Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
struct esp_data consists of a single pointer, vanishing the need for it
to be a structure. Fold the pointer into 'data' direcly, removing one
level of pointer indirection.
Signed-off-by: Mathias Krause <mathias.krause@secunet.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
The padlen member of struct esp_data is always zero. Get rid of it.
Signed-off-by: Mathias Krause <mathias.krause@secunet.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
UFO as well as UDP_CORK do not respect IP_PMTUDISC_DO and
IP_PMTUDISC_PROBE well enough.
UFO enabled packet delivery just appends all frags to the cork and hands
it over to the network card. So we just deliver non-DF udp fragments
(DF-flag may get overwritten by hardware or virtual UFO enabled
interface).
UDP_CORK does enqueue the data until the cork is disengaged. At this
point it sets the correct IP_DF and local_df flags and hands it over to
ip_fragment which in this case will generate an icmp error which gets
appended to the error socket queue. This is not reflected in the syscall
error (of course, if UFO is enabled this also won't happen).
Improve this by checking the pmtudisc flags before appending data to the
socket and if we still can fit all data in one packet when IP_PMTUDISC_DO
or IP_PMTUDISC_PROBE is set, only then proceed.
We use (mtu-fragheaderlen) to check for the maximum length because we
ensure not to generate a fragment and non-fragmented data does not need
to have its length aligned on 64 bit boundaries. Also the passed in
ip_options are already aligned correctly.
Maybe, we can relax some other checks around ip_fragment. This needs
more research.
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 6ff50cd555 ("tcp: gso: do not generate out of order packets")
had an heuristic that can trigger a warning in skb_try_coalesce(),
because skb->truesize of the gso segments were exactly set to mss.
This breaks the requirement that
skb->truesize >= skb->len + truesizeof(struct sk_buff);
It can trivially be reproduced by :
ifconfig lo mtu 1500
ethtool -K lo tso off
netperf
As the skbs are looped into the TCP networking stack, skb_try_coalesce()
warns us of these skb under-estimating their truesize.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The code for privacy extentions is very mature, and making it
configurable only gives marginal memory/code savings in exchange
for obfuscation and hard to read code via CPP ifdef'ery.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes the assignment of skb->dev. We don't need it here because
we use the netdev_alloc_skb_ip_align function which already sets the
skb->dev.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch uses the netdev_alloc_skb instead dev_alloc_skb function and
drops the seperate assignment to skb->dev.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The err variable can only be zero in this case.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reviewed-by: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
In gss_encode_v1_msg, it is pointless to BUG() after the overflow has
happened. Replace the existing sprintf()-based code with scnprintf(),
and warn if an overflow is ever triggered.
In gss_encode_v0_msg, replace the runtime BUG_ON() with an appropriate
compile-time BUILD_BUG_ON.
Reported-by: Bruce Fields <bfields@fieldses.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add the missing 'break' to ensure that we don't corrupt a legacy 'v0' type
message by appending the 'v1'.
Cc: Bruce Fields <bfields@fieldses.org>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If req allocated failed just goto out_free, no need to check the
'i < num_prealloc'. There is just code simplification, no
functional changes.
Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
rpc_clnt_set_transport should use rcu_derefence_protected(), as it is
only safe to be called with the rpc_clnt::cl_lock held.
Cc: Chuck Lever <Chuck.Lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add an RPC client API to redirect an rpc_clnt's transport from a
source server to a destination server during a migration event.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[ cel: forward ported to 3.12 ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The rpc_client_register() helper was added in commit e73f4cc0,
"SUNRPC: split client creation routine into setup and registration,"
Mon Jun 24 11:52:52 2013. In a subsequent patch, I'd like to invoke
rpc_client_register() from a context where a struct rpc_create_args
is not available.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This patch adds the filter chain type which is required to
create filter chains in the bridge family from userspace.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch fixes this:
CHECK net/netfilter/nft_nat.c
net/netfilter/nft_nat.c:50:43: warning: incorrect type in assignment (different base types)
net/netfilter/nft_nat.c:50:43: expected restricted __be32 [addressable] [usertype] ip
net/netfilter/nft_nat.c:50:43: got unsigned int [unsigned] [usertype] <noident>
net/netfilter/nft_nat.c:51:43: warning: incorrect type in assignment (different base types)
net/netfilter/nft_nat.c:51:43: expected restricted __be32 [addressable] [usertype] ip
net/netfilter/nft_nat.c:51:43: got unsigned int [unsigned] [usertype] <noident>
net/netfilter/nft_nat.c:65:37: warning: incorrect type in assignment (different base types)
net/netfilter/nft_nat.c:65:37: expected restricted __be16 [addressable] [assigned] [usertype] all
net/netfilter/nft_nat.c:65:37: got unsigned int [unsigned] <noident>
net/netfilter/nft_nat.c:66:37: warning: incorrect type in assignment (different base types)
net/netfilter/nft_nat.c:66:37: expected restricted __be16 [addressable] [assigned] [usertype] all
net/netfilter/nft_nat.c:66:37: got unsigned int [unsigned] <noident>
Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
when CONFIG_NF_TABLES[_MODULE] is not enabled,
but CONFIG_NF_TABLES_BRIDGE is enabled:
net/bridge/netfilter/nf_tables_bridge.c: In function 'nf_tables_bridge_init_net':
net/bridge/netfilter/nf_tables_bridge.c:24:5: error: 'struct net' has no member named 'nft'
net/bridge/netfilter/nf_tables_bridge.c:25:9: error: 'struct net' has no member named 'nft'
net/bridge/netfilter/nf_tables_bridge.c:28:2: error: 'struct net' has no member named 'nft'
net/bridge/netfilter/nf_tables_bridge.c:30:34: error: 'struct net' has no member named 'nft'
net/bridge/netfilter/nf_tables_bridge.c:35:11: error: 'struct net' has no member named 'nft'
net/bridge/netfilter/nf_tables_bridge.c: In function 'nf_tables_bridge_exit_net':
net/bridge/netfilter/nf_tables_bridge.c:41:27: error: 'struct net' has no member named 'nft'
net/bridge/netfilter/nf_tables_bridge.c:42:11: error: 'struct net' has no member named 'nft'
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The nl80211 attribute NL80211_ATTR_CSA_C_OFF_BEACON should be nested
inside NL80211_ATTR_CSA_IES, but commit ee4bc9e758
("nl80211: enable IBSS support for channel switch announcements")
added a check in the outer message attributes.
Fix channel switch calls by removing the erroneus condition.
Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
[reword commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
CSA completion could call in a driver
bss_info_changed() with a garbled `changed` flag
leading to all sorts of problems.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Trigger the mesh channel switching procedure if the mesh STA
happens to miss the CSA action frame but able to receive the
beacon containing the CSA and MCSP elements from its peer
mesh STAs.
Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@cozybit.com>
[fix locking in ieee80211_mesh_process_chnswitch()]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Implement the required procedures for mesh channel switching as defined
in the IEEE Std 802.11-2012 section 10.9.8.4.3 and also handle the CSA
and MCSP elements as followed:
* Add the function for updating the beacon and probe response frames
with CSA and MCSP elements during the period of switching to the new
channel. Both CSA and MCSP elements must be included in beacon and
probe response frames until the intended channel switch time.
* The ifmsh->csa_settings is set to NULL and the CSA and MCSP elements
will then be removed from the beacon or probe response frames once the
new channel is switched to.
Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Allow the triggering of CSA frame using mesh interface. The
rules are more or less same with IBSS, such as not allowed to
change between the band and channel width has to be same from
the previous mode. Also, move the ieee80211_send_action_csa
to a common space so that it can be re-used by mesh interface.
Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Process the CSA frame according to the procedures define in IEEE Std
802.11-2012 section 10.9.8.4.3 as follow:
* The mesh channel switch parameters element (MCSP) must be availabe.
* If the MCSP's TTL is 1, drop the frame but still process the CSA.
* If the MCSP's precedence value is less than or equal to the current
precedence value, drop the frame and do not process the CSA.
* The CSA frame is forwarded after TTL is decremented by 1 and the
initiator field is set to 0. Transmit restrict field and others
are maintained as is.
* No beacon or probe response frame are handled here.
Also, introduce the debug message used for mesh CSA purpose.
Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Refactor the channel switch IE parsing to reduce the number
of function parameters.
Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This can be used by a driver to prepare skbs for transmission, which were
obtained via functions such as ieee80211_probereq_get or
ieee80211_nullfunc_get.
This is useful for drivers that want to send those frames directly, but
need rate control information to be prepared first.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Drivers can now use this to parse the regulatory request and
be more verbose when needed.
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This patch fixes errors in the mesh powersave logic which
cause that remote peers do not get peer power mode change
notifications and mesh peer service periods (MPSPs) got
stuck.
When closing a peer link, set the (now invalid) peer-specific
power mode to 'unknown'.
Avoid overhead when local power mode is unchanged.
Reliably clear MPSP flags on peering status update.
Avoid MPSP flags getting stuck by not requesting a further
MPSP ownership if we already are an MPSP owner.
Signed-off-by: Marco Porsch <marco@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
6c17b77b67 ensures that a device's
mac80211 queues will remain stopped while offchannel. Since the
vif can no longer be offchannel when the queues wake it's not
necessary to check for this before waking its netdev queues.
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Including ACPI ID for Broadcom GPS receiver BCM4752.
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Acked-by: Rhyland Klein <rklein@nvidia.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This will add the relevant values like the gpios and the
type in rfkill_gpio_platform_data to the rfkill_gpio_data
structure. It will allow those values to be easily picked
from DT and ACPI tables later.
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Acked-by: Rhyland Klein <rklein@nvidia.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This sets the direction of the gpio once when it's requested,
and uses the spinlock-safe gpio_set_state() to change the
state.
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Acked-by: Rhyland Klein <rklein@nvidia.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Use a simple flag to see the state of the clock, and make
the clock available even without a name. Also, get rid of
HAVE_CLK dependency.
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Acked-by: Rhyland Klein <rklein@nvidia.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
And remove now unneeded resource freeing.
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Acked-by: Rhyland Klein <rklein@nvidia.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Allow changing to DFS channels if the channel is available for
beaconing and userspace controls DFS operation.
Channel switch announcement from other stations on DFS channels will
be interpreted as radar event. These channels will then be marked as
unvailable.
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>