It doesn't make sense to output a tunnel packet using the same
parameters that it was received with since that will generally
just result in the packet going back to us. As a result, userspace
assumes that the tunnel key is cleared when transitioning through
the switch. In the majority of cases this doesn't matter since a
packet is either going to a tunnel port (in which the key is
overwritten with new values) or to a non-tunnel port (in which
case the key is ignored). However, it's theoreticaly possible that
userspace could rely on the documented behavior, so this corrects
it.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Flex array is used to allocate hash buckets which is type struct
hlist_head, but we use `struct hlist_head *` to calculate
array size. Since hlist_head is of size pointer it works fine.
Following patch use correct type.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
git silently included an extra hunk in vport_cmd_set() during
automatic merging. This code is unreachable so it does not actually
introduce a problem but it is clearly incorrect.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Pull more vfs stuff from Al Viro:
"O_TMPFILE ABI changes, Oleg's fput() series, misc cleanups, including
making simple_lookup() usable for filesystems with non-NULL s_d_op,
which allows us to get rid of quite a bit of ugliness"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
sunrpc: now we can just set ->s_d_op
cgroup: we can use simple_lookup() now
efivarfs: we can use simple_lookup() now
make simple_lookup() usable for filesystems that set ->s_d_op
configfs: don't open-code d_alloc_name()
__rpc_lookup_create_exclusive: pass string instead of qstr
rpc_create_*_dir: don't bother with qstr
llist: llist_add() can use llist_add_batch()
llist: fix/simplify llist_add() and llist_add_batch()
fput: turn "list_head delayed_fput_list" into llist_head
fs/file_table.c:fput(): add comment
Safer ABI for O_TMPFILE
Pull networking fixes from David Miller:
"Just a bunch of small fixes and tidy ups:
1) Finish the "busy_poll" renames, from Eliezer Tamir.
2) Fix RCU stalls in IFB driver, from Ding Tianhong.
3) Linearize buffers properly in tun/macvtap zerocopy code.
4) Don't crash on rmmod in vxlan, from Pravin B Shelar.
5) Spinlock used before init in alx driver, from Maarten Lankhorst.
6) A sparse warning fix in bnx2x broke TSO checksums, fix from Dmitry
Kravkov.
7) Dummy and ifb driver load failure paths can oops, fixes from Tan
Xiaojun and Ding Tianhong.
8) Correct MTU calculations in IP tunnels, from Alexander Duyck.
9) Account all TCP retransmits in SNMP stats properly, from Yuchung
Cheng.
10) atl1e and via-rhine do not handle DMA mapping failures properly,
from Neil Horman.
11) Various equal-cost multipath route fixes in ipv6 from Hannes
Frederic Sowa"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (36 commits)
ipv6: only static routes qualify for equal cost multipathing
via-rhine: fix dma mapping errors
atl1e: fix dma mapping warnings
tcp: account all retransmit failures
usb/net/r815x: fix cast to restricted __le32
usb/net/r8152: fix integer overflow in expression
net: access page->private by using page_private
net: strict_strtoul is obsolete, use kstrtoul instead
drivers/net/ieee802154: don't use devm_pinctrl_get_select_default() in probe
drivers/net/ethernet/cadence: don't use devm_pinctrl_get_select_default() in probe
drivers/net/can/c_can: don't use devm_pinctrl_get_select_default() in probe
net/usb: add relative mii functions for r815x
net/tipc: use %*phC to dump small buffers in hex form
qlcnic: Adding Maintainers.
gre: Fix MTU sizing check for gretap tunnels
pkt_sched: sch_qfq: remove forward declaration of qfq_update_agg_ts
pkt_sched: sch_qfq: improve efficiency of make_eligible
gso: Update tunnel segmentation to support Tx checksum offload
inet: fix spacing in assignment
ifb: fix oops when loading the ifb failed
...
Static routes in this case are non-expiring routes which did not get
configured by autoconf or by icmpv6 redirects.
To make sure we actually get an ecmp route while searching for the first
one in this fib6_node's leafs, also make sure it matches the ecmp route
assumptions.
v2:
a) Removed RTF_EXPIRE check in dst.from chain. The check of RTF_ADDRCONF
already ensures that this route, even if added again without
RTF_EXPIRES (in case of a RA announcement with infinite timeout),
does not cause the rt6i_nsiblings logic to go wrong if a later RA
updates the expiration time later.
v3:
a) Allow RTF_EXPIRES routes to enter the ecmp route set. We have to do so,
because an pmtu event could update the RTF_EXPIRES flag and we would
not count this route, if another route joins this set. We now filter
only for RTF_GATEWAY|RTF_ADDRCONF|RTF_DYNAMIC, which are flags that
don't get changed after rt6_info construction.
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change snmp RETRANSFAILS stat to include timeout retransmit failures
in addition to other loss recoveries.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of passing each byte by stack let's use nice specifier for that.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change fixes an MTU sizing issue seen with gretap tunnels when non-gso
packets are sent from the interface.
In my case I was able to reproduce the issue by simply sending a ping of
1421 bytes with the gretap interface created on a device with a standard
1500 mtu.
This fix is based on the fact that the tunnel mtu is already adjusted by
dev->hard_header_len so it would make sense that any packets being compared
against that mtu should also be adjusted by hard_header_len and the tunnel
header instead of just the tunnel header.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Reported-by: Cong Wang <amwang@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes the forward declaration of qfq_update_agg_ts, by moving
the definition of the function above its first call. This patch also
removes a useless forward declaration of qfq_schedule_agg.
Reported-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
In make_eligible, a mask is used to decide which groups must become eligible:
the i-th group becomes eligible only if the i-th bit of the mask (from the
right) is set. The mask is computed by left-shifting a 1 by a given number of
places, and decrementing the result. The shift is performed on a ULL to avoid
problems in case the number of places to shift is higher than 31. On a 32-bit
machine, this is more costly than working on an UL. This patch replaces such a
costly operation with two cheaper branches.
The trick is based on the following fact: in case of a shift of at least 32
places, the resulting mask has at least the 32 less significant bits set,
whereas the total number of groups is lower than 32. As a consequence, in this
case it is enough to just set the 32 less significant bits of the mask with a
cheaper ~0UL. In the other case, the shift can be safely performed on a UL.
Reported-by: David S. Miller <davem@davemloft.net>
Reported-by: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change makes it so that the GRE and VXLAN tunnels can make use of Tx
checksum offload support provided by some drivers via the hw_enc_features.
Without this fix enabling GSO means sacrificing Tx checksum offload and
this actually leads to a performance regression as shown below:
Utilization
Send
Throughput local GSO
10^6bits/s % S state
6276.51 8.39 enabled
7123.52 8.42 disabled
To resolve this it was necessary to address two items. First
netif_skb_features needed to be updated so that it would correctly handle
the Trans Ether Bridging protocol without impacting the need to check for
Q-in-Q tagging. To do this it was necessary to update harmonize_features
so that it used skb_network_protocol instead of just using the outer
protocol.
Second it was necessary to update the GRE and UDP tunnel segmentation
offloads so that they would reset the encapsulation bit and inner header
offsets after the offload was complete.
As a result of this change I have seen the following results on a interface
with Tx checksum enabled for encapsulated frames:
Utilization
Send
Throughput local GSO
10^6bits/s % S state
7123.52 8.42 disabled
8321.75 5.43 enabled
v2: Instead of replacing refrence to skb->protocol with
skb_network_protocol just replace the protocol reference in
harmonize_features to allow for double VLAN tag checks.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Highlights include:
- Fix an_rpc pipefs regression that causes a deadlock on mount
- Readdir optimisations by Scott Mayhew and Jeff Layton
- clean up the rpc_pipefs dentry operation setup
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQIcBAABAgAGBQJR3vVIAAoJEGcL54qWCgDyBWEP/0blqSlJId4zZj4xDviRFqJ4
93C7b/Vn7LrAcNCgDQsPkkzTwAX5yTB1H5eNtMuyggAdGj89d4n0jXgBniIMHmqI
Pjrr/XMQ65NddehrO491N01iJSfP9wE3CizJodnAv4VxMRO3xqiJG85lcnoLOFea
V1FnEFUu9oi8e93cQt2fe6KdmTu/SuRqlqR7WPGyTFgS26x1l8nkp2OQgulit5Up
lWuaxg4xbKOdj1jfUDXZhWUnDtkFjxyGxnKR63aA2X1DEGCUTJ6gB3tAl9pvnUb2
RTQF3GVj+Bm/E3gE6ULJvqOjhsgWYjLAZn6hDA3yNAIiFyV7aA6gwK4oKy/B47a6
tFEN2O1EupWzCqGyHhTArk+oEBLfUv/EgFyo7+Y0YIFV4sQTu5RbaZ0nQ2geY6LA
50q2GH57tkXTs859gtBPQgKzgRF1ulkF1FDY9EYQHyGiUbNxBfx+6/2OI04ubQt3
1gKUmm9w1WVzYGmHcHbxsXPT53NtAnHXW4ExcMgpaZ1YOPuIILm78ZuAw78XB/dd
mvXRtbhVt/gs7qZAQQPp1iHIv+vnJ0KgjO62gbuTIRftw5jwWrpWcfYMUUZrMnyM
kn326z3f4gn/vSDZI7J4tOfG1Uc7eNy+cJxStjtiNWTs3UzuWJKzJH0rZnoNZdei
xAkLhjIUEybAqIpXJuGH
=NqQf
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-3.11-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull second set of NFS client updates from Trond Myklebust:
"This mainly contains some small readdir optimisations that had
dependencies on Al Viro's readdir rewrite. There is also a fix for a
nasty deadlock which surfaced earlier in this merge window.
Highlights include:
- Fix an_rpc pipefs regression that causes a deadlock on mount
- Readdir optimisations by Scott Mayhew and Jeff Layton
- clean up the rpc_pipefs dentry operation setup"
* tag 'nfs-for-3.11-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
SUNRPC: Fix a deadlock in rpc_client_register()
rpc_pipe: rpc_dir_inode_operations can be static
NFS: Allow nfs_updatepage to extend a write under additional circumstances
NFS: Make nfs_readdir revalidate less often
NFS: Make nfs_attribute_cache_expired() non-static
rpc_pipe: set dentry operations at d_alloc time
nfs: set verifier on existing dentries in nfs_prime_dcache
This is a follow-up patch to 3630d40067
("ipv6: rt6_check_neigh should successfully verify neigh if no NUD
information are available").
Since the removal of rt->n in rt6_info we can end up with a dst ==
NULL in rt6_check_neigh. In case the kernel is not compiled with
CONFIG_IPV6_ROUTER_PREF we should also select a route with unkown
NUD state but we must not avoid doing round robin selection on routes
with the same target. So introduce and pass down a boolean ``do_rr'' to
indicate when we should update rt->rr_ptr. As soon as no route is valid
we do backtracking and do a lookup on a higher level in the fib trie.
v2:
a) Improved rt6_check_neigh logic (no need to create neighbour there)
and documented return values.
v3:
a) Introduce enum rt6_nud_state to get rid of the magic numbers
(thanks to David Miller).
b) Update and shorten commit message a bit to actualy reflect
the source.
Reported-by: Pierre Emeriaud <petrus.lt@gmail.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Several of these patches were rebased in order to correct style issues.
Only stylistic changes were made versus the patches which were in linux-next
for two weeks. The rebases have been in linux-next for 3 days and have
passed my regressions.
The bulk of these are RDMA fixes and improvements. There's also some
additions on the extended attributes front to support some additional
namespaces and a new option for TCP to force allocation of mount requests
from a priviledged port.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
Comment: GPGTools - http://gpgtools.org
iQIcBAABAgAGBQJR3rWXAAoJEDZk62b0Tg6xabIP/12I+SkQ57wRN03EQy5fqUdX
gK/YMHKQ9QuDnZPBvrZ2lypesQNqVU0KINay6VEA86JG1gwzPyUd2MnpQ7F0vV3N
XwVD54IoflV/M74xUnrgGWB8YxaPcdacQQ8yazX+mEgOgYGdWmDAl7FHmAkdKAFB
gSl25f3PNJX1Rjay0dssNVXrVPXuJY/fZXKnNQZKtRwXffRWKsWHd8FU0Eq7F30A
kNQB8tmMSfHBBjP+tzR0My6/kQ09jzHdtZOkH9IgVpNzqrd8tfy0l6tEvFypxqGT
5oQFoxHHL/tUW05V0P3gYany2A7lEhSUifPKS6omqHO+vPlw+pDJw+xWlNq9fnDt
8S8znqVuEHhvqRQW7zFdb9ac2MZi8CHHhC2wGIZ7GYjNG2q5XwE8b/QhdXQeFin7
ibugvoW7+ZdcDewpQW27oO0g7B/8hRt8KC+1lc/8rITKIfGxbNJkGzTDl0F4Co7v
IH7Ew5PHPe6ZiuU0QSdU+NBuvk8g8sWGxx04Xvzl3WicwOg7XvN3ivrKB9oN2U1x
50KZRnYpwQQv/9AxyhroYU+Ufje8SF4v++zsq1eMzUcHsC/C73eatw2m764t+X4S
8yMLrgqY1Nzif4nAMi/SDMnB/R1bXeuc8kXD9xT6XD9d2tf6e+zCHhQklVeC0tuK
RiVRJqGrfanbKMnWIG0Y
=n9rI
-----END PGP SIGNATURE-----
Merge tag 'for-linus-3.11-merge-window-part-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
Pull second round of 9p patches from Eric Van Hensbergen:
"Several of these patches were rebased in order to correct style
issues. Only stylistic changes were made versus the patches which
were in linux-next for two weeks. The rebases have been in linux-next
for 3 days and have passed my regressions.
The bulk of these are RDMA fixes and improvements. There's also some
additions on the extended attributes front to support some additional
namespaces and a new option for TCP to force allocation of mount
requests from a priviledged port"
* tag 'for-linus-3.11-merge-window-part-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
fs/9p: Remove the unused variable "err" in v9fs_vfs_getattr()
9P: Add cancelled() to the transport functions.
9P/RDMA: count posted buffers without a pending request
9P/RDMA: Improve error handling in rdma_request
9P/RDMA: Do not free req->rc in error handling in rdma_request()
9P/RDMA: Use a semaphore to protect the RQ
9P/RDMA: Protect against duplicate replies
9P/RDMA: increase P9_RDMA_MAXSIZE to 1MB
9pnet: refactor struct p9_fcall alloc code
9P/RDMA: rdma_request() needs not allocate req->rc
9P: Fix fcall allocation for rdma
fs/9p: xattr: add trusted and security namespaces
net/9p: add privport option to 9p tcp transport
Pull nfsd changes from Bruce Fields:
"Changes this time include:
- 4.1 enabled on the server by default: the last 4.1-specific issues
I know of are fixed, so we're not going to find the rest of the
bugs without more exposure.
- Experimental support for NFSv4.2 MAC Labeling (to allow running
selinux over NFS), from Dave Quigley.
- Fixes for some delicate cache/upcall races that could cause rare
server hangs; thanks to Neil Brown and Bodo Stroesser for extreme
debugging persistence.
- Fixes for some bugs found at the recent NFS bakeathon, mostly v4
and v4.1-specific, but also a generic bug handling fragmented rpc
calls"
* 'for-3.11' of git://linux-nfs.org/~bfields/linux: (31 commits)
nfsd4: support minorversion 1 by default
nfsd4: allow destroy_session over destroyed session
svcrpc: fix failures to handle -1 uid's
sunrpc: Don't schedule an upcall on a replaced cache entry.
net/sunrpc: xpt_auth_cache should be ignored when expired.
sunrpc/cache: ensure items removed from cache do not have pending upcalls.
sunrpc/cache: use cache_fresh_unlocked consistently and correctly.
sunrpc/cache: remove races with queuing an upcall.
nfsd4: return delegation immediately if lease fails
nfsd4: do not throw away 4.1 lock state on last unlock
nfsd4: delegation-based open reclaims should bypass permissions
svcrpc: don't error out on small tcp fragment
svcrpc: fix handling of too-short rpc's
nfsd4: minor read_buf cleanup
nfsd4: fix decoding of compounds across page boundaries
nfsd4: clean up nfs4_open_delegation
NFSD: Don't give out read delegations on creates
nfsd4: allow client to send no cb_sec flavors
nfsd4: fail attempts to request gss on the backchannel
nfsd4: implement minimal SP4_MACH_CRED
...
Rename LL_SO to BUSY_POLL_SO
Rename sysctl_net_ll_{read,poll} to sysctl_busy_{read,poll}
Fix up users of these variables.
Fix documentation for sysctl.
a patch for the socket.7 man page will follow separately,
because of limitations of my mail setup.
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename ndo_ll_poll to ndo_busy_poll.
Rename sk_mark_ll to sk_mark_napi_id.
Rename skb_mark_ll to skb_mark_napi_id.
Correct all useres of these functions.
Update comments and defines in include/net/busy_poll.h
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename the file and correct all the places where it is included.
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 384816051c (SUNRPC: fix races on
PipeFS MOUNT notifications) introduces a regression when we call
rpc_setup_pipedir() with RPCSEC_GSS as the auth flavour.
By calling rpcauth_create() while holding the sn->pipefs_sb_lock, we
end up deadlocking in gss_pipes_dentries_create_net().
Fix is to register the client and release the mutex before calling
rpcauth_create().
Reported-by: Weston Andros Adamson <dros@netapp.com>
Tested-by: Weston Andros Adamson <dros@netapp.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: <stable@vger.kernel.org> # : 3848160: SUNRPC: fix races on PipeFS MOUNT
Cc: <stable@vger.kernel.org> # : e73f4cc: SUNRPC: split client creation
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Hi Jeff,
FYI, there are new sparse warnings show up in
tree: git://git.linux-nfs.org/projects/trondmy/linux-nfs.git nfs-for-next
head: 296afe1f58d55fd56ed85daaafafcfee39f59ece
commit: 76fa666579 [2/5] rpc_pipe: set dentry operations at d_alloc time
>> net/sunrpc/rpc_pipe.c:496:31: sparse: symbol 'rpc_dir_inode_operations' was not declared. Should it be static?
Please consider folding the attached diff :-)
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Pull networking updates from David Miller:
"This is a re-do of the net-next pull request for the current merge
window. The only difference from the one I made the other day is that
this has Eliezer's interface renames and the timeout handling changes
made based upon your feedback, as well as a few bug fixes that have
trickeled in.
Highlights:
1) Low latency device polling, eliminating the cost of interrupt
handling and context switches. Allows direct polling of a network
device from socket operations, such as recvmsg() and poll().
Currently ixgbe, mlx4, and bnx2x support this feature.
Full high level description, performance numbers, and design in
commit 0a4db187a9 ("Merge branch 'll_poll'")
From Eliezer Tamir.
2) With the routing cache removed, ip_check_mc_rcu() gets exercised
more than ever before in the case where we have lots of multicast
addresses. Use a hash table instead of a simple linked list, from
Eric Dumazet.
3) Add driver for Atheros CQA98xx 802.11ac wireless devices, from
Bartosz Markowski, Janusz Dziedzic, Kalle Valo, Marek Kwaczynski,
Marek Puzyniak, Michal Kazior, and Sujith Manoharan.
4) Support reporting the TUN device persist flag to userspace, from
Pavel Emelyanov.
5) Allow controlling network device VF link state using netlink, from
Rony Efraim.
6) Support GRE tunneling in openvswitch, from Pravin B Shelar.
7) Adjust SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF for modern times, from
Daniel Borkmann and Eric Dumazet.
8) Allow controlling of TCP quickack behavior on a per-route basis,
from Cong Wang.
9) Several bug fixes and improvements to vxlan from Stephen
Hemminger, Pravin B Shelar, and Mike Rapoport. In particular,
support receiving on multiple UDP ports.
10) Major cleanups, particular in the area of debugging and cookie
lifetime handline, to the SCTP protocol code. From Daniel
Borkmann.
11) Allow packets to cross network namespaces when traversing tunnel
devices. From Nicolas Dichtel.
12) Allow monitoring netlink traffic via AF_PACKET sockets, in a
manner akin to how we monitor real network traffic via ptype_all.
From Daniel Borkmann.
13) Several bug fixes and improvements for the new alx device driver,
from Johannes Berg.
14) Fix scalability issues in the netem packet scheduler's time queue,
by using an rbtree. From Eric Dumazet.
15) Several bug fixes in TCP loss recovery handling, from Yuchung
Cheng.
16) Add support for GSO segmentation of MPLS packets, from Simon
Horman.
17) Make network notifiers have a real data type for the opaque
pointer that's passed into them. Use this to properly handle
network device flag changes in arp_netdev_event(). From Jiri
Pirko and Timo Teräs.
18) Convert several drivers over to module_pci_driver(), from Peter
Huewe.
19) tcp_fixup_rcvbuf() can loop 500 times over loopback, just use a
O(1) calculation instead. From Eric Dumazet.
20) Support setting of explicit tunnel peer addresses in ipv6, just
like ipv4. From Nicolas Dichtel.
21) Protect x86 BPF JIT against spraying attacks, from Eric Dumazet.
22) Prevent a single high rate flow from overruning an individual cpu
during RX packet processing via selective flow shedding. From
Willem de Bruijn.
23) Don't use spinlocks in TCP md5 signing fast paths, from Eric
Dumazet.
24) Don't just drop GSO packets which are above the TBF scheduler's
burst limit, chop them up so they are in-bounds instead. Also
from Eric Dumazet.
25) VLAN offloads are missed when configured on top of a bridge, fix
from Vlad Yasevich.
26) Support IPV6 in ping sockets. From Lorenzo Colitti.
27) Receive flow steering targets should be updated at poll() time
too, from David Majnemer.
28) Fix several corner case regressions in PMTU/redirect handling due
to the routing cache removal, from Timo Teräs.
29) We have to be mindful of ipv4 mapped ipv6 sockets in
upd_v6_push_pending_frames(). From Hannes Frederic Sowa.
30) Fix L2TP sequence number handling bugs, from James Chapman."
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1214 commits)
drivers/net: caif: fix wrong rtnl_is_locked() usage
drivers/net: enic: release rtnl_lock on error-path
vhost-net: fix use-after-free in vhost_net_flush
net: mv643xx_eth: do not use port number as platform device id
net: sctp: confirm route during forward progress
virtio_net: fix race in RX VQ processing
virtio: support unlocked queue poll
net/cadence/macb: fix bug/typo in extracting gem_irq_read_clear bit
Documentation: Fix references to defunct linux-net@vger.kernel.org
net/fs: change busy poll time accounting
net: rename low latency sockets functions to busy poll
bridge: fix some kernel warning in multicast timer
sfc: Fix memory leak when discarding scattered packets
sit: fix tunnel update via netlink
dt:net:stmmac: Add dt specific phy reset callback support.
dt:net:stmmac: Add support to dwmac version 3.610 and 3.710
dt:net:stmmac: Allocate platform data only if its NULL.
net:stmmac: fix memleak in the open method
ipv6: rt6_check_neigh should successfully verify neigh if no NUD information are available
net: ipv6: fix wrong ping_v6_sendmsg return value
...
Currently the way these get set is a little convoluted. If the dentry is
allocated via lookup from userland, then it gets set by simple_lookup.
If it gets allocated when the kernel is populating the directory, then
it gets set via __rpc_lookup_create_exclusive, which has to check
whether they might already be set. Between both of these, this ensures
that all dentries have their d_op pointer set.
Instead of doing that, just have them set at d_alloc time by pointing
sb->s_d_op at them. With that change, we no longer want the lookup op
to set them, so we must move to using our own lookup routine.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* optional security enhancements
* fix path coverage in MAINTAINERS
* switch to using most used protocol and transport as default
* clean up buffer dumps in trace code
Held off on RDMA patches as they need to be cleaned up a bit, but
will try to get the cleaned, checked, and pushed by mid-week.
(attempt 2, hopefully this one won't screw up the history)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
Comment: GPGTools - http://gpgtools.org
iQIcBAABAgAGBQJR2iZUAAoJEDZk62b0Tg6xsfQP/i3cYmkpf58lb++WoWDohQdh
iH34P6Tv+5AKcF5SViBFDyXsdkE0D/Ixzl/E6jTsx+6OTSCA0eIw4OYyvPQpzFyp
1+RqnTyEq6v2SQaGZKW7k7NyXDiRhVypXBupuNq8eZpYKS8B3cKdnQ/WFSAXcxQ1
sbKWKUWnnqIZYnRNqNK4LTxz9cbLovXIQOYBhn0F+NoAFinC1ZQrWzuUVbct880i
cSoukTivmJHb37Pt9AKluPc6GGa6XHXkomQewh0WOnBJ/9FR3YUHeRXR04cnAWAL
zpGKagnIhYWtdaTJQXCzO2OMCQakhf9FiBWYGjfM9ysyzS4LDp1cknlyUPox97xF
o9o6MfFF161c8+uC/RpK8Lp3vG6CFPEcMVxp73BydNNI4/1hzbfCs3WcGdpkvAg/
rRik/zyN7l3jEwtvU03Y1WEV79Ep/Q8cvPqi4XZB2L1XYi43fT4yze6zMM/cmQ5K
DLTbFxtN5ILWg2LjQergORyn66WqQjproPqcgd9tVrvJ30Z5KPjIh+CBVcYPWp4V
hxD0Pd0yTySpxUqV4Qx/BMZdWiD1wuBgidKgl+jNldTaCSFtPqQ52LYmTWNpneI1
lcc3SMFRNRhqWMOFhzpcX1xGuXKD5eRiOrQ+L1ecFxGFYVndY5nwa6Pn8gUrfGHW
LEBmADtMsv2YQW2Kahk2
=ktVU
-----END PGP SIGNATURE-----
Merge tag 'for-linus-3.11-merge-window-part-1' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
Pull 9p update from Eric Van Hensbergen:
"Grab bag of little fixes and enhancements:
- optional security enhancements
- fix path coverage in MAINTAINERS
- switch to using most used protocol and transport as default
- clean up buffer dumps in trace code
Held off on RDMA patches as they need to be cleaned up a bit, but will
try to get the cleaned, checked, and pushed by mid-week"
* tag 'for-linus-3.11-merge-window-part-1' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
9p: Add rest of 9p files to MAINTAINERS entry
9p: trace: use %*ph to dump buffer
net/9p: Handle error in zero copy request correctly for 9p2000.u
net/9p: Use virtio transpart as the default transport
net/9p: Make 9P2000.L the default protocol for 9p file system
This fix has been proposed originally by Vlad Yasevich. He says:
When SCTP makes forward progress (receives a SACK that acks new chunks,
renegs, or answeres 0-window probes) or when HB-ACK arrives, mark
the route as confirmed so we don't unnecessarily send NUD probes.
Having a simple SCTP client/server that exchange data chunks every 1sec,
without this patch ARP requests are sent periodically every 40-60sec.
With this fix applied, an ARP request is only done once right at the
"session" beginning. Also, when clearing the related ARP cache entry
manually during the session, a new request is correctly done. I have
only "backported" this to net-next and tested that it works, so full
credit goes to Vlad.
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull Ceph updates from Sage Weil:
"There is some follow-on RBD cleanup after the last window's code drop,
a series from Yan fixing multi-mds behavior in cephfs, and then a
sprinkling of bug fixes all around. Some warnings, sleeping while
atomic, a null dereference, and cleanups"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (36 commits)
libceph: fix invalid unsigned->signed conversion for timespec encoding
libceph: call r_unsafe_callback when unsafe reply is received
ceph: fix race between cap issue and revoke
ceph: fix cap revoke race
ceph: fix pending vmtruncate race
ceph: avoid accessing invalid memory
libceph: Fix NULL pointer dereference in auth client code
ceph: Reconstruct the func ceph_reserve_caps.
ceph: Free mdsc if alloc mdsc->mdsmap failed.
ceph: remove sb_start/end_write in ceph_aio_write.
ceph: avoid meaningless calling ceph_caps_revoking if sync_mode == WB_SYNC_ALL.
ceph: fix sleeping function called from invalid context.
ceph: move inode to proper flushing list when auth MDS changes
rbd: fix a couple warnings
ceph: clear migrate seq when MDS restarts
ceph: check migrate seq before changing auth cap
ceph: fix race between page writeback and truncate
ceph: reset iov_len when discarding cap release messages
ceph: fix cap release race
libceph: fix truncate size calculation
...
Feature highlights include:
- Add basic client support for NFSv4.2
- Add basic client support for Labeled NFS (selinux for NFSv4.2)
- Fix the use of credentials in NFSv4.1 stateful operations, and
add support for NFSv4.1 state protection.
Bugfix highlights:
- Fix another NFSv4 open state recovery race
- Fix an NFSv4.1 back channel session regression
- Various rpc_pipefs races
- Fix another issue with NFSv3 auth negotiation
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQIcBAABAgAGBQJR2vsSAAoJEGcL54qWCgDyWBIP/AqlpBBAblxbNQ1Bl/0m1Pdb
iKH961qgM4U1BzK0svGtHTZqkovpm4o/VbkbKBT5mQ4g6SbbsJ/AsS1plCyfnIZi
bdnKNJyj6zg0NsAkJ3vKWqd4BTaP+icdSfEIlRKQxAPESewN7b5B3OWgY4KdYmnk
q5BP25anC1ryxVycSY67ux8S2IKXVSRZeCZv+RO21rvZ2G0bV5y7t8Om28ztxEnU
RKrHgQHgaaktR7i8QVO0sbiWq3iqLa3GPkUvFLwWGr8PQJtTkYY0QwYSrsV3N4rY
hYpMRUZFHpZ8UG5YvBT6xyOy/XaGwMGKSfZjB9/YG4QVju+tTy50U1JbTil5PEWY
GHWYF68aurIeUkXrhSv8AVnOnhir0mISx5ou/SV7p0QoAZ92V6kq+LkPrW520qlc
z8ILh3j28pN3ZUCIEArcaZhYCt48uO2hwBi5TqevQyyGRsXFGbN1moD5jvHkllft
Fi0XGuCBdvhrzFRZcsEl+PDq7fT8lXUK2BHe8oR5jz9PhUp+jpEl9m/eg3RsjJjN
DuxsHye2U4chScdnRtLBQvpFtdINvWX/Gy8Bi7kdE5tsQySvOa+rdwuBc7h88PHC
+4xI2iX3z4O1+GpsAe/T9+pjW689jEilS+eVDRVEGl6yHGn9q8PYOayjPjwbJHxS
R2mLTRhKu1DKguTzO13f
=wGjn
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-3.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Feature highlights include:
- Add basic client support for NFSv4.2
- Add basic client support for Labeled NFS (selinux for NFSv4.2)
- Fix the use of credentials in NFSv4.1 stateful operations, and add
support for NFSv4.1 state protection.
Bugfix highlights:
- Fix another NFSv4 open state recovery race
- Fix an NFSv4.1 back channel session regression
- Various rpc_pipefs races
- Fix another issue with NFSv3 auth negotiation
Please note that Labeled NFS does require some additional support from
the security subsystem. The relevant changesets have all been
reviewed and acked by James Morris."
* tag 'nfs-for-3.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (54 commits)
NFS: Set NFS_CS_MIGRATION for NFSv4 mounts
NFSv4.1 Refactor nfs4_init_session and nfs4_init_channel_attrs
nfs: have NFSv3 try server-specified auth flavors in turn
nfs: have nfs_mount fake up a auth_flavs list when the server didn't provide it
nfs: move server_authlist into nfs_try_mount_request
nfs: refactor "need_mount" code out of nfs_try_mount
SUNRPC: PipeFS MOUNT notification optimization for dying clients
SUNRPC: split client creation routine into setup and registration
SUNRPC: fix races on PipeFS UMOUNT notifications
SUNRPC: fix races on PipeFS MOUNT notifications
NFSv4.1 use pnfs_device maxcount for the objectlayout gdia_maxcount
NFSv4.1 use pnfs_device maxcount for the blocklayout gdia_maxcount
NFSv4.1 Fix gdia_maxcount calculation to fit in ca_maxresponsesize
NFS: Improve legacy idmapping fallback
NFSv4.1 end back channel session draining
NFS: Apply v4.1 capabilities to v4.2
NFSv4.1: Clean up layout segment comparison helper names
NFSv4.1: layout segment comparison helpers should take 'const' parameters
NFSv4: Move the DNS resolver into the NFSv4 module
rpc_pipefs: only set rpc_dentry_ops if d_op isn't already set
...
Rename functions in include/net/ll_poll.h to busy wait.
Clarify documentation about expected power use increase.
Rename POLL_LL to POLL_BUSY_LOOP.
Add need_resched() testing to poll/select busy loops.
Note, that in select and poll can_busy_poll is dynamic and is
updated continuously to reflect the existence of supported
sockets with valid queue information.
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As of f025adf191 "sunrpc: Properly decode
kuids and kgids in RPC_AUTH_UNIX credentials" any rpc containing a -1
(0xffff) uid or gid would fail with a badcred error.
Commit afe3c3fd53 "svcrpc: fix failures to
handle -1 uid's and gid's" fixed part of the problem, but overlooked the
gid upcall--the kernel can request supplementary gid's for the -1 uid,
but mountd's attempt write a response will get -EINVAL.
Symptoms were nfsd failing to reply to the first attempt to use a newly
negotiated krb5 context.
Reported-by: Sven Geggus <lists@fuchsschwanzdomain.de>
Tested-by: Sven Geggus <lists@fuchsschwanzdomain.de>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
RDMA needs to post a buffer for each incoming reply.
Hence it needs to keep count of these and needs to be
aware of whether a flushed request has received a reply
or not.
This patch adds the cancelled() callback to the transport modules.
It is called when RFLUSH has been received and that the corresponding
request will never receive a reply.
Signed-off-by: Simon Derr <simon.derr@bull.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
In rdma_request():
If an error occurs between posting the recv and the send,
there will be a reply context posted without a pending
request.
Since there is no way to "un-post" it, we remember it and
skip post_recv() for the next request.
Signed-off-by: Simon Derr <simon.derr@bull.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Most importantly:
- do not free the recv context (rpl_context) after a successful post_recv()
- but do free the send context (c) after a failed send.
Signed-off-by: Simon Derr <simon.derr@bull.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
rdma_request() should never be in charge of freeing rc.
When an error occurs:
* Either the rc buffer has been recv_post()'ed.
then kfree()'ing it certainly is a bad idea.
* Or is has not, and in that case req->rc still points to it,
hence it needs not be freed.
Signed-off-by: Simon Derr <simon.derr@bull.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
The current code keeps track of the number of buffers posted in the RQ,
and will prevent it from overflowing. But it does so by simply dropping
post requests (And leaking memory in the process).
When this happens there will actually be too few buffers posted, and
soon the 9P server will complain about 'RNR retry counter exceeded'
errors.
Instead, use a semaphore, and block until the RQ is ready for another
buffer to be posted.
Signed-off-by: Simon Derr <simon.derr@bull.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
A well-behaved server would not send twice the reply to a request.
But if it ever happens...
This additional check prevents the kernel from leaking memory
and possibly more nasty consequences in that unlikely event.
Signed-off-by: Simon Derr <simon.derr@bull.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
The current value is too low to get good performance.
Signed-off-by: Simon Derr <simon.derr@bull.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
The current code assumes that when a request in the request array
does have a tc, it also has a rc.
This is normally true, but not always : when using RDMA, req->rc
will temporarily be set to NULL after the request has been sent.
That is usually OK though, as when the reply arrives, req->rc will be
reassigned to a sane value before the request is recycled.
But there is a catch : if the request is flushed, the reply will never
arrive, and req->rc will be NULL, but not req->tc.
This patch fixes p9_tag_alloc to take this into account.
Signed-off-by: Simon Derr <simon.derr@bull.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
If the privport option is specified, the tcp transport binds local
address to a reserved port before connecting to the 9p server.
In some cases when 9P AUTH cannot be implemented, this is better than
nothing.
Signed-off-by: Jim Garlick <garlick@llnl.gov>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Several people reported the warning: "kernel BUG at kernel/timer.c:729!"
and the stack trace is:
#7 [ffff880214d25c10] mod_timer+501 at ffffffff8106d905
#8 [ffff880214d25c50] br_multicast_del_pg.isra.20+261 at ffffffffa0731d25 [bridge]
#9 [ffff880214d25c80] br_multicast_disable_port+88 at ffffffffa0732948 [bridge]
#10 [ffff880214d25cb0] br_stp_disable_port+154 at ffffffffa072bcca [bridge]
#11 [ffff880214d25ce8] br_device_event+520 at ffffffffa072a4e8 [bridge]
#12 [ffff880214d25d18] notifier_call_chain+76 at ffffffff8164aafc
#13 [ffff880214d25d50] raw_notifier_call_chain+22 at ffffffff810858f6
#14 [ffff880214d25d60] call_netdevice_notifiers+45 at ffffffff81536aad
#15 [ffff880214d25d80] dev_close_many+183 at ffffffff81536d17
#16 [ffff880214d25dc0] rollback_registered_many+168 at ffffffff81537f68
#17 [ffff880214d25de8] rollback_registered+49 at ffffffff81538101
#18 [ffff880214d25e10] unregister_netdevice_queue+72 at ffffffff815390d8
#19 [ffff880214d25e30] __tun_detach+272 at ffffffffa074c2f0 [tun]
#20 [ffff880214d25e88] tun_chr_close+45 at ffffffffa074c4bd [tun]
#21 [ffff880214d25ea8] __fput+225 at ffffffff8119b1f1
#22 [ffff880214d25ef0] ____fput+14 at ffffffff8119b3fe
#23 [ffff880214d25f00] task_work_run+159 at ffffffff8107cf7f
#24 [ffff880214d25f30] do_notify_resume+97 at ffffffff810139e1
#25 [ffff880214d25f50] int_signal+18 at ffffffff8164f292
this is due to I forgot to check if mp->timer is armed in
br_multicast_del_pg(). This bug is introduced by
commit 9f00b2e7cf (bridge: only expire the mdb entry
when query is received).
Same for __br_mdb_del().
Tested-by: poma <pomidorabelisima@gmail.com>
Reported-by: LiYonghua <809674045@qq.com>
Reported-by: Robert Hancock <hancockrwd@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The device can stand in another netns, hence we need to do the lookup in netns
tunnel->net.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>