57946 Commits

Author SHA1 Message Date
Theuns Verwoerd
160ca01424 rtnetlink: Handle IFLA_MASTER parameter when processing rtnl_newlink
Allow a master interface to be specified as one of the parameters when
creating a new interface via rtnl_newlink.  Previously this would
require invoking interface creation, waiting for it to complete, and
then separately binding that new interface to a master.

In particular, this is used when creating a macvlan child interface for
VRRP in a VRF configuration, allowing the interface creator to specify
directly what master interface should be inherited by the child,
without having to deal with asynchronous complications and potential
race conditions.

Signed-off-by: Theuns Verwoerd <theuns.verwoerd@alliedtelesis.co.nz>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-01 11:53:23 -05:00
David S. Miller
04cdf13e34 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2017-02-01

1) Some typo fixes, from Alexander Alemayhu.

2) Don't acquire state lock in get_mtu functions.
   The only race against a dead state does not matter.
   From Florian Westphal.

3) Remove xfrm4_state_fini, it is unused for more than
   10 years. From Florian Westphal.

4) Various rcu usage improvements. From Florian Westphal.

5) Properly handle crypto errors in ah4/ah6.
   From Gilad Ben-Yossef.

6) Try to avoid skb linearization in esp4 and esp6.

7) The esp trailer is now set up in different places,
   add a helper for this.

8) With the upcoming usage of gro_cells in IPsec,
   a gro merged skb can have a secpath. Drop it
   before freeing or reusing the skb.

9) Add a xfrm dummy network device for napi. With
   this we can use gro_cells from within xfrm,
   it allows IPsec GRO without impact on the generic
   networking code.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-01 11:22:38 -05:00
Alexei Starovoitov
4d1ceea851 net: ethtool: convert large order kmalloc allocations to vzalloc
Under memory pressure, the 'ethtool -S' command may warn:
[ 2374.385195] ethtool: page allocation failure: order:4, mode:0x242c0c0
[ 2374.405573] CPU: 12 PID: 40211 Comm: ethtool Not tainted
[ 2374.423071] Call Trace:
[ 2374.423076]  [<ffffffff8148cb29>] dump_stack+0x4d/0x64
[ 2374.423080]  [<ffffffff811667cb>] warn_alloc_failed+0xeb/0x150
[ 2374.423082]  [<ffffffff81169cd3>] ? __alloc_pages_direct_compact+0x43/0xf0
[ 2374.423084]  [<ffffffff8116a25c>] __alloc_pages_nodemask+0x4dc/0xbf0
[ 2374.423091]  [<ffffffffa0023dc2>] ? cmd_exec+0x722/0xcd0 [mlx5_core]
[ 2374.423095]  [<ffffffff811b3dcc>] alloc_pages_current+0x8c/0x110
[ 2374.423097]  [<ffffffff81168859>] alloc_kmem_pages+0x19/0x90
[ 2374.423099]  [<ffffffff81186e5e>] kmalloc_order_trace+0x2e/0xe0
[ 2374.423101]  [<ffffffff811c0084>] __kmalloc+0x204/0x220
[ 2374.423105]  [<ffffffff816c269e>] dev_ethtool+0xe4e/0x1f80
[ 2374.423106]  [<ffffffff816b967e>] ? dev_get_by_name_rcu+0x5e/0x80
[ 2374.423108]  [<ffffffff816d6926>] dev_ioctl+0x156/0x560
[ 2374.423111]  [<ffffffff811d4c68>] ? mem_cgroup_commit_charge+0x78/0x3c0
[ 2374.423117]  [<ffffffff8169d542>] sock_do_ioctl+0x42/0x50
[ 2374.423119]  [<ffffffff8169d9c3>] sock_ioctl+0x1b3/0x250
[ 2374.423121]  [<ffffffff811f0f42>] do_vfs_ioctl+0x92/0x580
[ 2374.423123]  [<ffffffff8100222b>] ? do_audit_syscall_entry+0x4b/0x70
[ 2374.423124]  [<ffffffff8100287c>] ? syscall_trace_enter_phase1+0xfc/0x120
[ 2374.423126]  [<ffffffff811f14a9>] SyS_ioctl+0x79/0x90
[ 2374.423127]  [<ffffffff81002bb0>] do_syscall_64+0x50/0xa0
[ 2374.423129]  [<ffffffff817e19bc>] entry_SYSCALL64_slow_path+0x25/0x25

~1160 mlx5 counters ~= order 4 allocation which is unlikely to succeed
under memory pressure. Convert them to vzalloc() as ethtool_get_regs() does.
Also take care of drivers without counters similar to
commit 67ae7cf1ee ("ethtool: Allow zero-length register dumps again")
and reduce warn_on to warn_on_once.
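
The "order 4" figure above can be checked with a little arithmetic. The sketch below is a userspace illustration only, assuming 4 KiB pages and assuming the buffer in question is the counter name table, which uses ETH_GSTRING_LEN (32) bytes per counter. The helper name alloc_order() is hypothetical and is not a kernel function.

```c
#include <stddef.h>

/* Bytes per ethtool statistic name in the kernel UAPI. */
#define ETH_GSTRING_LEN 32

/*
 * Hypothetical helper: smallest order such that
 * (page_size << order) >= bytes, i.e. the number of physically
 * contiguous pages (as a power of two) a kmalloc-style request needs.
 */
static unsigned int alloc_order(size_t bytes, size_t page_size)
{
	unsigned int order = 0;
	size_t span = page_size;

	while (span < bytes) {
		span <<= 1;
		order++;
	}
	return order;
}
```

Under these assumptions, alloc_order(1160 * ETH_GSTRING_LEN, 4096) asks for 37120 bytes, which needs 16 contiguous pages. A vzalloc()-style allocation does not need the pages to be physically contiguous, which is why it is more likely to succeed under memory pressure.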

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-31 13:28:06 -05:00
Neil Brown
2b477c00f3 svcrpc: free contexts immediately on PROC_DESTROY
We currently handle a client PROC_DESTROY request by turning it
CACHE_NEGATIVE, setting the expired time to now, and then waiting for
cache_clean to clean it up later.  Since we forgot to set the cache's
nextcheck value, that could take up to 30 minutes.  Also, though there's
probably no real bug in this case, setting CACHE_NEGATIVE directly like
this probably isn't a great idea in general.

So let's just remove the entry from the cache directly, and move this
bit of cache manipulation to a helper function.

Signed-off-by: Neil Brown <neilb@suse.com>
Reported-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-01-31 12:31:53 -05:00
J. Bruce Fields
034dd34ff4 svcrpc: fix oops in absence of krb5 module
Olga Kornievskaia says: "I ran into this oops in the nfsd (below)
(4.10-rc3 kernel). To trigger this I had a client (unsuccessfully) try
to mount the server with krb5 where the server doesn't have the
rpcsec_gss_krb5 module built."

The problem is that rsci.cred is copied from a svc_cred structure that
gss_proxy didn't properly initialize.  Fix that.

[120408.542387] general protection fault: 0000 [#1] SMP
...
[120408.565724] CPU: 0 PID: 3601 Comm: nfsd Not tainted 4.10.0-rc3+ #16
[120408.567037] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[120408.569225] task: ffff8800776f95c0 task.stack: ffffc90003d58000
[120408.570483] RIP: 0010:gss_mech_put+0xb/0x20 [auth_rpcgss]
...
[120408.584946]  ? rsc_free+0x55/0x90 [auth_rpcgss]
[120408.585901]  gss_proxy_save_rsc+0xb2/0x2a0 [auth_rpcgss]
[120408.587017]  svcauth_gss_proxy_init+0x3cc/0x520 [auth_rpcgss]
[120408.588257]  ? __enqueue_entity+0x6c/0x70
[120408.589101]  svcauth_gss_accept+0x391/0xb90 [auth_rpcgss]
[120408.590212]  ? try_to_wake_up+0x4a/0x360
[120408.591036]  ? wake_up_process+0x15/0x20
[120408.592093]  ? svc_xprt_do_enqueue+0x12e/0x2d0 [sunrpc]
[120408.593177]  svc_authenticate+0xe1/0x100 [sunrpc]
[120408.594168]  svc_process_common+0x203/0x710 [sunrpc]
[120408.595220]  svc_process+0x105/0x1c0 [sunrpc]
[120408.596278]  nfsd+0xe9/0x160 [nfsd]
[120408.597060]  kthread+0x101/0x140
[120408.597734]  ? nfsd_destroy+0x60/0x60 [nfsd]
[120408.598626]  ? kthread_park+0x90/0x90
[120408.599448]  ret_from_fork+0x22/0x30

Fixes: 1d658336b0 "SUNRPC: Add RPC based upcall mechanism for RPCGSS auth"
Cc: stable@vger.kernel.org
Cc: Simo Sorce <simo@redhat.com>
Reported-by: Olga Kornievskaia <kolga@netapp.com>
Tested-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-01-31 12:29:24 -05:00
Simon Horman
040587af31 net/sched: cls_flower: Correct matching on ICMPv6 code
When matching on the ICMPv6 code ICMPV6_CODE rather than
ICMPV4_CODE attributes should be used.

This corrects what appears to be a typo.

Sample usage:

tc qdisc add dev eth0 ingress
tc filter add dev eth0 protocol ipv6 parent ffff: flower \
	indev eth0 ip_proto icmpv6 type 128 code 0 action drop

Without this change the code parameter above is effectively ignored.

Fixes: 7b684884fb ("net/sched: cls_flower: Support matching on ICMP type and code")
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30 16:42:09 -05:00
David S. Miller
0d29ed28da Merge tag 'linux-can-fixes-for-4.10-20170130' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:

====================
pull-request: can 2017-01-30

this is a pull request of one patch.

The patch is by Oliver Hartkopp and fixes the hrtimer/tasklet termination in
bcm op removal.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30 16:38:39 -05:00
Dan Carpenter
cdaf25dfc0 smc: some potential use after free bugs
Say we got really unlucky and these failed on the last iteration, then
it could lead to a use after free bug.

Fixes: cd6851f303 ("smc: remote memory buffers (RMBs)")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30 16:37:55 -05:00
Florian Fainelli
f50f212749 net: dsa: Add plumbing for port mirroring
Add necessary plumbing at the slave network device level to have switch
drivers implement ndo_setup_tc() and most particularly the cls_matchall
classifier. We add support for two switch operations:

port_add_mirror() and port_del_mirror(), which configure, on a per-port
basis, the mirror parameters requested by the cls_matchall classifier.

Code is largely borrowed from the Mellanox Spectrum switch driver.

Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30 15:55:46 -05:00
Vlad Yasevich
2b89ed65a6 ipv6: Partially checksum full MTU frames
IPv6 will mark data that is smaller than mtu - headersize as
CHECKSUM_PARTIAL, but if the data will completely fill the mtu,
the packet checksum will be computed in software instead.
Extend the conditional to include the data that fills the mtu
as well.
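
The change described above amounts to turning a strict comparison into an inclusive one. The following is a minimal sketch of just that comparison, assuming mtu >= headersize; the function names are hypothetical, and the real kernel conditional checks other things as well (protocol, device features, and so on) that are left out here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Before: only data strictly smaller than mtu - headersize qualifies. */
static bool csum_partial_before(size_t length, size_t mtu, size_t headersize)
{
	return length < mtu - headersize;
}

/* After: data that exactly fills the MTU qualifies as well. */
static bool csum_partial_after(size_t length, size_t mtu, size_t headersize)
{
	return length <= mtu - headersize;
}
```

With an MTU of 1500 and a 40-byte header, a 1460-byte payload fails the first test and passes the second, so only the boundary case changes behaviour.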

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30 15:51:12 -05:00
David Ahern
30357d7d8a lwtunnel: remove device arg to lwtunnel_build_state
Nothing about lwt state requires a device reference, so remove the
input argument.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30 15:14:22 -05:00
Robert Shearman
63a6fff353 net: Avoid receiving packets with an l3mdev on unbound UDP sockets
Packets arriving in a VRF currently are delivered to UDP sockets that
aren't bound to any interface. TCP defaults to not delivering packets
arriving in a VRF to unbound sockets. IP route lookup and socket
transmit both assume that unbound means using the default table and
UDP applications that haven't been changed to be aware of VRFs may not
function correctly in this case since they may not be able to handle
overlapping IP address ranges, or be able to send packets back to the
original sender if required.

So add a sysctl, udp_l3mdev_accept, to control this behaviour with it
being analogous to the existing tcp_l3mdev_accept, namely to allow a
process to have a VRF-global listen socket. Have this default to off
as this is the behaviour that users will expect, given that there is
no explicit mechanism to set unmodified VRF-unaware application into a
default VRF.

Signed-off-by: Robert Shearman <rshearma@brocade.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Tested-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30 15:00:58 -05:00
Florian Fainelli
bf9f26485d net: dsa: Hook {get,set}_rxnfc ethtool operations
In preparation for adding support for CFP/TCAMP in the bcm_sf2 driver add the
plumbing to call into driver specific {get,set}_rxnfc operations.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30 14:49:57 -05:00
NeilBrown
4c3ffd058c SUNRPC: two small improvements to rpcauth shrinker.
1/ If we find an entry that is too young to be pruned,
  return SHRINK_STOP to ensure we don't get called again.
  This is more correct, and avoids wasting a little CPU time.
  Prior to 3.12, it can prevent drop_slab() from spinning indefinitely.

2/ Return a precise number from rpcauth_cache_shrink_count(), rather than
  rounding down to a multiple of 100 (or whatever sysctl_vfs_cache_pressure is).
  This ensures that when we "echo 3 > /proc/sys/vm/drop_caches", this cache is
  still purged, even if it has fewer than 100 entries.

Neither of these is really important, they just make behaviour
more predictable, which can be helpful when debugging related issues.
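
The rounding effect in point 2/ comes down to the order of integer division and multiplication. The sketch below illustrates it in userspace C; the function names are hypothetical, and the two expressions are an assumption about the shape of the old and new calculations rather than a copy of the kernel code.

```c
/*
 * Dividing first discards the remainder, so fewer than 100 unused
 * entries report a count of zero (with the default pressure of 100).
 */
static unsigned long shrink_count_rounded(unsigned long unused,
					  unsigned long pressure)
{
	return (unused / 100) * pressure;
}

/* Multiplying first keeps the precise value. */
static unsigned long shrink_count_precise(unsigned long unused,
					  unsigned long pressure)
{
	return unused * pressure / 100;
}
```

For 42 unused entries and a pressure of 100, the first form reports 0 objects (so the shrinker sees nothing to purge) while the second reports 42.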

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-01-30 13:14:50 -05:00
Oliver Hartkopp
a06393ed03 can: bcm: fix hrtimer/tasklet termination in bcm op removal
When removing a bcm tx operation either a hrtimer or a tasklet might run.
As the hrtimer triggers its associated tasklet and vice versa we need to
take care to mutually terminate both handlers.

Reported-by: Michael Josenhans <michael.josenhans@web.de>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Tested-by: Michael Josenhans <michael.josenhans@web.de>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-01-30 11:05:04 +01:00
Steffen Klassert
1995876a06 xfrm: Add a dummy network device for napi.
This patch adds a dummy network device so that we can
use gro_cells for IPsec GRO. With this, we handle IPsec
GRO with no impact on the generic networking code.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-01-30 06:45:43 +01:00
Steffen Klassert
f991bb9da1 net: Drop secpath on free after gro merge.
With a followup patch, a gro merged skb can have a secpath.
So drop it before freeing or reusing the skb.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-01-30 06:45:38 +01:00
Rafał Miłecki
40be0dda07 net: add devm version of alloc_etherdev_mqs function
This patch adds devm_alloc_etherdev_mqs function and devm_alloc_etherdev
macro. These can be used for simpler netdev allocation without having to
care about calling free_netdev.

Thanks to this change drivers, their error paths and removal paths may
get simpler by a bit.

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-29 19:24:12 -05:00
David S. Miller
936f459bea Merge tag 'batadv-next-for-davem-20170128' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:

====================
Here are two fixes for batman-adv for net-next:

 - fix double call of dev_queue_xmit(), caused by the recent introduction
   of net_xmit_eval(), by Sven Eckelmann

 - Fix includes for IS_ERR/ERR_PTR, by Sven Eckelmann
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-29 19:21:26 -05:00
Yuchung Cheng
678550c651 tcp: include locally failed retries in retransmission stats
Currently the retransmission stats are not incremented if the
retransmit fails locally. But we always increment the other packet
counters that track total packet/bytes sent.  Awkwardly while we
don't count these failed retransmits in RETRANSSEGS, we do count
them in FAILEDRETRANS.

If the qdisc is dropping many packets this could under-estimate
TCP retransmission rate substantially from both SNMP or per-socket
TCP_INFO stats. This patch changes this by always incrementing
retransmission stats on retransmission attempts and failures.

Another motivation is to properly track retransmits in
SCM_TIMESTAMPING_OPT_STATS. Since SCM_TSTAMP_SCHED collection is
triggered in tcp_transmit_skb(), if tp->total_retrans is incremented
after the function, we'll always mis-count by the amount of the
latest retransmission.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-29 19:17:23 -05:00
Yuchung Cheng
7e98102f48 tcp: record pkts sent and retransmitted
Add two stats in SCM_TIMESTAMPING_OPT_STATS:

TCP_NLA_DATA_SEGS_OUT: total data packets sent including retransmission
TCP_NLA_TOTAL_RETRANS: total data packets retransmitted

The names are picked to be consistent with corresponding fields in
TCP_INFO. This allows applications that are using the timestamping
API to measure latency stats to also retrieve the retransmission rate
of application writes.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-29 19:17:23 -05:00
andy zhou
5b8784aaf2 openvswitch: Simplify do_execute_actions().
do_execute_actions() implements a worthwhile optimization: in case
an output action is the last action in an action list, skb_clone()
can be avoided by outputting the current skb. However, the
implementation is more complicated than necessary.  This patch
simplifies this logic.

Signed-off-by: Andy Zhou <azhou@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-29 19:00:16 -05:00
Vivien Didelot
f123f2fbed net: dsa: pass bridge device when a port leaves
Upon reception of the NETDEV_CHANGEUPPER, a leaving port is already
unbridged, so reflect this by assigning the port's bridge_dev pointer to
NULL before calling the port_bridge_leave DSA driver operation.

Now that the bridge_dev pointer is exposed to the drivers, reflecting
the current state of the DSA switch fabric is necessary for the drivers
to adjust their port based VLANs correctly.

Pass the bridge device pointer to the port_bridge_leave operation so
that drivers have all information to re-program their chips properly,
and do not need to cache it anymore.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-29 18:42:46 -05:00
Vivien Didelot
a5e9a02e1f net: dsa: move bridge device in dsa_port
Move the bridge_dev pointer from dsa_slave_priv to dsa_port so that DSA
drivers can access this information and remove the need to cache it.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-29 18:42:46 -05:00
Vivien Didelot
afdcf151c1 net: dsa: store a dsa_port in dsa_slave_priv
Store a pointer to the dsa_port structure in the dsa_slave_priv
structure, instead of the switch/port index.

This will allow storing more information such as the bridge device,
needed in DSA drivers for multi-chip configuration.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-29 18:42:46 -05:00
Vivien Didelot
818be8489d net: dsa: add ds and index to dsa_port
Add the physical switch instance and port index a DSA port belongs to to
the dsa_port structure.

That can be used later to retrieve information about a physical port
when configuring a switch fabric, or lighten up struct dsa_slave_priv.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-29 18:42:46 -05:00
Vivien Didelot
26895e299c net: dsa: use ds->num_ports when possible
The dsa_switch structure contains the number of ports. Use it where the
structure is valid instead of the DSA_MAX_PORTS value.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-29 18:42:46 -05:00
Vivien Didelot
a0c02161ec net: dsa: variable number of ports
Change the ports[DSA_MAX_PORTS] array of the dsa_switch structure for a
zero-length array, allocated at the same time as the dsa_switch
structure itself. A dsa_switch_alloc() helper is provided for that.

This commit brings no functional change yet since we pass DSA_MAX_PORTS
as the number of ports for the moment. Future patches can update the DSA
drivers separately to support dynamic number of ports.
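
The allocation pattern described above, a trailing array sized at allocation time and allocated together with its containing structure, can be sketched in userspace C as follows. The demo_* names are hypothetical stand-ins and not the kernel's dsa_switch or dsa_switch_alloc(); the sketch uses a C99 flexible array member where the commit text mentions a zero-length array, and calloc() where the kernel would use its own allocators.

```c
#include <stdlib.h>

/* Hypothetical per-port record. */
struct demo_port {
	int index;
};

/* Hypothetical switch structure with a trailing, variable-length array. */
struct demo_switch {
	size_t num_ports;
	struct demo_port ports[];	/* C99 flexible array member */
};

/* Allocate the structure and its n ports in a single zeroed block. */
static struct demo_switch *demo_switch_alloc(size_t n)
{
	struct demo_switch *ds;
	size_t i;

	ds = calloc(1, sizeof(*ds) + n * sizeof(ds->ports[0]));
	if (!ds)
		return NULL;

	ds->num_ports = n;
	for (i = 0; i < n; i++)
		ds->ports[i].index = (int)i;
	return ds;
}
```

Loops over the ports can then be bounded by ds->num_ports rather than a compile-time maximum, and a single free(ds) releases both the structure and the array.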

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-29 18:42:46 -05:00
Eric Dumazet
f1712c7371 can: Fix kernel panic at security_sock_rcv_skb
Zhang Yanmin reported crashes [1] and provided a patch adding a
synchronize_rcu() call in can_rx_unregister()

The main problem seems that the sockets themselves are not RCU
protected.

If CAN uses RCU for delivery, then sockets should be freed only after
one RCU grace period.

Recent kernels could use sock_set_flag(sk, SOCK_RCU_FREE), but let's
ease stable backports with the following fix instead.

[1]
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff81495e25>] selinux_socket_sock_rcv_skb+0x65/0x2a0

Call Trace:
 <IRQ>
 [<ffffffff81485d8c>] security_sock_rcv_skb+0x4c/0x60
 [<ffffffff81d55771>] sk_filter+0x41/0x210
 [<ffffffff81d12913>] sock_queue_rcv_skb+0x53/0x3a0
 [<ffffffff81f0a2b3>] raw_rcv+0x2a3/0x3c0
 [<ffffffff81f06eab>] can_rcv_filter+0x12b/0x370
 [<ffffffff81f07af9>] can_receive+0xd9/0x120
 [<ffffffff81f07beb>] can_rcv+0xab/0x100
 [<ffffffff81d362ac>] __netif_receive_skb_core+0xd8c/0x11f0
 [<ffffffff81d36734>] __netif_receive_skb+0x24/0xb0
 [<ffffffff81d37f67>] process_backlog+0x127/0x280
 [<ffffffff81d36f7b>] net_rx_action+0x33b/0x4f0
 [<ffffffff810c88d4>] __do_softirq+0x184/0x440
 [<ffffffff81f9e86c>] do_softirq_own_stack+0x1c/0x30
 <EOI>
 [<ffffffff810c76fb>] do_softirq.part.18+0x3b/0x40
 [<ffffffff810c8bed>] do_softirq+0x1d/0x20
 [<ffffffff81d30085>] netif_rx_ni+0xe5/0x110
 [<ffffffff8199cc87>] slcan_receive_buf+0x507/0x520
 [<ffffffff8167ef7c>] flush_to_ldisc+0x21c/0x230
 [<ffffffff810e3baf>] process_one_work+0x24f/0x670
 [<ffffffff810e44ed>] worker_thread+0x9d/0x6f0
 [<ffffffff810e4450>] ? rescuer_thread+0x480/0x480
 [<ffffffff810ebafc>] kthread+0x12c/0x150
 [<ffffffff81f9ccef>] ret_from_fork+0x3f/0x70

Reported-by: Zhang Yanmin <yanmin.zhang@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-29 18:30:56 -05:00
Linus Torvalds
d56a5ca366 Merge tag 'nfs-for-4.10-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
 "Stable patches:
   - NFSv4.1: Fix a deadlock in layoutget
   - NFSv4 must not bump sequence ids on NFS4ERR_MOVED errors
   - NFSv4 Fix a regression with OPEN EXCLUSIVE4 mode
   - Fix a memory leak when removing the SUNRPC module

  Bugfixes:
   - Fix a reference leak in _pnfs_return_layout"

* tag 'nfs-for-4.10-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  pNFS: Fix a reference leak in _pnfs_return_layout
  nfs: Fix "Don't increment lock sequence ID after NFS4ERR_MOVED"
  SUNRPC: cleanup ida information when removing sunrpc module
  NFSv4.0: always send mode in SETATTR after EXCLUSIVE4
  nfs: Don't increment lock sequence ID after NFS4ERR_MOVED
  NFSv4.1: Fix a deadlock in layoutget
2017-01-28 11:50:17 -08:00
David S. Miller
4e8f2fc1a5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Two trivial overlapping-change conflicts in MPLS and mlx5.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-28 10:33:06 -05:00
Sven Eckelmann
3e7514afc7 batman-adv: Fix includes for IS_ERR/ERR_PTR
IS_ERR/ERR_PTR are not defined in linux/device.h but in linux/err.h. The
files using these macros therefore have to include the correct one.

Reported-by: Linus Luessing <linus.luessing@web.de>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-01-28 10:40:35 +01:00
Sven Eckelmann
7c946062b3 batman-adv: Fix double call of dev_queue_xmit
The net_xmit_eval macro has side effects because it does not ensure that
e is evaluated only once.

    #define net_xmit_eval(e)        ((e) == NET_XMIT_CN ? 0 : (e))

The code requested by David Miller [1]

    return net_xmit_eval(dev_queue_xmit(skb));

will get transformed into

    return ((dev_queue_xmit(skb)) == NET_XMIT_CN ? 0 : (dev_queue_xmit(skb)))

dev_queue_xmit will therefore be tried again (with an already consumed skb)
whenever the return code is not NET_XMIT_CN.

[1] https://lkml.kernel.org/r/20170125.225624.965229145391320056.davem@davemloft.net
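
The double evaluation can be reproduced outside the kernel. In the sketch below, fake_xmit() is a hypothetical stand-in for dev_queue_xmit() that only counts how often it runs; the NET_XMIT_* values and the macro mirror the kernel definitions. Storing the return value in a local variable before applying the macro is one way to avoid the second call, shown here as an illustration under those assumptions.

```c
/* Values as defined in the kernel, reproduced for a userspace demo. */
#define NET_XMIT_SUCCESS	0x00
#define NET_XMIT_DROP		0x01
#define NET_XMIT_CN		0x02

/* Same shape as the macro quoted above: e appears twice. */
#define net_xmit_eval(e)	((e) == NET_XMIT_CN ? 0 : (e))

static int xmit_calls;	/* number of times fake_xmit() has run */

/* Hypothetical stand-in for dev_queue_xmit(): counts calls, returns rc. */
static int fake_xmit(int rc)
{
	xmit_calls++;
	return rc;
}

/* The call is the macro argument, so it can be expanded and run twice. */
static int send_buggy(int rc)
{
	return net_xmit_eval(fake_xmit(rc));
}

/* The call runs exactly once; the macro only sees a plain variable. */
static int send_fixed(int rc)
{
	int ret = fake_xmit(rc);

	return net_xmit_eval(ret);
}
```

Calling send_buggy(NET_XMIT_SUCCESS) increments xmit_calls twice, while send_buggy(NET_XMIT_CN) increments it once because the conditional short-circuits to 0. send_fixed() always increments it once.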

Fixes: c33705188c ("batman-adv: Treat NET_XMIT_CN as transmit successfully")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-01-28 10:40:35 +01:00
Linus Torvalds
1b1bc42c16 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) GTP fixes from Andreas Schultz (missing genl module alias, clear IP
    DF on transmit).

 2) Netfilter needs to reflect the fwmark when sending resets, from Pau
    Espin Pedrol.

 3) nftable dump OOPS fix from Liping Zhang.

 4) Fix erroneous setting of VIRTIO_NET_HDR_F_DATA_VALID on transmit,
    from Rolf Neugebauer.

 5) Fix build error of ipt_CLUSTERIP when procfs is disabled, from Arnd
    Bergmann.

 6) Fix regression in handling of NETIF_F_SG in harmonize_features(),
    from Eric Dumazet.

 7) Fix RTNL deadlock wrt. lwtunnel module loading, from David Ahern.

 8) tcp_fastopen_create_child() needs to setup tp->max_window, from
    Alexey Kodanev.

 9) Missing kmemdup() failure check in ipv6 segment routing code, from
    Eric Dumazet.

10) Don't execute unix_bind() under the bindlock, otherwise we deadlock
    with splice. From WANG Cong.

11) ip6_tnl_parse_tlv_enc_lim() potentially reallocates the skb buffer,
    therefore callers must reload cached header pointers into that skb.
    Fix from Eric Dumazet.

12) Fix various bugs in legacy IRQ fallback handling in alx driver, from
    Tobias Regnery.

13) Do not allow lwtunnel drivers to be unloaded while they are
    referenced by active instances, from Robert Shearman.

14) Fix truncated PHY LED trigger names, from Geert Uytterhoeven.

15) Fix a few regressions from virtio_net XDP support, from John
    Fastabend and Jakub Kicinski.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (102 commits)
  ISDN: eicon: silence misleading array-bounds warning
  net: phy: micrel: add support for KSZ8795
  gtp: fix cross netns recv on gtp socket
  gtp: clear DF bit on GTP packet tx
  gtp: add genl family modules alias
  tcp: don't annotate mark on control socket from tcp_v6_send_response()
  ravb: unmap descriptors when freeing rings
  virtio_net: reject XDP programs using header adjustment
  virtio_net: use dev_kfree_skb for small buffer XDP receive
  r8152: check rx after napi is enabled
  r8152: re-schedule napi for tx
  r8152: avoid start_xmit to schedule napi when napi is disabled
  r8152: avoid start_xmit to call napi_schedule during autosuspend
  net: dsa: Bring back device detaching in dsa_slave_suspend()
  net: phy: leds: Fix truncated LED trigger names
  net: phy: leds: Break dependency of phy.h on phy_led_triggers.h
  net: phy: leds: Clear phy_num_led_triggers on failure to avoid crash
  net-next: ethernet: mediatek: change the compatible string
  Documentation: devicetree: change the mediatek ethernet compatible string
  bnxt_en: Fix RTNL lock usage on bnxt_get_port_module_status().
  ...
2017-01-27 12:54:16 -08:00
Eric Dumazet
158f323b98 net: adjust skb->truesize in pskb_expand_head()
Slava Shwartsman reported a warning in skb_try_coalesce(), when we
detect skb->truesize is completely wrong.

In his case, issue came from IPv6 reassembly coping with malicious
datagrams, that forced various pskb_may_pull() to reallocate a bigger
skb->head than the one allocated by NIC driver before entering GRO
layer.

Current code does not change skb->truesize, leaving this burden to
callers if they care enough.

Blindly changing skb->truesize in pskb_expand_head() is not
easy, as some producers might track skb->truesize, for example
in xmit path for back pressure feedback (sk->sk_wmem_alloc)

We can detect the cases where it should be safe to change
skb->truesize :

1) skb is not attached to a socket.
2) If it is attached to a socket, destructor is sock_edemux()
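
The two safe cases listed above can be written as a single guard. The sketch below uses hypothetical demo_* types and functions in place of the kernel's sk_buff, sock and sock_edemux(), and only models the decision of whether truesize may be adjusted; it is not the kernel implementation.

```c
#include <stddef.h>

struct demo_sock {
	int placeholder;
};

struct demo_skb {
	struct demo_sock *sk;			/* owning socket, or NULL */
	void (*destructor)(struct demo_skb *);	/* called when the skb is freed */
	unsigned int truesize;
};

/* Hypothetical stand-ins for two kinds of destructor. */
static void demo_sock_edemux(struct demo_skb *skb) { (void)skb; }
static void demo_sock_wfree(struct demo_skb *skb) { (void)skb; }

/*
 * Adjust truesize after the head grew from old_size to new_size, but
 * only when no producer is expected to be tracking the old value.
 * Returns 1 if truesize was changed, 0 if it was left alone.
 */
static int demo_adjust_truesize(struct demo_skb *skb,
				unsigned int old_size, unsigned int new_size)
{
	if (skb->sk && skb->destructor != demo_sock_edemux)
		return 0;

	skb->truesize += new_size - old_size;
	return 1;
}
```

An skb with no socket, or one whose destructor is the edemux stand-in, gets its truesize updated; one owned by a socket with a write-side destructor such as demo_sock_wfree() is left untouched.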

My audit gave only two callers doing their own skb->truesize
manipulation.

I had to remove skb parameter in sock_edemux macro when
CONFIG_INET is not set to avoid a compile error.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Slava Shwartsman <slavash@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-27 12:03:29 -05:00
Pablo Neira
92e55f412c tcp: don't annotate mark on control socket from tcp_v6_send_response()
Unlike ipv4, this control socket is shared by all cpus so we cannot use
it as scratchpad area to annotate the mark that we pass to ip6_xmit().

Add a new parameter to ip6_xmit() to indicate the mark. The SCTP socket
family caches the flowi6 structure in the sctp_transport structure, so
we cannot use to carry the mark unless we later on reset it back, which
I discarded since it looks ugly to me.

Fixes: bf99b4ded5 ("tcp: fix mark propagation with fwmark_reflect enabled")
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-27 10:33:56 -05:00
Felix Jia
45ce0fd19d net/ipv6: support more tunnel interfaces for EUI64 link-local generation
Signed-off-by: Felix Jia <felix.jia@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-27 10:25:34 -05:00
Felix Jia
d35a00b8e3 net/ipv6: allow sysctl to change link-local address generation mode
The address generation mode for IPv6 link-local can only be configured
by netlink messages. This patch adds the ability to change the address
generation mode via sysctl.

v1 -> v2
Removed the rtnl lock and switch to use RCU lock to iterate through
the netdev list.

v2 -> v3
Removed the addrgenmode variable from the idev structure and use the
sysctl storage for the flag.

Simplified the logic for sysctl handling by removing support for the
"all" operation.

Added support for more types of tunnel interfaces for link-local
address generation.

Based the patches on net-next.

v3 -> v4
Removed unnecessary whitespace changes.

Signed-off-by: Felix Jia <felix.jia@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-27 10:25:34 -05:00
David Ahern
1f17e2f2c8 net: ipv6: ignore null_entry on route dumps
lkp-robot reported a BUG:
[   10.151226] BUG: unable to handle kernel NULL pointer dereference at 00000198
[   10.152525] IP: rt6_fill_node+0x164/0x4b8
[   10.153307] *pdpt = 0000000012ee5001 *pde = 0000000000000000
[   10.153309]
[   10.154492] Oops: 0000 [#1]
[   10.154987] CPU: 0 PID: 909 Comm: netifd Not tainted 4.10.0-rc4-00722-g41e8c70ee162-dirty #10
[   10.156482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[   10.158254] task: d0deb000 task.stack: d0e0c000
[   10.159059] EIP: rt6_fill_node+0x164/0x4b8
[   10.159780] EFLAGS: 00010296 CPU: 0
[   10.160404] EAX: 00000000 EBX: d10c2358 ECX: c1f7c6cc EDX: c1f6ff44
[   10.161469] ESI: 00000000 EDI: c2059900 EBP: d0e0dc4c ESP: d0e0dbe4
[   10.162534]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
[   10.163482] CR0: 80050033 CR2: 00000198 CR3: 10d94660 CR4: 000006b0
[   10.164535] Call Trace:
[   10.164993]  ? paravirt_sched_clock+0x9/0xd
[   10.165727]  ? sched_clock+0x9/0xc
[   10.166329]  ? sched_clock_cpu+0x19/0xe9
[   10.166991]  ? lock_release+0x13e/0x36c
[   10.167652]  rt6_dump_route+0x4c/0x56
[   10.168276]  fib6_dump_node+0x1d/0x3d
[   10.168913]  fib6_walk_continue+0xab/0x167
[   10.169611]  fib6_walk+0x2a/0x40
[   10.170182]  inet6_dump_fib+0xfb/0x1e0
[   10.170855]  netlink_dump+0xcd/0x21f

This happens when the loopback device is set down and an ipv6 fib route
dump is requested.

ip6_null_entry is the root of all ipv6 fib tables making it integrated
into the table and hence passed to the ipv6 route dump code. The
null_entry route uses the loopback device for dst.dev but may not have
rt6i_idev set because of the order in which initializations are done --
ip6_route_net_init is run before addrconf_init has initialized the
loopback device. Fixing the initialization order is a much bigger problem
with no obvious solution thus far.

The BUG is triggered when the loopback is set down and the netif_running
check added by a1a22c1206 fails. The fill_node descends to checking
rt->rt6i_idev for ignore_routes_with_linkdown and since rt6i_idev is
NULL it faults.

The null_entry route should not be processed in a dump request. Catch
and ignore. This check is done in rt6_dump_route as it is the highest
place in the callchain with knowledge of both the route and the network
namespace.
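
A minimal userspace sketch of the check described above, not the kernel
patch itself. The types route_sketch and netns_sketch and the function
dump_route_sketch are hypothetical stand-ins for the kernel's route entry,
network namespace and rt6_dump_route; only the idea of comparing the route
against the namespace's null_entry is taken from the text.

```c
#include <stddef.h>

/* Hypothetical stand-in for a route entry; idev may legitimately be NULL. */
struct route_sketch {
    const int *idev;
};

/* Hypothetical stand-in for a network namespace owning one null_entry. */
struct netns_sketch {
    const struct route_sketch *null_entry;
};

/*
 * Returns 0 when the route is skipped and 1 when it would be dumped.
 * The null_entry is rejected before anything dereferences rt->idev.
 */
static int dump_route_sketch(const struct netns_sketch *net,
                             const struct route_sketch *rt)
{
    if (rt == net->null_entry)
        return 0;

    /* A real dump would fill the netlink message here, touching rt->idev. */
    return 1;
}
```

Doing the comparison at this level needs both the route and the namespace
in hand, which matches the reason given for placing the check in
rt6_dump_route.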

Fixes: a1a22c1206 ("net: ipv6: Keep nexthop of multipath route on admin down")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26 18:39:16 -05:00
David Ahern
3b7b2b0acd net: ipv6: remove skb_reserve in getroute
Remove skb_reserve and skb_reset_mac_header from inet6_rtm_getroute. The
allocated skb is not passed through the routing engine (like it is for
IPv4) and has not since the beginning of git time.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26 18:36:58 -05:00
Florian Fainelli
bc1727d242 net: dsa: Move ports assignment closer to error checking
Move the assignment of ports in _dsa_register_switch() closer to where
it is checked, no functional change. Re-order declarations to
preserve the inverted christmas tree style.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26 15:43:53 -05:00
Florian Fainelli
3512a8e95e net: dsa: Suffix function manipulating device_node with _dn
Make it clear that these functions take a device_node structure pointer.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26 15:43:53 -05:00
Florian Fainelli
293784a8f8 net: dsa: Make most functions take a dsa_port argument
In preparation for allowing platform data, and therefore no valid
device_node pointer, make most DSA functions take a pointer to a
dsa_port structure whenever possible. While at it, introduce a
dsa_port_is_valid() helper function which checks whether port->dn is
NULL or not at the moment.
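
A rough userspace sketch of the helper described above, assuming a port is
considered valid exactly when its dn pointer is non-NULL. The struct
layouts and the names dsa_port_sketch and port_is_valid_sketch are
hypothetical; the real dsa_port has more members and the real helper may
differ in signature.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct device_node. */
struct device_node_sketch {
    const char *name;
};

/* Hypothetical, reduced view of a DSA port: only the dn member is modelled. */
struct dsa_port_sketch {
    struct device_node_sketch *dn;
};

/* A port counts as valid when it has a device node attached. */
static bool port_is_valid_sketch(const struct dsa_port_sketch *port)
{
    return port->dn != NULL;
}
```

Hiding the NULL test behind a helper means callers do not need to change
if validity later depends on something other than a device_node, such as
platform data.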

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26 15:43:53 -05:00
Florian Fainelli
55ed0ce089 net: dsa: Pass device pointer to dsa_register_switch
In preparation for allowing dsa_register_switch() to be supplied with
device/platform data, pass down a struct device pointer instead of a
struct device_node.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26 15:43:52 -05:00
David S. Miller
49b3eb7725 Merge tag 'batadv-next-for-davem-20170126' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:

====================
This feature/cleanup patchset includes the following patches:

 - bump version strings, by Simon Wunderlich

 - ignore self-generated loop detect MAC addresses in translation table,
   by Simon Wunderlich

 - install uapi batman_adv.h header, by Sven Eckelmann

 - bump copyright years, by Sven Eckelmann

 - Remove an unused variable in translation table code, by Sven Eckelmann

 - Handle NET_XMIT_CN like NET_XMIT_SUCCESS (revised according to David's
   suggestion), and a follow-up code cleanup, by Gao Feng (2 patches)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26 14:31:08 -05:00
David S. Miller
086cb6a412 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains a large batch with Netfilter fixes for
your net tree, they are:

1) Two patches to solve conntrack garbage collector cpu hogging, one to
   remove GC_MAX_EVICTS and another to look at the ratio (scanned entries
   vs. evicted entries) to make a decision on whether to reduce or not
   the scanning interval. From Florian Westphal.

2) Two patches to fix incorrect set element counting if NLM_F_EXCL
   is not set. Moreover, don't decrement set->nelems from the abort path
   on -ENFILE, which leaks a spare slot in the set. This includes a
   patch to deconstify the set walk callback to update set->ndeact.

3) Two fixes for the fwmark_reflect sysctl feature: Propagate mark to
   reply packets both from nf_reject and local stack, from Pau Espin Pedrol.

4) Fix incorrect handling of loopback traffic in rpfilter and nf_tables
   fib expression, from Liping Zhang.

5) Fix oops on stateful objects netlink dump, when no filter is specified.
   Also from Liping Zhang.

6) Fix a build error if proc is not available in ipt_CLUSTERIP, related
   to a fix that was applied in the previous batch for net. From Arnd Bergmann.

7) Fix lack of string validation in table, chain, set and stateful
   object names in nf_tables, from Liping Zhang. Moreover, restrict
   maximum log prefix length to 127 bytes, otherwise explicitly bail
   out.

8) Two patches to fix spelling and typos in nf_tables uapi header file
   and Kconfig, patches from Alexander Alemayhu and William Breathitt Gray.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26 12:54:50 -05:00
Amadeusz Sławiński
731977e97b mac80211: use helper function to access ieee802_1d_to_ac[]
Cleanup patch to make use of ieee80211_ac_from_tid() to retrieve the ac from
ieee802_1d_to_ac[].
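
A small userspace sketch of the accessor pattern this cleanup relies on:
callers go through a helper instead of indexing the table directly. The
names ac_sketch, tid_to_ac_sketch and ac_from_tid_sketch are hypothetical,
the table contents follow the usual 802.1D priority to access category
mapping, and masking the TID to three bits is an assumption of this sketch
rather than something stated in the commit message.

```c
#include <stddef.h>

/* Hypothetical access category numbering for this sketch only. */
enum ac_sketch { AC_VO, AC_VI, AC_BE, AC_BK };

/* 802.1D priority (0..7) to access category, as commonly defined. */
static const enum ac_sketch tid_to_ac_sketch[8] = {
    AC_BE, AC_BK, AC_BK, AC_BE, AC_VI, AC_VI, AC_VO, AC_VO,
};

/*
 * Single point of access to the table. The mask keeps the index in
 * range even if a caller passes a TID above 7 (assumed behaviour).
 */
static enum ac_sketch ac_from_tid_sketch(int tid)
{
    return tid_to_ac_sketch[tid & 7];
}
```

Routing every lookup through one helper keeps the bounds handling in a
single place, which is the usual motivation for this kind of cleanup.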

Signed-off-by: Amadeusz Sławiński <amadeusz.slawinski@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-26 09:50:44 +01:00
Gao Feng
c33705188c batman-adv: Treat NET_XMIT_CN as transmit successfully
tc can return NET_XMIT_CN as a congestion notification, but that does
not mean the packet was lost. Other modules such as ipvlan and
macvlan treat NET_XMIT_CN as success too.

So batman-adv should also treat NET_XMIT_CN as NET_XMIT_SUCCESS.
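
A minimal userspace sketch of that rule, assuming the three return codes
carry the values used in include/linux/netdevice.h. The helper name
xmit_result_normalize is hypothetical and is not the function changed by
this commit.

```c
#include <stddef.h>

/* Values assumed to mirror the kernel's NET_XMIT_* codes. */
enum {
    NET_XMIT_SUCCESS = 0x00,
    NET_XMIT_DROP    = 0x01, /* skb dropped */
    NET_XMIT_CN      = 0x02, /* congestion notification, packet still queued */
};

/*
 * Hypothetical helper: fold the congestion notification into success
 * and pass every other result through unchanged.
 */
static int xmit_result_normalize(int ret)
{
    return ret == NET_XMIT_CN ? NET_XMIT_SUCCESS : ret;
}
```

With this normalization, a caller that only distinguishes success from
failure no longer counts a congestion notification as a lost packet.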

Signed-off-by: Gao Feng <gfree.wind@gmail.com>
[sven@narfation.org: Moved NET_XMIT_CN handling to batadv_send_skb_packet]
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-01-26 08:41:18 +01:00
Gao Feng
0843f197c4 batman-adv: Remove one condition check in batadv_route_unicast_packet
Moving some statements into the first condition block removes one
condition check.

Signed-off-by: Gao Feng <gfree.wind@gmail.com>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-01-26 08:37:01 +01:00
Sven Eckelmann
269cee6218 batman-adv: Remove unused variable in batadv_tt_local_set_flags
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-01-26 08:34:20 +01:00