linux

Author	SHA1	Message	Date
Achiad Shochat	fe9f4fe58d	net/mlx5e: Return error in case mlx5e_set_features() fails In case mlx5e_set_features() fails, return the failure status rather than 0. Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-03 10:41:50 -05:00
Achiad Shochat	3435ab59d3	net/mlx5e: Don't allow more than max supported channels Consider MLX5E_MAX_NUM_CHANNELS @ethtool set/get_channels Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-03 10:41:50 -05:00
Achiad Shochat	61d0e73e0a	net/mlx5_core: Use the the real irqn in eq->irqn Instead of storing the msix array index in eq->irqn (vecidx), store the real irq number. Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-03 10:41:50 -05:00
Achiad Shochat	01c196a2d3	net/mlx5e: Wait for RX buffers initialization in a more proper manner Use jiffies rather than wait loop with msleep(). The wait loop didn't take into consideration time when the process was not executing. Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-03 10:41:50 -05:00
Achiad Shochat	a198574090	net/mlx5e: Avoid NULL pointer access in case of configuration failure In case a configuration operation that involves closing and re-opening resources (e.g RX/TX queue size change) fails at the re-opening stage these resources will remain closed. So when executing (following) configuration operations (e.g ifconfig down) we cannot assume that these resources are available. Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-03 10:41:50 -05:00
Jarod Wilson	fd867d51f8	net/core: generic support for disabling netdev features down stack There are some netdev features, which when disabled on an upper device, such as a bonding master or a bridge, must be disabled and cannot be re-enabled on underlying devices. This is a rework of an earlier more heavy-handed appraoch, which simply disables and prevents re-enabling of netdev features listed in a new define in include/net/netdev_features.h, NETIF_F_UPPER_DISABLES. Any upper device that disables a flag in that feature mask, the disabling will propagate down the stack, and any lower device that has any upper device with one of those flags disabled should not be able to enable said flag. Initially, only LRO is included for proof of concept, and because this code effectively does the same thing as dev_disable_lro(), though it will also activate from the ethtool path, which was one of the goals here. [root@dell-per730-01 ~]# ethtool -k bond0 \|grep large large-receive-offload: on [root@dell-per730-01 ~]# ethtool -k p5p1 \|grep large large-receive-offload: on [root@dell-per730-01 ~]# ethtool -K bond0 lro off [root@dell-per730-01 ~]# ethtool -k bond0 \|grep large large-receive-offload: off [root@dell-per730-01 ~]# ethtool -k p5p1 \|grep large large-receive-offload: off dmesg dump: [ 1033.277986] bond0: Disabling feature 0x0000000000008000 on lower dev p5p2. [ 1034.067949] bnx2x 0000:06:00.1 p5p2: using MSI-X IRQs: sp 74 fp[0] 76 ... fp[7] 83 [ 1034.753612] bond0: Disabling feature 0x0000000000008000 on lower dev p5p1. [ 1035.591019] bnx2x 0000:06:00.0 p5p1: using MSI-X IRQs: sp 62 fp[0] 64 ... fp[7] 71 This has been successfully tested with bnx2x, qlcnic and netxen network cards as slaves in a bond interface. Turning LRO on or off on the master also turns it on or off on each of the slaves, new slaves are added with LRO in the same state as the master, and LRO can't be toggled on the slaves. Also, this should largely remove the need for dev_disable_lro(), and most, if not all, of its call sites can be replaced by simply making sure NETIF_F_LRO isn't included in the relevant device's feature flags. Note that this patch is driven by bug reports from users saying it was confusing that bonds and slaves had different settings for the same features, and while it won't be 100% in sync if a lower device doesn't support a feature like LRO, I think this is a good step in the right direction. CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <gospo@cumulusnetworks.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Nikolay Aleksandrov <razor@blackwall.org> CC: Michal Kubecek <mkubecek@suse.cz> CC: Alexander Duyck <alexander.duyck@gmail.com> CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-02 23:41:31 -05:00
Sergei Shtylyov	c238041f51	sh_eth: fix typo in RX descriptor bit name The correct name of the RX descriptor 0 bit 30 is RDLE (receive descriptor list end), not RDEL. Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-02 23:01:06 -05:00
David S. Miller	a176ded3f4	Merge branch 'bonding-actor-updates' Mahesh Bandewar says: ==================== re-org actor admin/oper key updates I was observing machines entering into weird LACP state when the partner is in passive mode. This issue is not because of the partners in passive state but probably because of some operational key update which is pushing the state-machine is that weird state. This was happening randomly on about 1% of the machine (when the sample size is a large set of machines with a variety of NICs/ports bonded). In this patch-series I'm attempting to unify the logic of actor-key / operational-key changes to one place to avoid possible errors in update. Also this eliminates the need for the event-handler to decide if the key needs update. After this patch-set none of the machines (from same sample set) were exhibiting LACP-weirdness that was observed earlier. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-02 22:52:24 -05:00
Mahesh Bandewar	52bc671681	bonding: simplify / unify event handling code for 3ad mode. Old logic of updating state-machine is not required since ad_update_actor_keys() does it implicitly. The only loss is the notification differentiation between speed vs. duplex change. Now only one unified notification is printed. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-02 22:52:24 -05:00
Mahesh Bandewar	7bb11dc9f5	bonding: unify all places where actor-oper key needs to be updated. actor_admin, and actor_oper key is changed at multiple locations in the code. This patch brings all those updates into one location in an attempt to avoid possible inconsistent updates causing LACP state machine to go in weird state. The unified place is ad_update_actor_key() with simple state-machine logic - (a) If port is "duplex" then only it can participate in LACP (b) Speed change reinitializes the LACP state-machine. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-02 22:52:24 -05:00
Mahesh Bandewar	b25c2e7d3c	bonding: Simplify __get_duplex function. Eliminate 'else' clause by simply initializing variable Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-02 22:52:24 -05:00
David S. Miller	12d4309636	Merge branch 'bpf-persistent' Daniel Borkmann says: ==================== BPF updates This set adds support for persistent maps/progs. Please see individual patches for further details. A man-page update to bpf(2) will be sent later on, also a iproute2 patch for support in tc. v1 -> v2: - Reworked most of patch 4 and 5 - Rebased to latest net-next ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-02 22:48:39 -05:00
Daniel Borkmann	42984d7c1e	bpf: add sample usages for persistent maps/progs This patch adds a couple of stand-alone examples on how BPF_OBJ_PIN and BPF_OBJ_GET commands can be used. Example with maps: # ./fds_example -F /sys/fs/bpf/m -P -m -k 1 -v 42 bpf: map fd:3 (Success) bpf: pin ret:(0,Success) bpf: fd:3 u->(1:42) ret:(0,Success) # ./fds_example -F /sys/fs/bpf/m -G -m -k 1 bpf: get fd:3 (Success) bpf: fd:3 l->(1):42 ret:(0,Success) # ./fds_example -F /sys/fs/bpf/m -G -m -k 1 -v 24 bpf: get fd:3 (Success) bpf: fd:3 u->(1:24) ret:(0,Success) # ./fds_example -F /sys/fs/bpf/m -G -m -k 1 bpf: get fd:3 (Success) bpf: fd:3 l->(1):24 ret:(0,Success) # ./fds_example -F /sys/fs/bpf/m2 -P -m bpf: map fd:3 (Success) bpf: pin ret:(0,Success) # ./fds_example -F /sys/fs/bpf/m2 -G -m -k 1 bpf: get fd:3 (Success) bpf: fd:3 l->(1):0 ret:(0,Success) # ./fds_example -F /sys/fs/bpf/m2 -G -m bpf: get fd:3 (Success) Example with progs: # ./fds_example -F /sys/fs/bpf/p -P -p bpf: prog fd:3 (Success) bpf: pin ret:(0,Success) bpf sock:4 <- fd:3 attached ret:(0,Success) # ./fds_example -F /sys/fs/bpf/p -G -p bpf: get fd:3 (Success) bpf: sock:4 <- fd:3 attached ret:(0,Success) # ./fds_example -F /sys/fs/bpf/p2 -P -p -o ./sockex1_kern.o bpf: prog fd:5 (Success) bpf: pin ret:(0,Success) bpf: sock:3 <- fd:5 attached ret:(0,Success) # ./fds_example -F /sys/fs/bpf/p2 -G -p bpf: get fd:3 (Success) bpf: sock:4 <- fd:3 attached ret:(0,Success) Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-02 22:48:39 -05:00
Daniel Borkmann	b2197755b2	bpf: add support for persistent maps/progs This work adds support for "persistent" eBPF maps/programs. The term "persistent" is to be understood that maps/programs have a facility that lets them survive process termination. This is desired by various eBPF subsystem users. Just to name one example: tc classifier/action. Whenever tc parses the ELF object, extracts and loads maps/progs into the kernel, these file descriptors will be out of reach after the tc instance exits. So a subsequent tc invocation won't be able to access/relocate on this resource, and therefore maps cannot easily be shared, f.e. between the ingress and egress networking data path. The current workaround is that Unix domain sockets (UDS) need to be instrumented in order to pass the created eBPF map/program file descriptors to a third party management daemon through UDS' socket passing facility. This makes it a bit complicated to deploy shared eBPF maps or programs (programs f.e. for tail calls) among various processes. We've been brainstorming on how we could tackle this issue and various approches have been tried out so far, which can be read up further in the below reference. The architecture we eventually ended up with is a minimal file system that can hold map/prog objects. The file system is a per mount namespace singleton, and the default mount point is /sys/fs/bpf/. Any subsequent mounts within a given namespace will point to the same instance. The file system allows for creating a user-defined directory structure. The objects for maps/progs are created/fetched through bpf(2) with two new commands (BPF_OBJ_PIN/BPF_OBJ_GET). I.e. a bpf file descriptor along with a pathname is being passed to bpf(2) that in turn creates (we call it eBPF object pinning) the file system nodes. Only the pathname is being passed to bpf(2) for getting a new BPF file descriptor to an existing node. The user can use that to access maps and progs later on, through bpf(2). Removal of file system nodes is being managed through normal VFS functions such as unlink(2), etc. The file system code is kept to a very minimum and can be further extended later on. The next step I'm working on is to add dump eBPF map/prog commands to bpf(2), so that a specification from a given file descriptor can be retrieved. This can be used by things like CRIU but also applications can inspect the meta data after calling BPF_OBJ_GET. Big thanks also to Alexei and Hannes who significantly contributed in the design discussion that eventually let us end up with this architecture here. Reference: https://lkml.org/lkml/2015/10/15/925 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-02 22:48:39 -05:00
Daniel Borkmann	e9d8afa90b	bpf: consolidate bpf_prog_put{, _rcu} dismantle paths We currently have duplicated cleanup code in bpf_prog_put() and bpf_prog_put_rcu() cleanup paths. Back then we decided that it was not worth it to make it a common helper called by both, but with the recent addition of resource charging, we could have avoided the fix in commit `ac00737f4e` ("bpf: Need to call bpf_prog_uncharge_memlock from bpf_prog_put") if we would have had only a single, common path. We can simplify it further by assigning aux->prog only once during allocation time. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-02 22:48:39 -05:00
Daniel Borkmann	c210129760	bpf: align and clean bpf_{map,prog}_get helpers Add a bpf_map_get() function that we're going to use later on and align/clean the remaining helpers a bit so that we have them a bit more consistent: - __bpf_map_get() and __bpf_prog_get() that both work on the fd struct, check whether the descriptor is eBPF and return the pointer to the map/prog stored in the private data. Also, we can return f.file->private_data directly, the function signature is enough of a documentation already. - bpf_map_get() and bpf_prog_get() that both work on u32 user fd, call their respective __bpf_map_get()/__bpf_prog_get() variants, and take a reference. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-02 22:48:39 -05:00
Daniel Borkmann	aa79781b65	bpf: abstract anon_inode_getfd invocations Since we're going to use anon_inode_getfd() invocations in more than just the current places, make a helper function for both, so that we only need to pass a map/prog pointer to the helper itself in order to get a fd. The new helpers are called bpf_map_new_fd() and bpf_prog_new_fd(). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-02 22:48:39 -05:00
Eric Dumazet	1d6119baf0	net: fix percpu memory leaks This patch fixes following problems : 1) percpu_counter_init() can return an error, therefore init_frag_mem_limit() must propagate this error so that inet_frags_init_net() can do the same up to its callers. 2) If ip[46]_frags_ns_ctl_register() fail, we must unwind properly and free the percpu_counter. Without this fix, we leave freed object in percpu_counters global list (if CONFIG_HOTPLUG_CPU) leading to crashes. This bug was detected by KASAN and syzkaller tool (http://github.com/google/syzkaller) Fixes: `6d7b857d54` ("net: use lib/percpu_counter API for fragmentation mem accounting") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-02 22:47:14 -05:00
Simon Horman	c451113291	ravb: use pdev rather than ndev for error messages This corrects what appear to be typos, making the code consistent with itself, and allowing meaningful prefixes to be displayed with the errors in question. Before: (null): failed to initialize MDIO (null): Cannot allocate desc base address table (size 176 bytes) After: ravb e6800000.ethernet: failed to initialize MDIO ravb e6800000.ethernet: Cannot allocate desc base address table (size 176 bytes) Signed-off-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-02 16:32:37 -05:00
Eric Dumazet	9e17f8a475	net: make skb_set_owner_w() more robust skb_set_owner_w() is called from various places that assume skb->sk always point to a full blown socket (as it changes sk->sk_wmem_alloc) We'd like to attach skb to request sockets, and in the future to timewait sockets as well. For these kind of pseudo sockets, we need to take a traditional refcount and use sock_edemux() as the destructor. It is now time to un-inline skb_set_owner_w(), being too big. Fixes: `ca6fb06518` ("tcp: attach SYNACK messages to request sockets instead of listener") Signed-off-by: Eric Dumazet <edumazet@google.com> Bisected-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-02 16:28:49 -05:00
Ido Schimmel	eca1e006cf	bridge: vlan: Use rcu_dereference instead of rtnl_dereference br_should_learn() is protected by RCU and not by RTNL, so use correct flavor of nbp_vlan_group(). Fixes: `907b1e6e83` ("bridge: vlan: use proper rcu for the vlgrp member") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-02 16:27:39 -05:00
Vivien Didelot	b9b377136e	net: dsa: mv88e6xxx: lookup switch name All the mv88e6xxx drivers use the exact same code in their probe function to lookup the switch name given its ID. Thus introduce a mv88e6xxx_switch_id structure and a mv88e6xxx_lookup_name function in the common mv88e6xxx code. In the meantime make __mv88e6xxx_reg_{read,write} static since we do not need to expose these low-level r/w routines anymore. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Acked-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-02 15:54:13 -05:00
Vivien Didelot	3996a4ffb0	net: dsa: mv88e6xxx: assert SMI lock It's easy to forget to lock the smi_mutex before calling the low-level _mv88e6xxx_reg_{read,write}, so add a assert_smi_lock function in them. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Acked-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-02 15:54:13 -05:00
Yang Shi	ac00881f92	bpf: convert hashtab lock to raw lock When running bpf samples on rt kernel, it reports the below warning: BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917 in_atomic(): 1, irqs_disabled(): 128, pid: 477, name: ping Preemption disabled at:[<ffff80000017db58>] kprobe_perf_func+0x30/0x228 CPU: 3 PID: 477 Comm: ping Not tainted 4.1.10-rt8 #4 Hardware name: Freescale Layerscape 2085a RDB Board (DT) Call trace: [<ffff80000008a5b0>] dump_backtrace+0x0/0x128 [<ffff80000008a6f8>] show_stack+0x20/0x30 [<ffff8000007da90c>] dump_stack+0x7c/0xa0 [<ffff8000000e4830>] ___might_sleep+0x188/0x1a0 [<ffff8000007e2200>] rt_spin_lock+0x28/0x40 [<ffff80000018bf9c>] htab_map_update_elem+0x124/0x320 [<ffff80000018c718>] bpf_map_update_elem+0x40/0x58 [<ffff800000187658>] __bpf_prog_run+0xd48/0x1640 [<ffff80000017ca6c>] trace_call_bpf+0x8c/0x100 [<ffff80000017db58>] kprobe_perf_func+0x30/0x228 [<ffff80000017dd84>] kprobe_dispatcher+0x34/0x58 [<ffff8000007e399c>] kprobe_handler+0x114/0x250 [<ffff8000007e3bf4>] kprobe_breakpoint_handler+0x1c/0x30 [<ffff800000085b80>] brk_handler+0x88/0x98 [<ffff8000000822f0>] do_debug_exception+0x50/0xb8 Exception stack(0xffff808349687460 to 0xffff808349687580) 7460: 4ca2b600 ffff8083 4a3a7000 ffff8083 49687620 ffff8083 0069c5f8 ffff8000 7480: 00000001 00000000 007e0628 ffff8000 496874b0 ffff8083 007e1de8 ffff8000 74a0: 496874d0 ffff8083 0008e04c ffff8000 00000001 00000000 4ca2b600 ffff8083 74c0: 00ba2e80 ffff8000 49687528 ffff8083 49687510 ffff8083 000e5c70 ffff8000 74e0: 00c22348 ffff8000 00000000 ffff8083 49687510 ffff8083 000e5c74 ffff8000 7500: 4ca2b600 ffff8083 49401800 ffff8083 00000001 00000000 00000000 00000000 7520: 496874d0 ffff8083 00000000 00000000 00000000 00000000 00000000 00000000 7540: 2f2e2d2c 33323130 00000000 00000000 4c944500 ffff8083 00000000 00000000 7560: 00000000 00000000 008751e0 ffff8000 00000001 00000000 124e2d1d 00107b77 Convert hashtab lock to raw lock to avoid such warning. Signed-off-by: Yang Shi <yang.shi@linaro.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-02 15:45:43 -05:00
David S. Miller	21086b990b	Merge branch 'bridge_vlan_fixes' Nikolay Aleksandrov says: ==================== bridge: vlan: failure path and comment fixes This is a set from Ido which takes care of one failure path error in nbp_vlan_init (patch 1) and a few comment errors (patch 2). I must admit I didn't expect the port init continues after a vlan init failure but should've checked to make sure. Thanks to Ido for catching these! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-02 15:40:11 -05:00
Ido Schimmel	ddd611d3ff	bridge: vlan: Use correct flag name in comment The flag used to indicate if a VLAN should be used for filtering - as opposed to context only - on the bridge itself (e.g. br0) is called 'brentry' and not 'brvlan'. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-02 15:40:11 -05:00
Ido Schimmel	07bc588fc1	bridge: vlan: Prevent possible use-after-free When adding a port to a bridge we initialize VLAN filtering on it. We do not bail out in case an error occurred in nbp_vlan_init, as it can be used as a non VLAN filtering bridge. However, if VLAN filtering is required and an error occurred in nbp_vlan_init, we should set vlgrp to NULL, so that VLAN filtering functions (e.g. br_vlan_find, br_get_pvid) will know the struct is invalid and will not try to access it. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-02 15:40:10 -05:00
Eric Dumazet	ce1050089c	tcp/dccp: fix ireq->pktopts race IPv6 request sockets store a pointer to skb containing the SYN packet to be able to transfer it to full blown socket when 3WHS is done (ireq->pktopts -> np->pktoptions) As explained in commit `5e0724d027` ("tcp/dccp: fix hashdance race for passive sessions"), we must transfer the skb only if we won the hashdance race, if multiple cpus receive the 'ack' packet completing 3WHS at the same time. Fixes: `e994b2f0fb` ("tcp: do not lock listener to process SYN packets") Fixes: `079096f103` ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-02 15:38:26 -05:00
santosh.shilimkar@oracle.com	7b5654349e	RDS: convert bind hash table to re-sizable hashtable To further improve the RDS connection scalabilty on massive systems where number of sockets grows into tens of thousands of sockets, there is a need of larger bind hashtable. Pre-allocated 8K or 16K table is not very flexible in terms of memory utilisation. The rhashtable infrastructure gives us the flexibility to grow the hashtbable based on use and also comes up with inbuilt efficient bucket(chain) handling. Reviewed-by: David Miller <davem@davemloft.net> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-02 15:36:23 -05:00
Saurabh Sengar	d3ffaefa1b	net: rds: changing the return type from int to void as result of function rds_iw_flush_mr_pool is nowhere checked, changing its return type from int to void. also removing the unused variable rc as there is nothing to return Signed-off-by: Saurabh Sengar <saurabh.truth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-02 15:35:19 -05:00
David S. Miller	64cf370887	Merge branch 'encx24j600-fixes' Javier Martinez Canillas says: ==================== net: encx24j600: Fix SPI driver module autoload Recently I've been trying to fix module autoloading for all SPI drivers and found that the encx24j600 driver does not fill module alias information due missing a MODULE_DEVICE_TABLE() so module autload won't work and the driver Kconfig symbol is tristate which means the driver can be built as a module. But also the SPI id table is not correctly defined so this series fixes both issues. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-02 15:28:57 -05:00
Javier Martinez Canillas	07f56c616d	net: encx24j600: Export missing SPI module alias information The driver Kconfig symbol is tristate which means that it can be built as a module but the module alias information is not added to the module info so module autoload won't work since user-space won't have the information. Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-02 15:28:56 -05:00
Javier Martinez Canillas	d0cb48cd19	net: encx24j600: Fix SPI id table definition A driver's SPI id table is expected to be an array of struct spi_device_id that ends with a zero-initialized sentinel entry. But this driver defines the table as a single struct spi_device_id and sets .id_table to a pointer to this struct. But spi_match_id() has a loop that iterates while the struct spi_device_id .name[0] is not NULL, so not having a sentinel can cause a NULL pointer deference error. This patch defines the SPI id table correctly as all other SPI drivers do. Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-02 15:28:56 -05:00
Govindarajulu Varadarajan	322cf7e3a4	enic: assign affinity hint to interrupts The affinity hint is used by the user space daemon, irqbalancer, to indicate a preferred CPU mask for irqs. This patch sets the irq affinity hint to local numa core first, when exausted we try non-local numa cores. Also set tx xps cpus mask bassed on affinity hint. v2: remove the global affinity policy. Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-02 15:08:00 -05:00
Paolo Abeni	9920e48b83	ipv4: use l4 hash for locally generated multipath flows This patch changes how the multipath hash is computed for locally generated flows: now the hash comprises l4 information. This allows better utilization of the available paths when the existing flows have the same source IP and the same destination IP: with l3 hash, even when multiple connections are in place simultaneously, a single path will be used, while with l4 hash we can use all the available paths. v2 changes: - use get_hash_from_flowi4() instead of implementing just another l4 hash function Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-02 14:38:43 -05:00
Tina Ruchandani	1032a66871	Use 64-bit timekeeping This patch changes the use of struct timespec in dccp_probe to use struct timespec64 instead. timespec uses a 32-bit seconds field which will overflow in the year 2038 and beyond. timespec64 uses a 64-bit seconds field. Note that the correctness of the code isn't changed, since the original code only uses the timestamps to compute a small elapsed interval. This patch is part of a larger attempt to remove instances of 32-bit timekeeping structures (timespec, timeval, time_t) from the kernel so it is easier to identify where the real 2038 issues are. Signed-off-by: Tina Ruchandani <ruchandani.tina@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-01 17:01:16 -05:00
huangdaode	1cf7d8dda2	net: hisilicon: Remove .owner assignment from platform_driver platform_driver doesn't need to set .owner, because platform_driver_register() will set it. Signed-off-by: huangdaode <huangdaode@hisilicon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-01 16:03:41 -05:00
Vivien Didelot	76e398a627	net: dsa: use switchdev obj for VLAN add/del ops Simplify DSA by pushing the switchdev objects for VLAN add and delete operations down to its drivers. Currently only mv88e6xxx is affected. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-01 15:56:11 -05:00
Stefan Hajnoczi	ea3803c193	VSOCK: define VSOCK_SS_LISTEN once only The SS_LISTEN socket state is defined by both af_vsock.c and vmci_transport.c. This is risky since the value could be changed in one file and the other would be out of sync. Rename from SS_LISTEN to VSOCK_SS_LISTEN since the constant is not part of enum socket_state (SS_CONNECTED, ...). This way it is clear that the constant is vsock-specific. The big text reflow in af_vsock.c was necessary to keep to the maximum line length. Text is unchanged except for s/SS_LISTEN/VSOCK_SS_LISTEN/. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-01 12:14:47 -05:00
David S. Miller	24c39de009	Merge branch 'csum_partial_frags' Hannes Frederic Sowa says: ==================== net: clean up interactions of CHECKSUM_PARTIAL and fragmentation This series fixes wrong checksums on the wire for IPv4 and IPv6. Large send buffers and especially NFS lead to wrong checksums in both IPv4 and IPv6. CHECKSUM_PARTIAL skbs should not receive the respective fragmentations functions, so we add WARN_ON_ONCE to those functions to fix up those as soon as they get reported. Changelog: v2: added v4 checks v3: removed WARN_ON_ONCES (advice by Tom Herbert) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-01 12:01:37 -05:00
Hannes Frederic Sowa	405c92f7a5	ipv6: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment CHECKSUM_PARTIAL skbs should never arrive in ip_fragment. If we get one of those warn about them once and handle them gracefully by recalculating the checksum. Fixes: commit `32dce968dd` ("ipv6: Allow for partial checksums on non-ufo packets") See-also: commit `72e843bb09` ("ipv6: ip6_fragment() should check CHECKSUM_PARTIAL") Cc: Eric Dumazet <edumazet@google.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Benjamin Coddington <bcodding@redhat.com> Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-01 12:01:28 -05:00
Hannes Frederic Sowa	682b1a9d3f	ipv6: no CHECKSUM_PARTIAL on MSG_MORE corked sockets We cannot reliable calculate packet size on MSG_MORE corked sockets and thus cannot decide if they are going to be fragmented later on, so better not use CHECKSUM_PARTIAL in the first place. The IPv6 code also intended to protect and not use CHECKSUM_PARTIAL in the existence of IPv6 extension headers, but the condition was wrong. Fix it up, too. Also the condition to check whether the packet fits into one fragment was wrong and has been corrected. Fixes: commit `32dce968dd` ("ipv6: Allow for partial checksums on non-ufo packets") See-also: commit `72e843bb09` ("ipv6: ip6_fragment() should check CHECKSUM_PARTIAL") Cc: Eric Dumazet <edumazet@google.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Benjamin Coddington <bcodding@redhat.com> Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-01 12:01:27 -05:00
Hannes Frederic Sowa	dbd3393c56	ipv4: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment CHECKSUM_PARTIAL skbs should never arrive in ip_fragment. If we get one of those warn about them once and handle them gracefully by recalculating the checksum. Cc: Eric Dumazet <edumazet@google.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Benjamin Coddington <bcodding@redhat.com> Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-01 12:01:27 -05:00
Hannes Frederic Sowa	d749c9cbff	ipv4: no CHECKSUM_PARTIAL on MSG_MORE corked sockets We cannot reliable calculate packet size on MSG_MORE corked sockets and thus cannot decide if they are going to be fragmented later on, so better not use CHECKSUM_PARTIAL in the first place. Cc: Eric Dumazet <edumazet@google.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Benjamin Coddington <bcodding@redhat.com> Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-01 12:01:27 -05:00
David S. Miller	b75ec3af27	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2015-11-01 00:15:30 -04:00
Linus Torvalds	523e13455e	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph fix from Sage Weil: "This sets the stable pages flag on the RBD block device when we have CRCs enabled. (This is necessary since the default assumption for block devices changed in 3.9)" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: require stable pages if message data CRCs are enabled	2015-10-31 15:19:36 -07:00
Linus Torvalds	4bb0fb57f3	Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs bug fixes from Miklos Szeredi: "This contains fixes for bugs that appeared in earlier kernels (all are marked for -stable)" * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: free lower_mnt array in ovl_put_super ovl: free stack of paths in ovl_fill_super ovl: fix open in stacked overlay ovl: fix dentry reference leak ovl: use O_LARGEFILE in ovl_copy_up()	2015-10-31 14:49:19 -07:00
Linus Torvalds	c94eee8a3b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Fix two regressions in ipv6 route lookups, particularly wrt output interface specifications in the lookup key. From David Ahern. 2) Fix checks in ipv6 IPSEC tunnel pre-encap fragmentation, from Herbert Xu. 3) Fix mis-advertisement of 1000BASE-T on bcm63xx_enet, from Simon Arlott. 4) Some smsc phys misbehave with energy detect mode enabled, so add a DT property and disable it on such switches. From Heiko Schocher. 5) Fix TSO corruption on TX in mv643xx_eth, from Philipp Kirchhofer. 6) Fix regression added by removal of openvswitch vport stats, from James Morse. 7) Vendor Kconfig options should be bool, not tristate, from Andreas Schwab. 8) Use non-_BH() net stats bump in tcp_xmit_probe_skb(), otherwise we barf during TCP REPAIR operations. 9) Fix various bugs in openvswitch conntrack support, from Joe Stringer. 10) Fix NETLINK_LIST_MEMBERSHIPS locking, from David Herrmann. 11) Don't have VSOCK do sock_put() in interrupt context, from Jorgen Hansen. 12) Fix skb_realloc_headroom() failures properly in ISDN, from Karsten Keil. 13) Add some device IDs to qmi_wwan, from Bjorn Mork. 14) Fix ovs egress tunnel information when using lwtunnel devices, from Pravin B Shelar. 15) Add missing NETIF_F_FRAGLIST to macvtab feature list, from Jason Wang. 16) Fix incorrect handling of throw routes when the result of the throw cannot find a match, from Xin Long. 17) Protect ipv6 MTU calculations from wrap-around, from Hannes Frederic Sowa. 18) Fix failed autonegotiation on KSZ9031 micrel PHYs, from Nathan Sullivan. 19) Add missing memory barries in descriptor accesses or xgbe driver, from Thomas Lendacky. 20) Fix release conditon test in pppoe_release(), from Guillaume Nault. 21) Fix gianfar bugs wrt filter configuration, from Claudiu Manoil. 22) Fix violations of RX buffer alignment in sh_eth driver, from Sergei Shtylyov. 23) Fixing missing of_node_put() calls in various places around the networking, from Julia Lawall. 24) Fix incorrect leaf now walking in ipv4 routing tree, from Alexander Duyck. 25) RDS doesn't check pskb_pull()/pskb_trim() return values, from Sowmini Varadhan. 26) Fix VLAN configuration in mlx4 driver, from Jack Morgenstein. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (79 commits) ipv6: protect mtu calculation of wrap-around and infinite loop by rounding issues Revert "Merge branch 'ipv6-overflow-arith'" net/mlx4: Copy/set only sizeof struct mlx4_eqe bytes net/mlx4_en: Explicitly set no vlan tags in WQE ctrl segment when no vlan is present vhost: fix performance on LE hosts bpf: sample: define aarch64 specific registers amd-xgbe: Fix race between access of desc and desc index RDS-TCP: Recover correctly from pskb_pull()/pksb_trim() failure in rds_tcp_data_recv forcedeth: fix unilateral interrupt disabling in netpoll path openvswitch: Fix skb leak using IPv6 defrag ipv6: Export nf_ct_frag6_consume_orig() openvswitch: Fix double-free on ip_defrag() errors fib_trie: leaf_walk_rcu should not compute key if key is less than pn->key net: mv643xx_eth: add missing of_node_put ath6kl: add missing of_node_put net: phy: mdio: add missing of_node_put netdev/phy: add missing of_node_put net: netcp: add missing of_node_put net: thunderx: add missing of_node_put ipv6: gre: support SIT encapsulation ...	2015-10-31 11:52:20 -07:00
Linus Torvalds	38dab9ac1c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input layer fixes from Dmitry Torokhov: - a change to the ALPS driver where we had limit the quirk for trackstick handling from being active on all Dells to just a few models - a fix for a build dependency issue in the sur40 driver - a small clock handling fixup in the LPC32xx touchscreen driver * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: alps - only the Dell Latitude D420/430/620/630 have separate stick button bits Input: sur40 - add dependency on VIDEO_V4L2 Input: lpc32xx_ts - fix warnings caused by enabling unprepared clock	2015-10-30 18:49:44 -07:00
Linus Torvalds	f9793e379b	PCI update for v4.3: NUMA - Prevent out of bounds access in sysfs numa_node override (Sasha Levin) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJWMqyJAAoJEFmIoMA60/r8PXMP/jk33V2VHXmcdAnRpdUM09at VEosZlsPXmAGwlcbf0y9BMdwu8M8Kd1LAtiTXhbFchoWi3NG8ECPsIlYnQfqVzhm VEEE4IyMHNfLOgQAf+ZfD5duBjsTDoyl3D2XYa8ugV1jJVs4Vpf2lyWVnwrG32E9 MTf2plaHWtjsser78PA0hQ5w5jJz41acgv9P88mdWmYyr+u2h+G8w+Ro2bLyVsiW dcSIM7L1R6j9Kp52BqXq31rwHXQIF8v+yDaHNTKR6PzcufyuHKsK2fALa7LSam2P EJEj7D8FVPFqYs2XRdPiYI+/wjMcM59CETIZ5NtEzjkQvoeTQhLa3iA8LrS4OMNI JQWbPIHu9dB2Y2fFyeO31kW8+G8zgSKPcdhg9gAdoPspVX387+KHR+aiSMOlGsTu wCyMQsuQSqcNkKGAyPcaQe6AUaI+3Ri3awuBV3/o20tNq2upPqeljvZa6v3W/Ua+ OSKE9rdRxsMzi1M3sLIDYIg0mD3K+horH52A3cjoOXehhSFX8pucbuk6bvYszPxq 0rPLX7fasbVo/yTLz4RgIk9LK2yxpg7TO1MRQb4byCbBqVJU+7R9JxakqstmJGXv W0huOvn776rtcpxItfbckyfCsVhqcZ13xP1osjCcFLciSe+eq4dKCY1iH0OyvSOu S+TkJWdpKEU9L0Tfxjvo =nuxn -----END PGP SIGNATURE----- Merge tag 'pci-v4.3-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fix from Bjorn Helgaas: "Sorry for this last-minute update; it's been in -next for quite a while, but I forgot about it until I started getting ready for the merge window. It's small and fixes a way a user could cause a panic via sysfs, so I think it's worth getting it in v4.3. NUMA: - Prevent out of bounds access in sysfs numa_node override (Sasha Levin)" * tag 'pci-v4.3-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: Prevent out of bounds access in numa_node override	2015-10-30 18:47:18 -07:00

1 2 3 4 5 ...

550124 Commits