Avoid duplicate code for handling RTF_LOCAL routes.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the route sanity checks to the respective vxlan[4/6]_get_route
functions. This allows us to perform all sanity checks before caching
the dst, so that we can avoid these checks on subsequent packets.
This gives more accurate metadata information for packets from
fill_metadata_dst().
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The vxlan egress path error handling has become complicated, since it
needs to handle both the IPv4 and IPv6 tunnel cases.
An earlier patch removed vlan handling from vxlan_build_skb(), so
vxlan_build_skb() does not need to free the skb and we can simplify
the xmit path by having a single error handling path for both types
of tunnels.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check the vxlan socket in vxlan6_getroute().
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The VxLan device does not have special handling for vlan tagging on
egress. Therefore it does not make sense to expose the vlan offloading
feature.
This patch does not change vxlan functionality.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit 850cbaddb5 ("udp: use it's own memory accounting schema")
assumes that the socket proto has memory accounting enabled,
but this is not the case for UDPLITE.
Fix it by enabling memory accounting for UDPLITE and performing
forward-allocated memory reclaiming on socket shutdown.
UDP and UDPLITE now share the same memory accounting limits.
Also drop the backlog receive operation, since it is no longer needed.
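Roughly, that means pointing UDPLITE's struct proto at the accounting
state UDP already uses; the fragment below is only a hedged sketch of
that wiring (field names as in struct proto, full initializer omitted),
not the actual hunk:

struct proto udplite_prot = {
	.name              = "UDP-Lite",
	/* ... */
	/* share UDP's forward-alloc accounting and udp_mem limits */
	.memory_allocated  = &udp_memory_allocated,
	.sysctl_mem        = sysctl_udp_mem,
	/* ... */
};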
Fixes: 850cbaddb5 ("udp: use it's own memory accounting schema")
Reported-by: Andrei Vagin <avagin@gmail.com>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau says:
====================
bpf: LRU map
This patch set adds an LRU map implementation to the existing BPF map
family.
The first few patches introduce the basic BPF LRU list
implementation.
The later patches introduce the LRU versions of the
existing BPF_MAP_TYPE_[PERCPU_]HASH maps by leveraging
the BPF LRU list.
v2:
- Added a percpu LRU list option which can be specified as
a map attribute.
[Note: percpu LRU list has nothing to do with the map's value]
- Removed the cpu variable from the struct bpf_lru_locallist
since it is not needed.
- Changed the __bpf_lru_node_move_out to __bpf_lru_node_move_to_free in
patch 1 to prepare the percpu LRU list in patch 2.
- Moved the test_lru_map under selftests
- Refactored a few things in the test codes
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch has some unit tests and a test_lru_dist.
The test_lru_dist reads in the numeric keys from a file.
The files used here are generated by a modified fio-genzipf tool
originating from the fio test suite. The sample data file can be
found here: https://github.com/iamkafai/bpf-lru
found here: https://github.com/iamkafai/bpf-lru
The zipf.* data files have 100k numeric keys and the keys range
from 1 to 100k.
The test_lru_dist outputs the number of unique keys (nr_unique).
For example, the following means that 61239 of them are unique out of
the 100k keys.
nr_misses means the key could not be found in the LRU map, so nr_misses
must be >= nr_unique. test_lru_dist also simulates a perfect LRU
map as a comparison:
[root@arch-fb-vm1 ~]# ~/devshare/fb-kernel/linux/samples/bpf/test_lru_dist \
/root/zipf.100k.a1_01.out 4000 1
...
test_parallel_lru_dist (map_type:9 map_flags:0x0):
task:0 BPF LRU: nr_unique:23093(/100000) nr_misses:31603(/100000)
task:0 Perfect LRU: nr_unique:23093(/100000 nr_misses:34328(/100000)
....
test_parallel_lru_dist (map_type:9 map_flags:0x2):
task:0 BPF LRU: nr_unique:23093(/100000) nr_misses:31710(/100000)
task:0 Perfect LRU: nr_unique:23093(/100000 nr_misses:34328(/100000)
[root@arch-fb-vm1 ~]# ~/devshare/fb-kernel/linux/samples/bpf/test_lru_dist \
/root/zipf.100k.a0_01.out 40000 1
...
test_parallel_lru_dist (map_type:9 map_flags:0x0):
task:0 BPF LRU: nr_unique:61239(/100000) nr_misses:67054(/100000)
task:0 Perfect LRU: nr_unique:61239(/100000 nr_misses:66993(/100000)
...
test_parallel_lru_dist (map_type:9 map_flags:0x2):
task:0 BPF LRU: nr_unique:61239(/100000) nr_misses:67068(/100000)
task:0 Perfect LRU: nr_unique:61239(/100000 nr_misses:66993(/100000)
LRU map has also been added to map_perf_test:
/* Global LRU */
[root@kerneltest003.31.prn1 ~]# for i in 1 4 8; do echo -n "$i cpus: "; \
./map_perf_test 16 $i | awk '{r += $3}END{print r " updates"}'; done
1 cpus: 2934082 updates
4 cpus: 7391434 updates
8 cpus: 6500576 updates
/* Percpu LRU */
[root@kerneltest003.31.prn1 ~]# for i in 1 4 8; do echo -n "$i cpus: "; \
./map_perf_test 32 $i | awk '{r += $3}END{print r " updates"}'; done
1 cpus: 2896553 updates
4 cpus: 9766395 updates
8 cpus: 17460553 updates
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Provide an LRU version of the existing BPF_MAP_TYPE_PERCPU_HASH.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Provide an LRU version of the existing BPF_MAP_TYPE_HASH.
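For illustration only, below is a minimal userspace sketch of creating
such a map through the bpf(2) syscall; the key/value sizes and
max_entries are arbitrary, and BPF_MAP_TYPE_LRU_HASH is assumed to be
exported by the uapi header once this series is applied:

#include <linux/bpf.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper around the bpf(2) syscall for BPF_MAP_CREATE. */
static int create_lru_hash_map(unsigned int max_entries)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.map_type = BPF_MAP_TYPE_LRU_HASH;	/* LRU flavour of BPF_MAP_TYPE_HASH */
	attr.key_size = sizeof(unsigned long long);
	attr.value_size = sizeof(unsigned long long);
	attr.max_entries = max_entries;

	return syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
}

int main(void)
{
	int fd = create_lru_hash_map(4096);

	if (fd < 0) {
		perror("BPF_MAP_CREATE");
		return 1;
	}
	printf("LRU hash map created, fd=%d\n", fd);
	return 0;
}

Once the map is full, further updates reclaim space from the least
recently used entries instead of failing once max_entries is reached,
as the plain hash map does.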
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Refactor the code that populates the value
of a htab_elem in a BPF_MAP_TYPE_PERCPU_HASH
typed bpf_map.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of having a common LRU list, this patch allows a
percpu LRU list which can be selected by specifying a map
attribute. The map attribute will be added in the later
patch.
While the common use case for LRU is #reads >> #updates,
the percpu LRU list allows a bpf prog to absorb an unusual number of
updates under pathological cases (e.g. an external traffic facing
machine which could be under attack).
Each percpu LRU list is isolated from the others. The LRU nodes
(including free nodes) cannot be moved across different LRU lists.
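As a purely hypothetical sketch of how the selection is expected to
surface once the map attribute and the LRU map types from the later
patches are in place (BPF_F_NO_COMMON_LRU is the assumed name of that
attribute here):

#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef BPF_F_NO_COMMON_LRU
#define BPF_F_NO_COMMON_LRU (1U << 1)	/* assumed flag name/value */
#endif

/* Create an LRU hash map that uses one LRU list per cpu instead of a
 * common list shared by all cpus. */
static int create_percpu_lru_map(unsigned int max_entries)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.map_type = BPF_MAP_TYPE_LRU_HASH;
	attr.key_size = sizeof(unsigned int);
	attr.value_size = sizeof(unsigned long long);
	attr.max_entries = max_entries;
	attr.map_flags = BPF_F_NO_COMMON_LRU;

	return syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
}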
Here is the update performance comparison between
the common LRU list and the percpu LRU list (the test code is
in the last patch):
[root@kerneltest003.31.prn1 ~]# for i in 1 4 8; do echo -n "$i cpus: "; \
./map_perf_test 16 $i | awk '{r += $3}END{print r " updates"}'; done
1 cpus: 2934082 updates
4 cpus: 7391434 updates
8 cpus: 6500576 updates
[root@kerneltest003.31.prn1 ~]# for i in 1 4 8; do echo -n "$i cpus: "; \
./map_perf_test 32 $i | awk '{r += $3}END{print r " updates"}'; done
1 cpus: 2896553 updates
4 cpus: 9766395 updates
8 cpus: 17460553 updates
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce bpf_lru_list which will provide LRU capability to
the bpf_htab in the later patch.
* General Thoughts:
1. Target use case. Reads are more frequent than updates
(i.e. bpf_lookup_elem() is called more often than bpf_update_elem()).
If a bpf_prog does a bpf_lookup_elem() first and then an in-place
update, it still counts as a read operation as far as the LRU list
is concerned.
2. It may be useful to think of it as an LRU cache.
3. Optimize the read case
3.1 No lock in the read case
3.2 The LRU maintenance is only done during bpf_update_elem()
4. If there is a percpu LRU list, it loses the system-wide LRU
property. A completely isolated percpu LRU list has the best
performance but the memory utilization is not ideal, considering
the workload may be imbalanced.
5. Hence, this patch starts the LRU implementation with a global LRU
list, with batched operations before accessing the global LRU list.
As an LRU cache where #reads >> #updates/#inserts, it will work well.
6. There is a local list (for each cpu) which is named
'struct bpf_lru_locallist'. This local list is not used to maintain
the LRU ordering. Instead, the local list is used to batch enough
operations before acquiring the lock of the global LRU list. More
details on this later.
7. A later patch allows a percpu LRU list, selected by specifying a
map attribute, for scalability reasons and for use cases that need to
prepare for the worst (and pathological) case, like a DoS attack.
The percpu LRU lists are completely isolated from each other and the
LRU nodes (including free nodes) cannot be moved across the lists. The
following description is for the global LRU list but is mostly
applicable to the percpu LRU list as well.
* Global LRU List:
1. It has three sub-lists: active-list, inactive-list and free-list.
2. The two list idea, active and inactive, is borrowed from the
page cache.
3. All nodes are pre-allocated and all sit at the free-list (of the
global LRU list) at the beginning. The pre-allocation reasoning
is similar to the existing BPF_MAP_TYPE_HASH. However,
opting out of prealloc (BPF_F_NO_PREALLOC) is not supported in
the LRU map.
* Active/Inactive List (of the global LRU list):
1. The active list, as its name says, maintains the active set of
the nodes. We can think of it as the working set or the more
frequently accessed nodes. The access frequency is approximated by
a ref-bit. The ref-bit is set during bpf_lookup_elem().
2. The inactive list, as its name also says, maintains a less
active set of nodes. They are the candidates to be removed
from the bpf_htab when we are running out of free nodes.
3. The ordering of these two lists acts as a rough clock.
The tail of the inactive list holds the older nodes, which
should be released first if the bpf_htab needs a free element.
* Rotating the Active/Inactive List (of the global LRU list):
1. It is the basic operation to maintain the LRU property of
the global list.
2. The active list is only rotated when the inactive list is running
low. This idea is similar to the current page cache.
Inactive running low is currently defined as
"# of inactive < # of active".
3. The active list rotation always starts from the tail. It moves
nodes without the ref-bit set to the head of the inactive list.
It moves nodes with the ref-bit set back to the head of the active
list and then clears their ref-bit.
4. The inactive rotation is pretty simple.
It walks the inactive list and moves a node back to the head of the
active list if its ref-bit is set. The ref-bit is cleared after
moving it to the active list.
If a node does not have its ref-bit set, it is left where it is
because it is already in the inactive list (a userspace sketch of
both rotations follows below).
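Below is a self-contained userspace sketch (not the kernel code) of the
rotation idea described above: the active list is scanned from the
tail, unreferenced nodes are demoted to the inactive head, and nodes
whose ref-bit got set are promoted back with the bit cleared:

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Minimal circular doubly linked list, kernel list_head style. */
struct list_head { struct list_head *prev, *next; };

static void list_init(struct list_head *h) { h->prev = h->next = h; }

static void list_del(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

static void list_add_head(struct list_head *n, struct list_head *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

struct lru_node {
	struct list_head link;
	bool ref;		/* set by lookups, cleared on rotation */
};

#define node_of(p) ((struct lru_node *)((char *)(p) - offsetof(struct lru_node, link)))

/* Rotate the active list: scan nr_scan nodes from the tail, demote nodes
 * without the ref-bit to the inactive head, and move referenced nodes
 * back to the active head with their ref-bit cleared. */
static void rotate_active(struct list_head *active, struct list_head *inactive,
			  int nr_scan)
{
	while (nr_scan-- > 0 && active->prev != active) {
		struct lru_node *n = node_of(active->prev);

		list_del(&n->link);
		if (n->ref) {
			n->ref = false;
			list_add_head(&n->link, active);
		} else {
			list_add_head(&n->link, inactive);
		}
	}
}

/* Rotate the inactive list: promote nodes whose ref-bit got set since
 * the last pass; unreferenced nodes stay where they are. */
static void rotate_inactive(struct list_head *active, struct list_head *inactive)
{
	struct list_head *pos = inactive->prev;

	while (pos != inactive) {
		struct lru_node *n = node_of(pos);

		pos = pos->prev;
		if (n->ref) {
			n->ref = false;
			list_del(&n->link);
			list_add_head(&n->link, active);
		}
	}
}

int main(void)
{
	struct list_head active, inactive;
	struct lru_node nodes[4];
	int i;

	list_init(&active);
	list_init(&inactive);
	for (i = 0; i < 4; i++) {
		nodes[i].ref = false;
		list_add_head(&nodes[i].link, &active);
	}

	nodes[2].ref = true;	/* pretend node 2 was looked up */
	rotate_active(&active, &inactive, 4);
	rotate_inactive(&active, &inactive);

	printf("referenced node kept active: %s\n",
	       active.next == &nodes[2].link ? "yes" : "no");
	return 0;
}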
* Shrinking the Inactive List (of the global LRU list):
1. Shrinking is the operation to get free nodes when the bpf_htab is
full.
2. It usually only shrinks the inactive list to get free nodes.
3. During shrinking, it walks the inactive list from the tail and
deletes the nodes without the ref-bit set from the bpf_htab.
4. If no free node is found after step (3), it forcefully gets
one node from the tail of the inactive or active list. "Forcefully"
means that it ignores the ref-bit.
* Local List:
1. Each CPU has a 'struct bpf_lru_locallist'. The purpose is to
batch enough operations before acquiring the lock of the
global LRU.
2. A local list has two sub-lists, free-list and pending-list.
3. During bpf_update_elem(), it first tries to get a node from the
free-list of the current CPU's local list.
4. If the local free-list is empty, it acquires nodes from the
global LRU list. The global LRU list can satisfy the request
either from its global free-list or by shrinking the global
inactive list. Since we have already acquired the global LRU
list lock, it tries to grab at most LOCAL_FREE_TARGET elements
for the local free-list.
5. When a new element is added to the bpf_htab, it
first sits on the pending-list (of the local list).
The pending-list is flushed to the global LRU list
the next time free nodes need to be acquired from the
global list (a toy sketch of this batching follows below).
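The batching idea can be illustrated with a small userspace toy model
(counters stand in for the real linked lists, locking is elided, and
LOCAL_FREE_TARGET is an arbitrary value here):

#include <stdbool.h>
#include <stdio.h>

#define LOCAL_FREE_TARGET 16

/* Toy model of the per-cpu local list described above: the local
 * free-list is refilled in batches of LOCAL_FREE_TARGET nodes from a
 * global pool, and newly added elements sit on the local pending-list
 * until the next refill flushes them back to the global LRU. */
struct toy_locallist {
	int nr_free;	/* nodes ready for reuse on this cpu */
	int nr_pending;	/* new nodes not yet flushed to the global LRU */
};

static int toy_global_free = 1024;	/* size of the global free-list */

static bool toy_get_node(struct toy_locallist *loc)
{
	if (!loc->nr_free) {
		int batch = toy_global_free < LOCAL_FREE_TARGET ?
			    toy_global_free : LOCAL_FREE_TARGET;

		/* In the real code the pending-list is flushed and the
		 * global inactive list may be shrunk at this point. */
		loc->nr_pending = 0;
		toy_global_free -= batch;
		loc->nr_free += batch;
		if (!loc->nr_free)
			return false;	/* nothing left anywhere */
	}
	loc->nr_free--;
	loc->nr_pending++;	/* the new element sits on the pending-list first */
	return true;
}

int main(void)
{
	struct toy_locallist loc = { 0, 0 };
	int i, got = 0;

	for (i = 0; i < 40; i++)
		got += toy_get_node(&loc);

	printf("got %d nodes, local free left %d, global free left %d\n",
	       got, loc.nr_free, toy_global_free);
	return 0;
}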
* Lock Consideration:
The LRU list has a lock (lru_lock). Each bucket of the htab has a
lock (buck_lock). If both locks need to be acquired together,
the lock order is always lru_lock -> buck_lock, and this only
happens in the bpf_lru_list.c logic.
In hashtab.c, the two locks are never acquired together (i.e. one
lock is always released first before acquiring the other lock).
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Fix off by one wrt. indexing when dumping /proc/net/route entries,
from Alexander Duyck.
2) Fix lockdep splats in iwlwifi, from Johannes Berg.
3) Cure panic when inserting certain netfilter rules when NFT_SET_HASH
is disabled, from Liping Zhang.
4) Memory leak when nft_expr_clone() fails, also from Liping Zhang.
5) Disable UFO when path will apply IPSEC tranformations, from Jakub
Sitnicki.
6) Don't bogusly double cwnd in dctcp module, from Florian Westphal.
7) skb_checksum_help() should never actually use the value "0" for the
resulting checksum, that has a special meaning, use CSUM_MANGLED_0
instead. From Eric Dumazet.
8) Per-tx/rx queue statistic strings are wrong in qed driver, fix from
Yuval Mintz.
9) Fix SCTP reference counting of associations and transports in
sctp_diag. From Xin Long.
10) When we hit ip6tunnel_xmit() we could have come from an ipv4 path in
a previous layer or similar, so explicitly clear the ipv6 control
block in the skb. From Eli Cooper.
11) Fix bogus sleeping inside of inet_wait_for_connect(), from WANG
Cong.
12) Correct device ID of T6 adapter in cxgb4 driver, from Hariprasad
Shenai.
13) Fix potential access past the end of the skb page frag array in
tcp_sendmsg(). From Eric Dumazet.
14) 'skb' can legitimately be NULL in inet{,6}_exact_dif_match(). Fix
from David Ahern.
15) Don't return an error in tcp_sendmsg() if we wrote any bytes
successfully, from Eric Dumazet.
16) Extraneous unlocks in netlink_diag_dump(), we removed the locking
but forgot to purge these unlock calls. From Eric Dumazet.
17) Fix memory leak in error path of __genl_register_family(). We leak
the attrbuf, from WANG Cong.
18) cgroupstats netlink policy table is mis-sized, from WANG Cong.
19) Several XDP bug fixes in mlx5, from Saeed Mahameed.
20) Fix several device refcount leaks in network drivers, from Johan
Hovold.
21) icmp6_send() should use skb dst device not skb->dev to determine L3
routing domain. From David Ahern.
22) ip_vs_genl_family sets maxattr incorrectly, from WANG Cong.
23) We leak the new macvlan port in some error cases of
macvlan_common_newlink(). Fix from Gao Feng.
24) Similar to the icmp6_send() fix, icmp_route_lookup() should
determine L3 routing domain using skb_dst(skb)->dev not skb->dev.
Also from David Ahern.
25) Several fixes for route offloading and FIB notification handling in
mlxsw driver, from Jiri Pirko.
26) Properly cap __skb_flow_dissect()'s return value, from Eric Dumazet.
27) Fix long standing regression in ipv4 redirect handling, wrt.
validating the new neighbour's reachability. From Stephen Suryaputra
Lin.
28) If sk_filter() trims the packet excessively, handle it reasonably in
tcp input instead of exploding. From Eric Dumazet.
29) Fix handling of napi hash state when copying channels in sfc driver,
from Bert Kenward.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (121 commits)
mlxsw: spectrum_router: Flush FIB tables during fini
net: stmmac: Fix lack of link transition for fixed PHYs
sctp: change sk state only when it has assocs in sctp_shutdown
bnx2: Wait for in-flight DMA to complete at probe stage
Revert "bnx2: Reset device during driver initialization"
ps3_gelic: fix spelling mistake in debug message
net: ethernet: ixp4xx_eth: fix spelling mistake in debug message
ibmvnic: Fix size of debugfs name buffer
ibmvnic: Unmap ibmvnic_statistics structure
sfc: clear napi_hash state when copying channels
mlxsw: spectrum_router: Correctly dump neighbour activity
mlxsw: spectrum: Fix refcount bug on span entries
bnxt_en: Fix VF virtual link state.
bnxt_en: Fix ring arithmetic in bnxt_setup_tc().
Revert "include/uapi/linux/atm_zatm.h: include linux/time.h"
tcp: take care of truncations done by sk_filter()
ipv4: use new_gw for redirect neigh lookup
r8152: Fix error path in open function
net: bpqether.h: remove if_ether.h guard
net: __skb_flow_dissect() must cap its return value
...
Pull arch/tile bugfix from Chris Metcalf:
"This just fixes an incompatibility with tile __ro_after_init"
* 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
tile: handle __ro_after_init like parisc does
Merge tag 'rtc-4.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC fixes from Alexandre Belloni:
"Here are a few driver fixes for 4.9. It has been calm for a while so I
don't expect more for this cycle.
Drivers:
- asm9260: fix module autoload
- cmos: fix crashes
- omap: fix clock handling"
* tag 'rtc-4.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
rtc: omap: prevent disabling of clock/module during suspend
rtc: omap: Fix selecting external osc
rtc: cmos: Don't enable interrupts in the middle of the interrupt handler
rtc: cmos: remove all __exit_p annotations
rtc: asm9260: fix module autoload
The tile architecture already marks RO_DATA as read-only in
the kernel, so grouping RO_AFTER_INIT_DATA with RO_DATA, as is
done by default, means the kernel faults in init when it tries
to write to RO_AFTER_INIT_DATA. For now, just arrange that
__ro_after_init is handled like __write_once, i.e. __read_mostly.
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
Since commit b45f64d16d ("mlxsw: spectrum_router: Use FIB notifications
instead of switchdev calls") we reflect to the device the entire FIB
table and not only FIBs that point to netdevs created by the driver.
During module removal, FIBs of the second type are removed following
NETDEV_UNREGISTER events sent. The other FIBs are still present in both
the driver's cache and the device's table.
Fix this by iterating over all the FIB tables in the device and
flushing them. There's no need to take locks, as we're the only writer.
Fixes: b45f64d16d ("mlxsw: spectrum_router: Use FIB notifications instead of switchdev calls")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While it is useful to know which MDIO driver is being registered, demote
the pr_info() to a pr_debug().
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 52f95bbfcf ("stmmac: fix adjust link call in case of a switch
is attached") added some logic to avoid polling the fixed PHY and
therefore invoking the adjust_link callback more than once, since this
is a fixed PHY and link events won't be generated.
This works fine the first time, because we start with phydev->irq =
PHY_POLL, so we call adjust_link, then we set phydev->irq =
PHY_IGNORE_INTERRUPT and we stop polling the PHY.
Now, if we called ndo_close(), which calls both phy_stop() and does an
explicit netif_carrier_off(), we end up with a link down. Upon calling
ndo_open() again, despite starting the PHY state machine, we have
PHY_IGNORE_INTERRUPT set, and we generate no link event at all, so the
link is permanently down.
Fixes: 52f95bbfcf ("stmmac: fix adjust link call in case of a switch is attached")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The return value of macvlan_addr_busy() is used as a bool value,
so return bool values instead of the integers 1 and 0.
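A hypothetical illustration of the pattern (not the macvlan code
itself):

#include <stdbool.h>
#include <string.h>

/* Return true/false instead of the integers 1 and 0 when the result is
 * consumed as a boolean. */
static bool addr_equal(const unsigned char *a, const unsigned char *b,
		       size_t len)
{
	return memcmp(a, b, len) == 0;
}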
Signed-off-by: Gao Feng <gfree.wind@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace the custom u64_to_ptr() function with the u64_to_user_ptr()
macro.
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now when users shut down a sock with SEND_SHUTDOWN in sctp, even if
this sock has no connection (assoc), the sk state is changed to
SCTP_SS_CLOSING, which is not what we expect.
Besides, after that, if users try to listen on this sock, the kernel
can even panic when it dereferences sctp_sk(sk)->bind_hash in
sctp_inet_listen, as bind_hash is null when the sock has no assoc.
This patch moves the sk state change to after checking that the sk's
assocs list is not empty, and also merges the two if() conditions to
reduce the indent level.
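A hedged sketch of the resulting shape of the check (illustration only,
not the exact hunk):

	if (how & SEND_SHUTDOWN && !list_empty(&sctp_sk(sk)->ep->asocs)) {
		/* Only enter SCTP_SS_CLOSING once we know at least one
		 * assoc exists, so a later listen() cannot trip over a
		 * NULL bind_hash. */
		sk->sk_state = SCTP_SS_CLOSING;
		/* ... initiate shutdown of the association ... */
	}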
Fixes: d46e416c11 ("sctp: sctp should change socket state when shutdown is received")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Baoquan He says:
====================
bnx2: Wait for in-flight DMA to complete at probe stage
This is v2 post.
In commit 3e1be7a ("bnx2: Reset device during driver initialization"),
the firmware requesting code was moved from the open stage to the probe
stage. The reason is that in a kdump kernel the hardware iommu needs
the device to be reset at driver probe stage; otherwise in-flight DMA
from the 1st kernel will keep going and look up into the newly created
io-page tables.
However, bnx2 chip resetting involves firmware requesting, which needs
to be done at the open stage.
Michael Chan suggested we can just wait for the old in-flight DMA to
complete at probe stage; then, even without resetting the device, we
don't need to worry that the old in-flight DMA could keep looking up
the newly created io-page tables.
v1->v2:
Michael suggested waiting for the in-flight DMA to complete at probe
stage. So give up the old method of trying to reset the chip at probe
stage and take the new approach accordingly.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In-flight DMA from the 1st kernel could keep going in the kdump
kernel. The new io-page table has been created before bnx2 does its
reset at the open stage. We have to wait at probe stage for the
in-flight DMA to complete, to avoid it looking up into the newly
created io-page table.
Suggested-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 3e1be7ad2d.
When people build the bnx2 driver into the kernel, it will fail to
detect and load firmware because the firmware is contained in the
initramfs and the initramfs has not been uncompressed yet during
do_initcalls. So revert commit 3e1be7a and work out a new way in a
later patch.
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Trivial fix to spelling mistake "unmached" to "unmatched" in
debug message.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ethtool api {get|set}_settings is deprecated.
We move this driver to the new api {get|set}_link_ksettings.
The previous implementation of set_settings was modifying
the value of advertising, but with the new API this is not
possible, as the ethtool_link_ksettings structure is passed
as const.
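For reference, the new hooks have roughly the following shape in
struct ethtool_ops; the foo_* names below are placeholders, not this
driver's functions, and the phylib helpers are used purely for
illustration:

static int foo_get_link_ksettings(struct net_device *dev,
				  struct ethtool_link_ksettings *cmd)
{
	return phy_ethtool_ksettings_get(dev->phydev, cmd);
}

static int foo_set_link_ksettings(struct net_device *dev,
				  const struct ethtool_link_ksettings *cmd)
{
	/* 'cmd' is const here, so 'advertising' cannot be modified in place. */
	return phy_ethtool_ksettings_set(dev->phydev, cmd);
}

static const struct ethtool_ops foo_ethtool_ops = {
	.get_link_ksettings	= foo_get_link_ksettings,
	.set_link_ksettings	= foo_set_link_ksettings,
};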
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ethtool api {get|set}_settings is deprecated.
We move this driver to the new api {get|set}_link_ksettings.
The previous implementation of set_settings was modifying
the value of advertising, but with the new API this is not
possible, as the ethtool_link_ksettings structure is passed
as const.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to commit 14135f30e3 ("inet: fix sleeping inside inet_wait_for_connect()"),
sk_wait_event() needs the fix too: release_sock() is blocking and
changes the process state back to running after the sleep, which
breaks the previous prepare_to_wait().
Switch to the new wait API.
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 022d00ee0b ("ASoC: lpass-platform: Fix broken pcm data
usage") the stream specific information initialization was broken, with
the dma channel information not being initialized if there was no
alloc_dma_channel() helper function.
Before that, the DMA channel number was implicitly initialized to zero
because the backing store was allocated with devm_kzalloc(). When the
init code was rewritten, that implicit initialization was lost, and gcc
rightfully complains about an uninitialized variable being used.
Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Cc: Mark Brown <broonie@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit bfd8d3f23b.
It turns out that this flushes things much too aggressively, and causes
lines to break up when the system logger races with new continuation
lines being printed.
lines being printed.
There's a pending patch to make printk() flushing much more
straightforward, but it's too invasive for 4.9, so in the meantime let's
just not make the system message logging flush continuation lines.
They'll be flushed by the final newline anyway.
Suggested-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This file was converted to a separate module at commit 7a0786c19d
("gp8psk: Fix DVB frontend attach"), because the DVB attach routines
require it to work. However, I forgot to copy the MODULE_foo() macros
from the original module, causing this warning:
WARNING: modpost: missing MODULE_LICENSE() in drivers/media/dvb-frontends/gp8psk-fe.o
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 7a0786c19d ("gp8psk: Fix DVB frontend attach")
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull x86 fixes from Ingo Molnar:
"Misc fixes:
- fix an Intel/MID boot crash/hang bug
- fix a cache topology mis-parsing bug on certain AMD CPUs
- fix a virtualization firmware bug by adding a check+quirk
workaround on the kernel side"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Deal with broken firmware (VMWare/XEN)
x86/cpu/AMD: Fix cpu_llc_id for AMD Fam17h systems
x86/platform/intel-mid: Retrofit pci_platform_pm_ops ->get_state hook
Pull irq fix from Ingo Molnar:
"This fixes a genirq regression that resulted in the Intel/Broxton
pinctrl/GPIO driver (and possibly others) spewing warnings"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Use irq type from irqdata instead of irqdesc
Pull perf fixes from Ingo Molnar:
"An uncore PMU driver hardware enablement change for Intel SkyLake
uncore PMUs (Skylake Y, U, H and S platforms), plus a number of
tooling fixes for the histogram handling/displaying code"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/uncore: Add more Intel uncore IMC PCI IDs for SkyLake
perf hists: Fix column length on --hierarchy
perf hists browser: Fix column indentation on --hierarchy
perf hists browser: Show folded sign properly on --hierarchy
perf hists browser: Fix indentation of folded sign on --hierarchy
perf hist browser: Fix hierarchy column counts
Merge tag 'ntb-4.9' of git://github.com/jonmason/ntb
Pull NTB fixes from Jon Mason:
"NTB bug fixes for ntb_hw_intel, ntb_perf, and ntb_pingpong.
Also, a fixup to use jiffies in schedule_timeout_* call instead of a
constant"
* tag 'ntb-4.9' of git://github.com/jonmason/ntb:
ntb_perf: potential info leak in debugfs
ntb: ntb_hw_intel: init peer_addr in struct intel_ntb_dev
ntb: make DMA_OUT_RESOURCE_TO HZ independent
ntb_transport: make DMA_OUT_RESOURCE_TO HZ independent
NTB: ntb_hw_intel: Fix typo in module parameter descriptions
ntb_pingpong: Fix db_init parameter description
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains a second batch of Netfilter updates for
your net-next tree. This includes a rework of the core hook
infrastructure that improves Netfilter performance by ~15% according to
synthetic benchmarks. Then, a large batch with ipset updates, including
a new hash:ipmac set type, via Jozsef Kadlecsik. This also includes a
couple of assorted updates.
Regarding the core hook infrastructure rework to improve performance,
using this simple drop-all packets ruleset from ingress:
nft add table netdev x
nft add chain netdev x y { type filter hook ingress device eth0 priority 0\; }
nft add rule netdev x y drop
And generating traffic through Jesper Brouer's
samples/pktgen/pktgen_bench_xmit_mode_netif_receive.sh script using
the -i option, perf report shows the nf_tables calls in its top 10:
17.30% kpktgend_0 [nf_tables] [k] nft_do_chain
15.75% kpktgend_0 [kernel.vmlinux] [k] __netif_receive_skb_core
10.39% kpktgend_0 [nf_tables_netdev] [k] nft_do_chain_netdev
I'm measuring here an improvement of ~15% in performance with this
patchset, so we got +2.5Mpps more. I have used my old laptop Intel(R)
Core(TM) i5-3320M CPU @ 2.60GHz 4-cores.
More specifically, this rework contains, in strict order, these patches:
1) Remove compile-time debugging from core.
2) Remove obsolete comments that predate the rcu era. These days it is
well known that a Netfilter hook always runs under rcu_read_lock().
3) Remove threshold handling, this is only used by br_netfilter too.
We already have specific code to handle this from br_netfilter,
so remove this code from the core path.
4) Deprecate NF_STOP, as this is only used by br_netfilter.
5) Place the nf_state_hook pointer into the xt_action_param structure,
so this structure fits into one single cacheline according to pahole.
This also implicitly affects nftables, since it also relies on the
xt_action_param structure.
6) Move state->hook_entries into nf_queue entry. The hook_entries
pointer is only required by nf_queue(), so we can store this in the
queue entry instead.
7) use switch() statement to handle verdict cases.
8) Remove hook_entries field from nf_hook_state structure, this is only
required by nf_queue, so store it in nf_queue_entry structure.
9) Merge nf_iterate() into nf_hook_slow() that results in a much more
simple and readable function.
10) Handle NF_REPEAT away from the core, so far the only client is
nf_conntrack_in() and we can restart the packet processing using a
simple goto to jump back there when the TCP requires it.
This update required a second pass to fix fallout, fix from
Arnd Bergmann.
11) Set random seed from nft_hash when no seed is specified from
userspace.
12) Simplify nf_tables expression registration, in a much smarter way
to save lots of boiler plate code, by Liping Zhang.
13) Simplify layer 4 protocol conntrack tracker registration, from
Davide Caratti.
14) Missing CONFIG_NF_SOCKET_IPV4 dependency for udp4_lib_lookup, due
to recent generalization of the socket infrastructure, from Arnd
Bergmann.
15) Then, the ipset batch from Jozsef, he describes it as it follows:
* Cleanup: Remove extra whitespaces in ip_set.h
* Cleanup: Mark some of the helpers arguments as const in ip_set.h
* Cleanup: Group counter helper functions together in ip_set.h
* struct ip_set_skbinfo is introduced instead of open coded fields
in the skbinfo get/init helper functions.
* Use kmalloc() in comment extension helper instead of kzalloc()
because it is unnecessary to zero out the area just before
explicit initialization.
* Cleanup: Split extensions into separate files.
* Cleanup: Separate memsize calculation code into dedicated function.
* Cleanup: group ip_set_put_extensions() and ip_set_get_extensions()
together.
* Add element count to hash headers by Eric B Munson.
* Add element count to all set types header for uniform output
across all set types.
* Count non-static extension memory into memsize calculation for
userspace.
* Cleanup: Remove redundant mtype_expire() arguments, because
they can be obtained from other parameters.
* Cleanup: Simplify mtype_expire() for hash types by removing
one level of indentation.
* Make NLEN compile time constant for hash types.
* Make sure element data size is a multiple of u32 for the hash set
types.
* Optimize hash creation routine, exit as early as possible.
* Make struct htype per ipset family so nets array becomes fixed size
and thus simplifies the struct htype allocation.
* Collapse same condition body into a single one.
* Fix reported memory size for hash:* types, base hash bucket structure
was not taken into account.
* hash:ipmac type support added to ipset by Tomasz Chilinski.
* Use setup_timer() and mod_timer() instead of init_timer()
by Muhammad Falak R Wani, individually for the set type families.
16) Remove useless connlabel field in struct netns_ct, patch from
Florian Westphal.
17) xt_find_table_lock() doesn't return ERR_PTR() anymore, so simplify
{ip,ip6,arp}tables code that uses this.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a warning that the abort mechanism was triggered for the device.
Also avoid going through the procedure if the abort was already done.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn says:
====================
dsa: mv88e6xxx: Fixes for port refactoring
The patches which refactored setting up the switch MACs introduced a
couple of regressions. The RGMII delays for a port can be set using
mechanisms other than just phy-mode. Don't overwrite the delays unless
explicitly asked to. This broke my Armada 370 RD. Also, the mv88e6351
family supports setting RGMII delays, but is missing the necessary
entries in the ops structures to allow this.
These fixes are to patches currently in net-next. No need for stable
etc.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The recent refactoring of setting the MAC configuration broke setting
of RGMII delays, via the phy-mode, on the 6351 family. Add the missing
ops to the structure.
Fixes: 7340e5ecdbb1 ("net: dsa: mv88e6xxx: setup port's MAC")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The RGMII mode delays can be set via strapping pins or EEPROM.
Don't change them unless explicitly asked to change them. The recent
refactoring of setting the MAC configuration changed this behaviour,
in that CPU and DSA ports have any pre-configured RGMII delays
removed. This breaks the Armada 370RD board. Restore the previous
behaviour, in that RGMII delays are only applied/removed when
explicitly asked for via a phy-mode of PHY_INTERFACE_MODE_RGMII*.
Fixes: 7340e5ecdbb1 ("net: dsa: mv88e6xxx: setup port's MAC")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a static checker warning, not something I'm desperately
concerned about. But snprintf() returns the number of bytes that
would have been copied if there were space. We really care about the
number of bytes that actually were copied so we should use scnprintf()
instead.
It probably won't overrun, and in that case we may as well just use
sprintf() but these sorts of things make static checkers and code
reviewers happier.
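A small userspace illustration of the difference (scnprintf() itself is
a kernel helper, so the "actually copied" length is computed by hand
here):

#include <stdio.h>
#include <string.h>

int main(void)
{
	char buf[8];
	int would_be, copied;

	/* snprintf() reports the length the untruncated string would have had. */
	would_be = snprintf(buf, sizeof(buf), "0123456789");

	/* What actually landed in the buffer, i.e. what scnprintf() would
	 * have returned (the truncated length, excluding the NUL). */
	copied = strlen(buf);

	printf("snprintf returned %d, bytes actually copied %d\n",
	       would_be, copied);
	return 0;
}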
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
The peer_addr member of intel_ntb_dev is not set; therefore, when
acquiring ntb_peer_db and ntb_peer_spad, we only get the offset rather
than the actual physical address. Add a fix to correct that.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Acked-by: Allen Hubbe <Allen.Hubbe@emc.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
schedule_timeout_* takes a timeout in jiffies but the code currently is
passing in a constant which makes this timeout HZ dependent, so pass it
through msecs_to_jiffies() to fix this up.
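The shape of the change is roughly as follows (sketch only; which
schedule_timeout_* variant and which timeout constant the driver
actually uses are not shown here, and TIMEOUT_MS is a placeholder):

	/* Before: the bare constant is interpreted as jiffies, so the
	 * real timeout depends on HZ. */
	schedule_timeout_interruptible(TIMEOUT_MS);

	/* After: state the timeout in milliseconds and convert explicitly. */
	schedule_timeout_interruptible(msecs_to_jiffies(TIMEOUT_MS));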
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
schedule_timeout_* takes a timeout in jiffies but the code currently is
passing in a constant which makes this timeout HZ dependent, so pass it
through msecs_to_jiffies() to fix this up.
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Signed-off-by: Jon Mason <jdmason@kudzu.us>