linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-23 11:21:33 +00:00

Author	SHA1	Message	Date
Baoquan He	5d0d4b91bf	Revert "bnx2: Reset device during driver initialization" This reverts commit `3e1be7ad2d`. When people build bnx2 driver into kernel, it will fail to detect and load firmware because firmware is contained in initramfs and initramfs has not been uncompressed yet during do_initcalls. So revert commit `3e1be7a` and work out a new way in the later patch. Signed-off-by: Baoquan He <bhe@redhat.com> Acked-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-14 16:20:53 -05:00
Colin Ian King	7020637bdf	ps3_gelic: fix spelling mistake in debug message Trivial fix to spelling mistake "unmached" to "unmatched" in debug message. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-14 13:38:57 -05:00
Colin Ian King	7774d46b20	net: ethernet: ixp4xx_eth: fix spelling mistake in debug message Trivial fix to spelling mistake "successed" to "succeeded" in debug message. Also unwrap multi-line literal string. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-13 13:48:28 -05:00
Thomas Falcon	e1fac0adf0	ibmvnic: Fix size of debugfs name buffer This mistake was causing debugfs directory creation failures when multiple ibmvnic devices were probed. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-13 13:42:35 -05:00
Thomas Falcon	b7f193da17	ibmvnic: Unmap ibmvnic_statistics structure This structure was mapped but never subsequently unmapped. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-13 13:42:35 -05:00
Bert Kenward	46d054f8f5	sfc: clear napi_hash state when copying channels efx_copy_channel() doesn't correctly clear the napi_hash related state. This means that when napi_hash_add is called for that channel nothing is done, and we are left with a copy of the napi_hash_node from the old channel. When we later call napi_hash_del() on this channel we have a stale napi_hash_node. Corruption is only seen when there are multiple entries in one of the napi_hash lists. This is made more likely by having a very large number of channels. Testing was carried out with 512 channels - 32 channels on each of 16 ports. This failure typically appears as protection faults within napi_by_id() or napi_hash_add(). efx_copy_channel() is only used when tx or rx ring sizes are changed (ethtool -G). Fixes: `36763266bb` ("sfc: Add support for busy polling") Signed-off-by: Bert Kenward <bkenward@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-13 13:41:42 -05:00
David S. Miller	9e37aaa39d	Merge branch 'mlxsw-fixes' Jiri Pirko says: ==================== mlxsw: Couple of fixes Please, queue-up both for stable. Thanks! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-13 12:51:01 -05:00
Arkadi Sharshevsky	42cdb338f4	mlxsw: spectrum_router: Correctly dump neighbour activity The device's neighbour table is periodically dumped in order to update the kernel about active neighbours. A single dump session may span multiple queries, until the response carries less records than requested or when a record (can contain up to four neighbour entries) is not full. Current code stops the session when the number of returned records is zero, which can result in infinite loop in case of high packet rate. Fix this by stopping the session according to the above logic. Fixes: `c723c735fa` ("mlxsw: spectrum_router: Periodically update the kernel's neigh table") Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-13 12:51:00 -05:00
Yotam Gigi	2d644d4c75	mlxsw: spectrum: Fix refcount bug on span entries When binding port to a newly created span entry, its refcount is initialized to zero even though it has a bound port. That leads to unexpected behaviour when the user tries to delete that port from the span entry. Fix this by initializing the reference count to 1. Also add a warning to put function. Fixes: `763b4b70af` ("mlxsw: spectrum: Add support in matchall mirror TC offloading") Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-13 12:51:00 -05:00
David S. Miller	a055450a33	Merge branch 'bnxt_en-fixes' Michael Chan says: ==================== bnxt_en: 2 bug fixes. Bug fixes in bnxt_setup_tc() and VF vitual link state. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-13 12:37:32 -05:00
Michael Chan	73b9bad63a	bnxt_en: Fix VF virtual link state. If the physical link is down and the VF virtual link is set to "enable", the current code does not always work. If the link is down but the cable is attached, the firmware returns LINK_SIGNAL instead of NO_LINK. The current code is treating LINK_SIGNAL as link up. The fix is to treat link as down when the link_status != LINK. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-13 12:37:31 -05:00
Michael Chan	3ffb6a39b7	bnxt_en: Fix ring arithmetic in bnxt_setup_tc(). The logic is missing the check on whether the tx and rx rings are sharing completion rings or not. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-13 12:37:31 -05:00
Mike Frysinger	7b5b74efcc	Revert "include/uapi/linux/atm_zatm.h: include linux/time.h" This reverts commit `cf00713a65` ("include/uapi/linux/atm_zatm.h: include linux/time.h"). This attempted to fix userspace breakage that no longer existed when the patch was merged. Almost one year earlier, commit `70ba07b675` ("atm: remove 'struct zatm_t_hist'") deleted the struct in question. After this patch was merged, we now have to deal with people being unable to include this header in conjunction with standard C library headers like stdlib.h (which linux-atm does). Example breakage: x86_64-pc-linux-gnu-gcc -DHAVE_CONFIG_H -I. -I../.. -I./../q2931 -I./../saal \ -I. -DCPPFLAGS_TEST -I../../src/include -O2 -march=native -pipe -g \ -frecord-gcc-switches -freport-bug -Wimplicit-function-declaration \ -Wnonnull -Wstrict-aliasing -Wparentheses -Warray-bounds \ -Wfree-nonheap-object -Wreturn-local-addr -fno-strict-aliasing -Wall \ -Wshadow -Wpointer-arith -Wwrite-strings -Wstrict-prototypes -c zntune.c In file included from /usr/include/linux/atm_zatm.h:17:0, from zntune.c:17: /usr/include/linux/time.h:9:8: error: redefinition of ‘struct timespec’ struct timespec { ^ In file included from /usr/include/sys/select.h:43:0, from /usr/include/sys/types.h:219, from /usr/include/stdlib.h:314, from zntune.c:9: /usr/include/time.h:120:8: note: originally defined here struct timespec ^ Signed-off-by: Mike Frysinger <vapier@gentoo.org> Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-13 12:35:13 -05:00
Eric Dumazet	ac6e780070	tcp: take care of truncations done by sk_filter() With syzkaller help, Marco Grassi found a bug in TCP stack, crashing in tcp_collapse() Root cause is that sk_filter() can truncate the incoming skb, but TCP stack was not really expecting this to happen. It probably was expecting a simple DROP or ACCEPT behavior. We first need to make sure no part of TCP header could be removed. Then we need to adjust TCP_SKB_CB(skb)->end_seq Many thanks to syzkaller team and Marco for giving us a reproducer. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Marco Grassi <marco.gra@gmail.com> Reported-by: Vladis Dronov <vdronov@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-13 12:30:02 -05:00
Stephen Suryaputra Lin	969447f226	ipv4: use new_gw for redirect neigh lookup In v2.6, ip_rt_redirect() calls arp_bind_neighbour() which returns 0 and then the state of the neigh for the new_gw is checked. If the state isn't valid then the redirected route is deleted. This behavior is maintained up to v3.5.7 by check_peer_redirect() because rt->rt_gateway is assigned to peer->redirect_learned.a4 before calling ipv4_neigh_lookup(). After commit `5943634fc5` ("ipv4: Maintain redirect and PMTU info in struct rtable again."), ipv4_neigh_lookup() is performed without the rt_gateway assigned to the new_gw. In the case when rt_gateway (old_gw) isn't zero, the function uses it as the key. The neigh is most likely valid since the old_gw is the one that sends the ICMP redirect message. Then the new_gw is assigned to fib_nh_exception. The problem is: the new_gw ARP may never gets resolved and the traffic is blackholed. So, use the new_gw for neigh lookup. Changes from v1: - use __ipv4_neigh_lookup instead (per Eric Dumazet). Fixes: `5943634fc5` ("ipv4: Maintain redirect and PMTU info in struct rtable again.") Signed-off-by: Stephen Suryaputra Lin <ssurya@ieee.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-13 12:24:44 -05:00
Guenter Roeck	ca0a75316d	r8152: Fix error path in open function If usb_submit_urb() called from the open function fails, the following crash may be observed. r8152 8-1:1.0 eth0: intr_urb submit failed: -19 ... r8152 8-1:1.0 eth0: v1.08.3 Unable to handle kernel paging request at virtual address 6b6b6b6b6b6b6b7b pgd = ffffffc0e7305000 [6b6b6b6b6b6b6b7b] pgd=0000000000000000, pud=0000000000000000 Internal error: Oops: 96000004 [#1] PREEMPT SMP ... PC is at notifier_chain_register+0x2c/0x58 LR is at blocking_notifier_chain_register+0x54/0x70 ... Call trace: [<ffffffc0002407f8>] notifier_chain_register+0x2c/0x58 [<ffffffc000240bdc>] blocking_notifier_chain_register+0x54/0x70 [<ffffffc00026991c>] register_pm_notifier+0x24/0x2c [<ffffffbffc183200>] rtl8152_open+0x3dc/0x3f8 [r8152] [<ffffffc000808000>] __dev_open+0xac/0x104 [<ffffffc0008082f8>] __dev_change_flags+0xb0/0x148 [<ffffffc0008083c4>] dev_change_flags+0x34/0x70 [<ffffffc000818344>] do_setlink+0x2c8/0x888 [<ffffffc0008199d4>] rtnl_newlink+0x328/0x644 [<ffffffc000819e98>] rtnetlink_rcv_msg+0x1a8/0x1d4 [<ffffffc0008373c8>] netlink_rcv_skb+0x68/0xd0 [<ffffffc000817990>] rtnetlink_rcv+0x2c/0x3c [<ffffffc000836d1c>] netlink_unicast+0x16c/0x234 [<ffffffc00083720c>] netlink_sendmsg+0x340/0x364 [<ffffffc0007e85d0>] sock_sendmsg+0x48/0x60 [<ffffffc0007e9c30>] SyS_sendto+0xe0/0x120 [<ffffffc0007e9cb0>] SyS_send+0x40/0x4c [<ffffffc000203e34>] el0_svc_naked+0x24/0x28 Clean up error handling to avoid registering the notifier if the open function is going to fail. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-13 12:03:20 -05:00
Baruch Siach	10b217681d	net: bpqether.h: remove if_ether.h guard __LINUX_IF_ETHER_H is not defined anywhere, and if_ether.h can keep itself from double inclusion, though it uses a single underscore prefix. Signed-off-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-13 00:57:53 -05:00
Eric Dumazet	34fad54c25	net: __skb_flow_dissect() must cap its return value After Tom patch, thoff field could point past the end of the buffer, this could fool some callers. If an skb was provided, skb->len should be the upper limit. If not, hlen is supposed to be the upper limit. Fixes: `a6e544b0a8` ("flow_dissector: Jump to exit code in __skb_flow_dissect") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Yibin Yang <yibyang@cisco.com Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-12 23:41:53 -05:00
David S. Miller	79774d6bfa	Merge branch 'fix-bpf_redirect' Martin KaFai Lau says: ==================== bpf: Fix bpf_redirect to an ipip/ip6tnl dev This patch set fixes a bug in bpf_redirect(dev, flags) when dev is an ipip/ip6tnl. The current problem is IP-EthHdr-IP is sent out instead of IP-IP. Patch 1 adds a dev->type test similar to dev_is_mac_header_xmit() in act_mirred.c which is only available in net-next. We can consider to refactor it once this patch is pulled into net-next from net. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-12 23:38:08 -05:00
Martin KaFai Lau	90e02896f1	bpf: Add test for bpf_redirect to ipip/ip6tnl The test creates two netns, ns1 and ns2. The host (the default netns) has an ipip or ip6tnl dev configured for tunneling traffic to the ns2. ping VIPS from ns1 <----> host <--tunnel--> ns2 (VIPs at loopback) The test is to have ns1 pinging VIPs configured at the loopback interface in ns2. The VIPs are 10.10.1.102 and 2401:face::66 (which are configured at lo@ns2). [Note: 0x66 => 102]. At ns1, the VIPs are routed _via_ the host. At the host, bpf programs are installed at the veth to redirect packets from a veth to the ipip/ip6tnl. The test is configured in a way so that both ingress and egress can be tested. At ns2, the ipip/ip6tnl dev is configured with the local and remote address specified. The return path is routed to the dev ipip/ip6tnl. During egress test, the host also locally tests pinging the VIPs to ensure that bpf_redirect at egress also works for the direct egress (i.e. not forwarding from dev ve1 to ve2). Acked-by: Alexei Starovoitov <ast@fb.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-12 23:38:07 -05:00
Martin KaFai Lau	4e3264d21b	bpf: Fix bpf_redirect to an ipip/ip6tnl dev If the bpf program calls bpf_redirect(dev, 0) and dev is an ipip/ip6tnl, it currently includes the mac header. e.g. If dev is ipip, the end result is IP-EthHdr-IP instead of IP-IP. The fix is to pull the mac header. At ingress, skb_postpull_rcsum() is not needed because the ethhdr should have been pulled once already and then got pushed back just before calling the bpf_prog. At egress, this patch calls skb_postpull_rcsum(). If bpf_redirect(dev, BPF_F_INGRESS) is called, it also fails now because it calls dev_forward_skb() which eventually calls eth_type_trans(skb, dev). The eth_type_trans() will set skb->type = PACKET_OTHERHOST because the mac address does not match the redirecting dev->dev_addr. The PACKET_OTHERHOST will eventually cause the ip_rcv() errors out. To fix this, ____dev_forward_skb() is added. Joint work with Daniel Borkmann. Fixes: `cfc7381b30` ("ip_tunnel: add collect_md mode to IPIP tunnel") Fixes: `8d79266bc4` ("ip6_tunnel: add collect_md mode to IPv6 tunnels") Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@fb.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-12 23:38:07 -05:00
David S. Miller	23dd831548	Merge branch 'mlxsw-fixes' Jiri Pirko says: ==================== mlxsw: Couple of router fixes v1->v2: - patch2: - use net_eq ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-10 13:02:25 -05:00
Jiri Pirko	0e3715c9c2	mlxsw: spectrum_router: Ignore FIB notification events for non-init namespaces Since now, the table with same id in multiple netnamespaces were squashed to a single virtual router. That is not only incorrect, it also causes error messages when trying to use RALUE register to do double remove of FIB entries, like this one: mlxsw_spectrum 0000:03:00.0: EMAD reg access failed (tid=facb831c00007b20,reg_id=8013(ralue),type=write,status=7(bad parameter)) Since we don't allow ports to change namespaces (NETIF_F_NETNS_LOCAL), and the infrastructure is not yet prepared to handle netnamespaces, just ignore FIB notification events for non-init namespaces. That is clear to do since we don't need to offload them. Fixes: `b45f64d16d` ("mlxsw: spectrum_router: Use FIB notifications instead of switchdev calls") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-10 13:02:15 -05:00
Jiri Pirko	33b1341cd1	mlxsw: spectrum_router: Fix handling of neighbour structure __neigh_create function works in a different way than assumed. It passes "n" as a parameter to ndo_neigh_construct. But this "n" might be destroyed right away before __neigh_create() returns in case there is already another neighbour struct in the hashtable with the same dev and primary key. That is not expected by mlxsw_sp_router_neigh_construct() and the stored "n" points to freed memory, eventually leading to crash. Fix this by doing tight 1:1 coupling between neighbour struct and internal driver neigh_entry. That allows to narrow down the key in internal driver hashtable to do lookups by "n" only. Fixes: `6cf3c971dc` ("mlxsw: spectrum_router: Add private neigh table") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-10 13:02:15 -05:00
David S. Miller	2ce0af8fd0	Merge branch 'qed-fixes' Yuval Mintz says: ==================== qed: Fix RoCE infrastructure This series fixes 2 basic issues with RoCE support, one handles a missing configuration in the initial infrastructure support while the other is a regression introduced by one of the initial fix submissions. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-10 12:55:26 -05:00
Ram Amrani	5c5f260908	qed: Correct rdma params configuration Previous fix has broken RoCE support as the rdma_pf_params are now being set into the parameters only after the params are alrady assigned into the hw-function. Fixes: `0189efb8f4` ("qed*: Fix Kconfig dependencies with INFINIBAND_QEDR") Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-10 12:55:20 -05:00
Ram Amrani	8d1d8fcb21	qed: configure ll2 RoCE v1/v2 flavor correctly Currently RoCE v2 won't operate with RDMA CM due to missing setting of the roce-flavour in the ll2 configuration. This patch properly sets the flavour, and deletes incorrect HSI that doesn't [yet] exist. Fixes: `abd49676c7` ("qed: Add RoCE ll2 & GSI support") Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-10 12:55:20 -05:00
Lance Richardson	0ace81ec71	ipv4: update comment to document GSO fragmentation cases. This is a follow-up to commit `9ee6c5dc81` ("ipv4: allow local fragmentation in ip_finish_output_gso()"), updating the comment documenting cases in which fragmentation is needed for egress GSO packets. Suggested-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: Lance Richardson <lrichard@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-10 12:01:54 -05:00
David Ahern	9b6c14d51b	net: tcp response should set oif only if it is L3 master Lorenzo noted an Android unit test failed due to `e0d56fdd73`: "The expectation in the test was that the RST replying to a SYN sent to a closed port should be generated with oif=0. In other words it should not prefer the interface where the SYN came in on, but instead should follow whatever the routing table says it should do." Revert the change to ip_send_unicast_reply and tcp_v6_send_response such that the oif in the flow is set to the skb_iif only if skb_iif is an L3 master. Fixes: `e0d56fdd73` ("net: l3mdev: remove redundant calls") Reported-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Tested-by: Lorenzo Colitti <lorenzo@google.com> Acked-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 22:32:10 -05:00
Allan Chou	8da3cf2a49	Net Driver: Add Cypress GX3 VID=04b4 PID=3610. Add support for Cypress GX3 SuperSpeed to Gigabit Ethernet Bridge Controller (Vendor=04b4 ProdID=3610). Patch verified on x64 linux kernel 4.7.4, 4.8.6, 4.9-rc4 systems with the Kensington SD4600P USB-C Universal Dock with Power, which uses the Cypress GX3 SuperSpeed to Gigabit Ethernet Bridge Controller. A similar patch was signed-off and tested-by Allan Chou <allan@asix.com.tw> on 2015-12-01. Allan verified his similar patch on x86 Linux kernel 4.1.6 system with Cypress GX3 SuperSpeed to Gigabit Ethernet Bridge Controller. Tested-by: Allan Chou <allan@asix.com.tw> Tested-by: Chris Roth <chris.roth@usask.ca> Tested-by: Artjom Simon <artjom.simon@gmail.com> Signed-off-by: Allan Chou <allan@asix.com.tw> Signed-off-by: Chris Roth <chris.roth@usask.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 21:45:34 -05:00
David S. Miller	9fa684ec86	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains a larger than usual batch of Netfilter fixes for your net tree. This series contains a mixture of old bugs and recently introduced bugs, they are: 1) Fix a crash when using nft_dynset with nft_set_rbtree, which doesn't support the set element updates from the packet path. From Liping Zhang. 2) Fix leak when nft_expr_clone() fails, from Liping Zhang. 3) Fix a race when inserting new elements to the set hash from the packet path, also from Liping. 4) Handle segmented TCP SIP packets properly, basically avoid that the INVITE in the allow header create bogus expectations by performing stricter SIP message parsing, from Ulrich Weber. 5) nft_parse_u32_check() should return signed integer for errors, from John Linville. 6) Fix wrong allocation instead of connlabels, allocate 16 instead of 32 bytes, from Florian Westphal. 7) Fix compilation breakage when building the ip_vs_sync code with CONFIG_OPTIMIZE_INLINING on x86, from Arnd Bergmann. 8) Destroy the new set if the transaction object cannot be allocated, also from Liping Zhang. 9) Use device to route duplicated packets via nft_dup only when set by the user, otherwise packets may not follow the right route, again from Liping. 10) Fix wrong maximum genetlink attribute definition in IPVS, from WANG Cong. 11) Ignore untracked conntrack objects from xt_connmark, from Florian Westphal. 12) Allow to use conntrack helpers that are registered NFPROTO_UNSPEC via CT target, otherwise we cannot use the h.245 helper, from Florian. 13) Revisit garbage collection heuristic in the new workqueue-based timer approach for conntrack to evict objects earlier, again from Florian. 14) Fix crash in nf_tables when inserting an element into a verdict map, from Liping Zhang. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 20:38:18 -05:00
Mathias Krause	f567e950bf	rtnl: reset calcit fptr in rtnl_unregister() To avoid having dangling function pointers left behind, reset calcit in rtnl_unregister(), too. This is no issue so far, as only the rtnl core registers a netlink handler with a calcit hook which won't be unregistered, but may become one if new code makes use of the calcit hook. Fixes: `c7ac8679be` ("rtnetlink: Compute and store minimum ifinfo...") Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 20:18:19 -05:00
Arnd Bergmann	4053ab1bf9	vxlan: hide unused local variable A bugfix introduced a harmless warning in v4.9-rc4: drivers/net/vxlan.c: In function 'vxlan_group_used': drivers/net/vxlan.c:947:21: error: unused variable 'sock6' [-Werror=unused-variable] This hides the variable inside of the same #ifdef that is around its user. The extraneous initialization is removed at the same time, it was accidentally introduced in the same commit. Fixes: `c6fcc4fc5f` ("vxlan: avoid using stale vxlan socket.") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 18:59:50 -05:00
John Allen	6dbcd8fb59	ibmvnic: Start completion queue negotiation at server-provided optimum values Use the opt_* fields to determine the starting point for negotiating the number of tx/rx completion queues with the vnic server. These contain the number of queues that the vnic server estimates that it will be able to allocate. While renegotiation may still occur, using the opt_* fields will reduce the number of times this needs to happen and will prevent driver probe timeout on systems using large numbers of ibmvnic client devices per vnic port. Signed-off-by: John Allen <jallen@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 18:52:41 -05:00
David Ahern	9d1a6c4ea4	net: icmp_route_lookup should use rt dev to determine L3 domain icmp_send is called in response to some event. The skb may not have the device set (skb->dev is NULL), but it is expected to have an rt. Update icmp_route_lookup to use the rt on the skb to determine L3 domain. Fixes: `613d09b30f` ("net: Use VRF device index for lookups on TX") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 18:49:39 -05:00
David S. Miller	fd6f24d75a	Merge branch 'qcom-emac-pause' Timur Tabi says: ==================== net: qcom/emac: ensure that pause frames are enabled The qcom emac driver experiences significant packet loss (through frame check sequence errors) if flow control is not enabled and the phy is not configured to allow pause frames to pass through it. Therefore, we need to enable flow control and force the phy to pass pause frames. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 18:45:36 -05:00
Timur Tabi	df63022e18	net: qcom/emac: enable flow control if requested If the PHY has been configured to allow pause frames, then the MAC should be configured to generate and/or accept those frames. Signed-off-by: Timur Tabi <timur@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 18:45:24 -05:00
Timur Tabi	3e88449344	net: qcom/emac: configure the external phy to allow pause frames Pause frames are used to enable flow control. A MAC can send and receive pause frames in order to throttle traffic. However, the PHY must be configured to allow those frames to pass through. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Timur Tabi <timur@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 18:45:24 -05:00
Rafał Miłecki	cdb26d3387	net: bgmac: fix reversed checks for clock control flag This fixes regression introduced by patch adding feature flags. It was already reported and patch followed (it got accepted) but it appears it was incorrect. Instead of fixing reversed condition it broke a good one. This patch was verified to actually fix SoC hanges caused by bgmac on BCM47186B0. Fixes: `db791eb297` ("net: ethernet: bgmac: convert to feature flags") Fixes: `4af1474e61` ("net: bgmac: Fix errant feature flag check") Cc: Jon Mason <jon.mason@broadcom.com> Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 13:32:06 -05:00
Benjamin Poirier	d667f78514	bna: Add synchronization for tx ring. We received two reports of BUG_ON in bnad_txcmpl_process() where hw_consumer_index appeared to be ahead of producer_index. Out of order write/read of these variables could explain these reports. bnad_start_xmit(), as a producer of tx descriptors, has a few memory barriers sprinkled around writes to producer_index and the device's doorbell but they're not paired with anything in bnad_txcmpl_process(), a consumer. Since we are synchronizing with a device, we must use mandatory barriers, not smp_*. Also, I didn't see the purpose of the last smp_mb() in bnad_start_xmit(). Signed-off-by: Benjamin Poirier <bpoirier@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 13:31:10 -05:00
Tariq Toukan	f91d718156	Revert "net/mlx4_en: Fix panic during reboot" This reverts commit `9d2afba058`. The original issue would possibly exist if an external module tried calling our "ethtool_ops" without checking if it still exists. The right way of solving it is by simply doing the check in the caller side. Currently, no action is required as there's no such use case. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 13:29:32 -05:00
Maciej Żenczykowski	fb56be83e4	net-ipv6: on device mtu change do not add mtu to mtu-less routes Routes can specify an mtu explicitly or inherit the mtu from the underlying device - this inheritance is implemented in dst->ops->mtu handlers ip6_mtu() and ip6_blackhole_mtu(). Currently changing the mtu of a device adds mtu explicitly to routes using that device. ie. # ip link set dev lo mtu 65536 # ip -6 route add local 2000::1 dev lo # ip -6 route get 2000::1 local 2000::1 dev lo table local src ... metric 1024 pref medium # ip link set dev lo mtu 65535 # ip -6 route get 2000::1 local 2000::1 dev lo table local src ... metric 1024 mtu 65535 pref medium # ip link set dev lo mtu 65536 # ip -6 route get 2000::1 local 2000::1 dev lo table local src ... metric 1024 mtu 65536 pref medium # ip -6 route del local 2000::1 After this patch the route entry no longer changes unless it already has an mtu. There is no need: this inheritance is already done in ip6_mtu() # ip link set dev lo mtu 65536 # ip -6 route add local 2000::1 dev lo # ip -6 route add local 2000::2 dev lo mtu 2000 # ip -6 route get 2000::1; ip -6 route get 2000::2 local 2000::1 dev lo table local src ... metric 1024 pref medium local 2000::2 dev lo table local src ... metric 1024 mtu 2000 pref medium # ip link set dev lo mtu 65535 # ip -6 route get 2000::1; ip -6 route get 2000::2 local 2000::1 dev lo table local src ... metric 1024 pref medium local 2000::2 dev lo table local src ... metric 1024 mtu 2000 pref medium # ip link set dev lo mtu 1501 # ip -6 route get 2000::1; ip -6 route get 2000::2 local 2000::1 dev lo table local src ... metric 1024 pref medium local 2000::2 dev lo table local src ... metric 1024 mtu 1501 pref medium # ip link set dev lo mtu 65536 # ip -6 route get 2000::1; ip -6 route get 2000::2 local 2000::1 dev lo table local src ... metric 1024 pref medium local 2000::2 dev lo table local src ... metric 1024 mtu 65536 pref medium # ip -6 route del local 2000::1 # ip -6 route del local 2000::2 This is desirable because changing device mtu and then resetting it to the previous value shouldn't change the user visible routing table. Signed-off-by: Maciej Żenczykowski <maze@google.com> CC: Eric Dumazet <edumazet@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 13:19:32 -05:00
Soheil Hassas Yeganeh	3023898b7d	sock: fix sendmmsg for partial sendmsg Do not send the next message in sendmmsg for partial sendmsg invocations. sendmmsg assumes that it can continue sending the next message when the return value of the individual sendmsg invocations is positive. It results in corrupting the data for TCP, SCTP, and UNIX streams. For example, sendmmsg([["abcd"], ["efgh"]]) can result in a stream of "aefgh" if the first sendmsg invocation sends only the first byte while the second sendmsg goes through. Datagram sockets either send the entire datagram or fail, so this patch affects only sockets of type SOCK_STREAM and SOCK_SEQPACKET. Fixes: `228e548e60` ("net: Add sendmmsg socket system call") Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 13:18:12 -05:00
Gao Feng	aa5fd0fb77	driver: macvlan: Destroy new macvlan port if macvlan_common_newlink failed. When there is no existing macvlan port in lowdev, one new macvlan port would be created. But it doesn't be destoried when something failed later. It casues some memleak. Now add one flag to indicate if new macvlan port is created. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-09 13:14:47 -05:00
Liping Zhang	58c78e104d	netfilter: nf_tables: fix oops when inserting an element into a verdict map Dalegaard says: The following ruleset, when loaded with 'nft -f bad.txt' ----snip---- flush ruleset table ip inlinenat { map sourcemap { type ipv4_addr : verdict; } chain postrouting { ip saddr vmap @sourcemap accept } } add chain inlinenat test add element inlinenat sourcemap { 100.123.10.2 : jump test } ----snip---- results in a kernel oops: BUG: unable to handle kernel paging request at 0000000000001344 IP: [<ffffffffa07bf704>] nf_tables_check_loops+0x114/0x1f0 [nf_tables] [...] Call Trace: [<ffffffffa07c2aae>] ? nft_data_init+0x13e/0x1a0 [nf_tables] [<ffffffffa07c1950>] nft_validate_register_store+0x60/0xb0 [nf_tables] [<ffffffffa07c74b5>] nft_add_set_elem+0x545/0x5e0 [nf_tables] [<ffffffffa07bfdd0>] ? nft_table_lookup+0x30/0x60 [nf_tables] [<ffffffff8132c630>] ? nla_strcmp+0x40/0x50 [<ffffffffa07c766e>] nf_tables_newsetelem+0x11e/0x210 [nf_tables] [<ffffffff8132c400>] ? nla_validate+0x60/0x80 [<ffffffffa030d9b4>] nfnetlink_rcv+0x354/0x5a7 [nfnetlink] Because we forget to fill the net pointer in bind_ctx, so dereferencing it may cause kernel crash. Reported-by: Dalegaard <dalegaard@gmail.com> Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-11-08 23:53:39 +01:00
Florian Westphal	e0df8cae6c	netfilter: conntrack: refine gc worker heuristics Nicolas Dichtel says: After commit `b87a2f9199` ("netfilter: conntrack: add gc worker to remove timed-out entries"), netlink conntrack deletion events may be sent with a huge delay. Nicolas further points at this line: goal = min(nf_conntrack_htable_size / GC_MAX_BUCKETS_DIV, GC_MAX_BUCKETS); and indeed, this isn't optimal at all. Rationale here was to ensure that we don't block other work items for too long, even if nf_conntrack_htable_size is huge. But in order to have some guarantee about maximum time period where a scan of the full conntrack table completes we should always use a fixed slice size, so that once every N scans the full table has been examined at least once. We also need to balance this vs. the case where the system is either idle (i.e., conntrack table (almost) empty) or very busy (i.e. eviction happens from packet path). So, after some discussion with Nicolas: 1. want hard guarantee that we scan entire table at least once every X s -> need to scan fraction of table (get rid of upper bound) 2. don't want to eat cycles on idle or very busy system -> increase interval if we did not evict any entries 3. don't want to block other worker items for too long -> make fraction really small, and prefer small scan interval instead 4. Want reasonable short time where we detect timed-out entry when system went idle after a burst of traffic, while not doing scans all the time. -> Store next gc scan in worker, increasing delays when no eviction happened and shrinking delay when we see timed out entries. The old gc interval is turned into a max number, scans can now happen every jiffy if stale entries are present. Longest possible time period until an entry is evicted is now 2 minutes in worst case (entry expires right after it was deemed 'not expired'). Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-11-08 23:53:38 +01:00
Florian Westphal	6114cc516d	netfilter: conntrack: fix CT target for UNSPEC helpers Thomas reports its not possible to attach the H.245 helper: iptables -t raw -A PREROUTING -p udp -j CT --helper H.245 iptables: No chain/target/match by that name. xt_CT: No such helper "H.245" This is because H.245 registers as NFPROTO_UNSPEC, but the CT target passes NFPROTO_IPV4/IPV6 to nf_conntrack_helper_try_module_get. We should treat UNSPEC as wildcard and ignore the l3num instead. Reported-by: Thomas Woerner <twoerner@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-11-08 23:53:37 +01:00
Florian Westphal	fb9c9649a1	netfilter: connmark: ignore skbs with magic untracked conntrack objects The (percpu) untracked conntrack entries can end up with nonzero connmarks. The 'untracked' conntrack objects are merely a way to distinguish INVALID (i.e. protocol connection tracker says payload doesn't meet some requirements or packet was never seen by the connection tracking code) from packets that are intentionally not tracked (some icmpv6 types such as neigh solicitation, or by using 'iptables -j CT --notrack' option). Untracked conntrack objects are implementation detail, we might as well use invalid magic address instead to tell INVALID and UNTRACKED apart. Check skb->nfct for untracked dummy and behave as if skb->nfct is NULL. Reported-by: XU Tianwen <evan.xu.tianwen@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-11-08 23:53:36 +01:00
WANG Cong	8fbfef7f50	ipvs: use IPVS_CMD_ATTR_MAX for family.maxattr family.maxattr is the max index for policy[], the size of ops[] is determined with ARRAY_SIZE(). Reported-by: Andrey Konovalov <andreyknvl@google.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-11-08 23:53:30 +01:00
Alexander Duyck	fd0285a39b	fib_trie: Correct /proc/net/route off by one error The display of /proc/net/route has had a couple issues due to the fact that when I originally rewrote most of fib_trie I made it so that the iterator was tracking the next value to use instead of the current. In addition it had an off by 1 error where I was tracking the first piece of data as position 0, even though in reality that belonged to the SEQ_START_TOKEN. This patch updates the code so the iterator tracks the last reported position and key instead of the next expected position and key. In addition it shifts things so that all of the leaves start at 1 instead of trying to report leaves starting with offset 0 as being valid. With these two issues addressed this should resolve any off by one errors that were present in the display of /proc/net/route. Fixes: `25b97c016b` ("ipv4: off-by-one in continuation handling in /proc/net/route") Cc: Andy Whitcroft <apw@canonical.com> Reported-by: Jason Baron <jbaron@akamai.com> Tested-by: Jason Baron <jbaron@akamai.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-07 20:40:27 -05:00

1 2 3 4 5 ...

634209 Commits