linux

Author	SHA1	Message	Date
Jean Sacren	69c1d70ab6	i40evf: add missing kernel-doc argument @flush has been missing since the inception of i40evf_irq_enable(). Add it for the kernel doc. Signed-off-by: Jean Sacren <sakiwit@gmail.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-10-23 05:25:14 -07:00
Andy Shevchenko	a3524e95ac	i40e: re-use %ph specifier to hexdump a data Instead of using a custom approach change the code to use %ph format specifier. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-10-23 05:22:13 -07:00
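The %ph specifier mentioned in the commit above is a kernel printk extension that prints a small buffer as space-separated hex bytes. A rough userspace sketch of that output format follows; `format_hex_bytes` and its example function are hypothetical names for illustration, not kernel or i40e code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write `len` bytes of `buf` into `out` as lowercase hex
 * pairs separated by single spaces, similar in spirit to what "%*ph" prints
 * in the kernel. Stops early if `out` is too small. Returns the number of
 * characters written, not counting the terminating NUL. */
static size_t format_hex_bytes(const unsigned char *buf, size_t len,
                               char *out, size_t out_size)
{
    static const char digits[] = "0123456789abcdef";
    size_t pos = 0;

    if (out_size == 0)
        return 0;

    for (size_t i = 0; i < len; i++) {
        size_t need = (i ? 1 : 0) + 2;      /* optional space + two digits */

        if (pos + need + 1 > out_size)      /* keep room for the NUL */
            break;
        if (i)
            out[pos++] = ' ';
        out[pos++] = digits[buf[i] >> 4];
        out[pos++] = digits[buf[i] & 0x0f];
    }
    out[pos] = '\0';
    return pos;
}

/* Usage example: format a 6-byte MAC-like buffer. */
static void format_hex_bytes_example(void)
{
    const unsigned char mac[] = { 0x4e, 0x32, 0x51, 0x04, 0x47, 0xe5 };
    char text[32];
    size_t n = format_hex_bytes(mac, sizeof(mac), text, sizeof(text));

    assert(n == 17);
    assert(strcmp(text, "4e 32 51 04 47 e5") == 0);
}
```

In kernel code the equivalent would be a single format string rather than a hand-written loop, which is the simplification the commit describes.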
Catherine Sullivan	0e320516b2	i40e/i40evf: Bump i40e to 1.3.46 and i40evf to 1.3.33 Bump up the version... Change-ID: Ib8d501021671ba20250115ed54330e2c182255b7 Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-10-23 05:11:17 -07:00
Akeem G Abodunrin	de445b3dc2	i40e: Disable VEB bridge mode with SR-IOV failure If a call to enable SR-IOV in the kernel failed, we need to disable I40E_FLAG_VEB_MODE_ENABLED, so that bridge mode could fall back to VEPA, which is a default. Change-ID: I12b6f776769506db85b29bea94b9c88d0b5ee65e Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-10-23 04:51:51 -07:00
Carolyn Wyborny	2efaad86b5	i40e: Fix an incorrect OEM version string This patch fixes a problem where the driver output of the OEM version string varied from the other tools. The mask value and the order of operations were incorrect, per the original change request. Without this patch, the version string will appear incorrect from the driver. Change-ID: Ie1ca6485284b4ce3b57e5a99b18b7641617c7ef7 Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-10-23 04:38:35 -07:00
Helin Zhang	58fc3267f1	i40e: fix inconsistent statuses after a PF reset This patch fixes a problem of possibly getting inconsistent flow control statuses after a PF reset. Requested_mode was being set with a default value during probing, but the initial HW state could be different from this mode. Change-ID: I772bf07b78616e87086418d4bd87954b66fa17cd Signed-off-by: Helin Zhang <helin.zhang@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-10-23 04:32:54 -07:00
Mitch Williams	40d01366e6	i40evf: use correct struct for list manipulation Not sure how this compiles at all. Use the correct struct for manipulating the VLAN filter list. Without this, the VLAN filter list doesn't get processed correctly, and VLAN filters will not be re-enabled after any kind of reset. Change-ID: Iceff2dc089f303058fb71ecb08419eed471e0e90 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-10-23 04:29:32 -07:00
Akeem G Abodunrin	09603eaa5c	i40e: Fix VEB/VEPA bridge mode mismatch issue Fix i40e_is_vsi_uplink_mode_veb to check if bridge is actually in VEB mode before allowing LB in the add VSI routine, instead of unconditionally returning VEB bridge mode. Change-ID: I162397b1bdd02367735fe9baaeb51465be2a3ce9 Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-10-23 04:26:18 -07:00
Anjali Singhai Jain	10dc0358e8	i40e: fix a bug in debugfs with add/del macaddr The new code flow requires us to grab the filter list lock before adding/deleting the filter. Change-ID: I4eaef508ab4da2d1b2e23f20f2a78d931d5b6aeb Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-10-23 04:22:49 -07:00
Anjali Singhai Jain	e7358f54a3	i40e/i40evf: Add a workaround to drop all flow control frames This patch adds a workaround to drop any flow control frames from being transmitted from any VSI. FW can still send flow control frames if flow control is enabled. With this patch in place a malicious VF cannot send flow control or PFC packets out on the wire. Change-ID: I4303b24e98b93066d2767fec24dfe78be591c277 Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2015-10-23 04:17:55 -07:00
Paolo Abeni	7b1311807f	ipv4: implement support for NOPREFIXROUTE ifa flag for ipv4 address Currently, adding a new ipv4 address always causes the creation of the related network route, with the default metric. When a host has multiple interfaces on the same network, multiple routes with the same metric are created. If userspace wants to set a specific metric on each route, e.g. giving a better metric to ethernet links than to Wi-Fi ones, the network routes must be deleted and recreated, which is error-prone. This patch implements support for IFA_F_NOPREFIXROUTE for ipv4 addresses. When an address is added with this flag set, no associated network route is created, no network route is deleted when said IP is gone, and it is up to user space to manage such routes. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-23 02:54:54 -07:00
Michael Chan	c0c050c58d	bnxt_en: New Broadcom ethernet driver. Broadcom ethernet driver for the new family of NetXtreme-C/E ethernet devices. v5: - Removed empty blank lines at end of files (noted by David Miller). - Moved busy poll helper functions to bnxt.h to at least make the .c file look less cluttered with #ifdef (noted by Stephen Hemminger). v4: - Broke up 2 long message strings with "\n" (suggested by John Linville) - Constify an array of strings (suggested by Stephen Hemminger) - Improve bnxt_vf_pciid() (suggested by Stephen Hemminger) - Use PCI_VDEVICE() to populate pci_device_id table for more compact source. v3: - Fixed 2 more sparse warnings. - Removed some unused structures in .h files. v2: - Fixed all kbuild test robot reported warnings. - Fixed many of the checkpatch.pl errors and warnings. - Fixed the Kconfig description (noted by Dmitry Kravkov). Acked-by: Eddie Wai <eddie.wai@broadcom.com> Acked-by: Jeffrey Huang <huangjw@broadcom.com> Signed-off-by: Prashant Sreedharan <prashant@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-22 19:30:33 -07:00
Vivien Didelot	0a31adae0b	net: dsa: mv88e6xxx: remove debugfs interface It is preferable to have a common debugfs interface for DSA or switchdev instead of a driver specific one. Thus remove the mv88e6xxx debug code. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-22 19:17:45 -07:00
David S. Miller	998eb8079f	Merge branch 'dsa-port_fdb_dump' Vivien Didelot says: ==================== net: dsa: implement port_fdb_dump in drivers Not all switch chips provide a Get Next kind of operation to dump FDB entries. It is preferred to let the driver handle the dump operation the way it works best for the chip. Thus, drop port_fdb_getnext and implement the port_fdb_dump operation in DSA, which pushes the switchdev FDB dump callback down to the drivers. mv88e6xxx is the only driver affected and is updated accordingly. v3 -> v4: fix rejects on latest net-next v2 -> v3: opencode switchdev_obj_dump_cb_t to avoid multiple typedef; use ether_addr_copy in fdb_dump v1 -> v2: fix a few "return err" instead of "goto unlock" in mv88e6xxx.c ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-22 07:39:07 -07:00
Vivien Didelot	1a49a2fbf8	net: dsa: remove port_fdb_getnext No driver implements port_fdb_getnext anymore, and port_fdb_dump is preferred anyway, so remove this function from DSA. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-22 07:38:45 -07:00
Vivien Didelot	2c49471b66	net: dsa: mv88e6xxx: remove port_fdb_getnext Now that port_fdb_dump is implemented and even simpler, get rid of port_fdb_getnext. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-22 07:38:43 -07:00
Vivien Didelot	f33475bd67	net: dsa: mv88e6xxx: implement port_fdb_dump Implement the port_fdb_dump DSA operation. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-22 07:38:40 -07:00
Vivien Didelot	b0e1a692ff	net: dsa: mv88e6xxx: write MAC outside of ATU Get Next code There is no need to write the MAC address before every Get Next operation, since ATU MAC registers are not cleared between calls. Move the _mv88e6xxx_atu_mac_write call outside of _mv88e6xxx_atu_getnext so future code could call ATU Get Next multiple times and save a few register accesses. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-22 07:38:38 -07:00
Vivien Didelot	36d04ba127	net: dsa: mv88e6xxx: write VID outside of VTU Get Next code There is no need to write the VLAN ID before every Get Next operation, since the VTU VID register is not cleared between calls. Move the VID write call in a _mv88e6xxx_vtu_vid_write function outside of _mv88e6xxx_vtu_getnext so future code could call VTU Get Next multiple times and save a few register accesses. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-22 07:38:37 -07:00
Vivien Didelot	ea70ba9806	net: dsa: add port_fdb_dump function Not all switch chips support a Get Next operation to iterate on its FDB. So add a more simple port_fdb_dump function for them. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-22 07:38:35 -07:00
David S. Miller	e9829b9745	Merge tag 'mac80211-next-for-davem-2015-10-21' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== Here's another set of patches for the current cycle: * I merged net-next back to avoid a conflict with the * cfg80211 scheduled scan API extensions * preparations for better scan result timestamping * regulatory cleanups * mac80211 statistics cleanups * a few other small cleanups and fixes ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-22 07:28:41 -07:00
yankejian	c7fc9eb79a	net: hisilicon: deals with the sub ctrl by syscon the global Soc configuration is treated by syscon, and sub ctrl bus is Soc bus. it has to be treated by syscon. Signed-off-by: yankejian <yankejian@huawei.com> Signed-off-by: lisheng <lisheng011@huawei.com> Signed-off-by: lipeng <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-22 07:19:36 -07:00
David S. Miller	e9e6d79c52	Merge branch 'cxgb4-trivial-fixes' Hariprasad Shenai says: ==================== Trivial fixes for cxgb4 driver This patch series updates driver description for next gen. adapters, updates firmware info., returns error for setup_rss error case, restores L1 configuration in case of FW rejects new config, updates and aligns ethtool get stats settings, etc This patch series has been created against net-next tree and includes patches on cxgb4 and cxgb4vf driver. We have included all the maintainers of respective drivers. Kindly review the change and let us know in case of any review comments. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-22 07:04:08 -07:00
Hariprasad Shenai	b08f2b3569	cxgb4: Update ethtool get_drvinfo to get regdump len Update ethtool get_drvinfo to display regdump len and also update firmware string version print to display N/A in case FW isn't present Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-22 07:04:02 -07:00
Hariprasad Shenai	9c673d1562	cxgb4: Use vmalloc, if kmalloc fails Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-22 07:03:58 -07:00
Hariprasad Shenai	6ac5fe75df	cxgb4: Return error if setup_rss is called before probe Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-22 07:03:57 -07:00
Hariprasad Shenai	52a5f8463b	cxgb4/cxgb4vf: Update driver desc. to include Chelsio T6 adapter Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-22 07:03:53 -07:00
Hariprasad Shenai	43eb4e82eb	cxgb4: Add info print to display number of MSI-X vectors allocated Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-22 07:03:51 -07:00
Hariprasad Shenai	4116542897	cxgb4: Restore L1 cfg, if FW rejects new L1 cfg settings In the ethtool set_settings() routine we need to remember our old L1 Configuration in case the firmware rejects the request and then restore that. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-22 07:03:50 -07:00
Hariprasad Shenai	9bfdad5ef5	cxgb4: Don't disallow turning off auto-negotiation For {1, 10, 40} Gb/s. Prohibiting turning off autonegotiation isn't anywhere in the standard. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-22 07:03:50 -07:00
Hariprasad Shenai	eed7342d4b	cxgb4: Align ethtool get stat settings Align the ethtool get stats settings with the rest so it looks uniform Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-22 07:03:49 -07:00
Pravin B Shelar	aec1592474	openvswitch: Use dev_queue_xmit for vport send. With the use of lwtunnel, we can directly call dev_queue_xmit() rather than calling the netdev vport send operation. The following change makes the tunnel vport code a bit cleaner. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-22 06:46:16 -07:00
Pravin B Shelar	99e28f18e3	openvswitch: Fix incorrect type use. Patch fixes following sparse warning. net/openvswitch/flow_netlink.c:583:30: warning: incorrect type in assignment (different base types) net/openvswitch/flow_netlink.c:583:30: expected restricted __be16 [usertype] ipv4 net/openvswitch/flow_netlink.c:583:30: got int Fixes: `6b26ba3a7d` ("openvswitch: netlink attributes for IPv6 tunneling") Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-22 06:46:13 -07:00
David S. Miller	721daebbdb	Merge branch 'bpf-perf' Alexei Starovoitov says: ==================== bpf_perf_event_output helper Over the last year there were multiple attempts to let eBPF programs output data into perf events by He Kuang and Wangnan. The last one was: https://lkml.org/lkml/2015/7/20/736 It was almost perfect, with the exception that all bpf programs would send data into one global perf_event. This patch set takes a different approach by letting user space open independent PERF_COUNT_SW_BPF_OUTPUT events, so that program output won't collide. Wangnan is working on corresponding perf patches. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-22 06:42:23 -07:00
Alexei Starovoitov	39111695b1	samples: bpf: add bpf_perf_event_output example Performance test and example of bpf_perf_event_output(). kprobe is attached to sys_write() and trivial bpf program streams pid+cookie into userspace via PERF_COUNT_SW_BPF_OUTPUT event. Usage: $ sudo ./bld_x64/samples/bpf/trace_output recv 2968913 events per sec Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-22 06:42:15 -07:00
Alexei Starovoitov	a43eec3042	bpf: introduce bpf_perf_event_output() helper This helper is used to send raw data from eBPF program into special PERF_TYPE_SOFTWARE/PERF_COUNT_SW_BPF_OUTPUT perf_event. User space needs to perf_event_open() it (either for one or all cpus) and store FD into perf_event_array (similar to bpf_perf_event_read() helper) before eBPF program can send data into it. Today the programs triggered by kprobe collect the data and either store it into the maps or print it via bpf_trace_printk() where latter is the debug facility and not suitable to stream the data. This new helper replaces such bpf_trace_printk() usage and allows programs to have dedicated channel into user space for post-processing of the raw data collected. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-22 06:42:15 -07:00
Alexei Starovoitov	fa128e6a14	perf: pad raw data samples automatically Instead of WARN_ON in perf_event_output() on unpadded raw samples, pad them automatically. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-22 06:42:13 -07:00
Brenden Blanco	63b11e757d	ipvlan: read direct ifindex instead of iflink In the ipv4 outbound path of an ipvlan device in l3 mode, the ifindex is being grabbed from dev_get_iflink. This works for the physical device case, since as the documentation of that function notes: "Physical interfaces have the same 'ifindex' and 'iflink' values.". However, if the master device is a veth, and the pairs are in separate net namespaces, the route lookup will fail with -ENODEV due to outer veth pair being in a separate namespace from the ipvlan master/routing namespace. ns0 \| ns1 \| ns2 veth0a--\|--veth0b--\|--ipvl0 In ipvlan_process_v4_outbound(), a packet sent from ipvl0 in the above configuration will pass fl.flowi4_oif == veth0a to ip_route_output_flow(), but *net == ns1. Notice also that ipv6 processing is not using iflink. Since there is a discrepancy in usage, fixup both v4 and v6 case to use local dev variable. Tested this with l3 ipvlan on top of veth, as well as with single physical interface in the top namespace. Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> Reviewed-by: Jiri Benc <jbenc@redhat.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-22 06:39:08 -07:00
Eric Dumazet	dbf650b67b	tcp: fastopen: limit max_qlen Allowing an application to set whatever limit for the list of recently RST fastopen sessions [1] is not wise, as it opens ways to deplete kernel memory. Cap the user-provided limit by the somaxconn sysctl, like the listen() backlog. [1] https://tools.ietf.org/html/rfc7413#section-5.1 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-22 06:22:13 -07:00
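The cap described in the commit above amounts to clamping a user-supplied value to a system-wide maximum. A minimal sketch follows, assuming both values are plain unsigned integers; `cap_fastopen_qlen` is a hypothetical name, and the real kernel change may be structured differently.

```c
#include <assert.h>

/* Hypothetical helper: limit the requested fastopen queue length to the
 * somaxconn value, the same way a listen() backlog is capped. */
static unsigned int cap_fastopen_qlen(unsigned int requested,
                                      unsigned int somaxconn)
{
    return requested < somaxconn ? requested : somaxconn;
}

/* Usage example: an oversized request is reduced, a small one is kept. */
static void cap_fastopen_qlen_example(void)
{
    assert(cap_fastopen_qlen(1000000u, 128u) == 128u);
    assert(cap_fastopen_qlen(16u, 128u) == 16u);
}
```

With this clamp, the memory an application can pin through the fastopen list is bounded by a value the administrator controls.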
Vivien Didelot	e2aacd963a	net: mdio-gpio: move platform data header This header file only contains the platform data structure definition, so move it to the include/linux/platform_data/ directory. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-21 19:50:44 -07:00
Vivien Didelot	844338e5a4	ARM: gemini: remove unnecessary mdio-gpio includes Remove the inclusion of linux/mdio-gpio.h in nas4220b, wbd111 and wbd222 boards since mdio-gpio is not used. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-21 19:50:43 -07:00
Wu Fengguang	c6aa74d546	net: hisilicon: fix ptr_ret.cocci warnings drivers/net/ethernet/hisilicon/hns/hnae.c:442:1-3: WARNING: PTR_ERR_OR_ZERO can be used Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR Generated by: scripts/coccinelle/api/ptr_ret.cocci CC: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-21 19:38:26 -07:00
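The coccinelle warning above concerns the kernel idiom of encoding a negative errno inside a pointer value. The sketch below imitates that idiom in userspace to show why `if (IS_ERR(p)) return PTR_ERR(p); return 0;` collapses into one call. All `sketch_` names are hypothetical stand-ins, not the kernel macros themselves, and the 4095 limit is assumed to match the kernel's MAX_ERRNO.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095   /* assumed upper bound on errno values */

/* Encode a negative errno as a pointer value in the top of the address space. */
static void *sketch_err_ptr(intptr_t err)
{
    return (void *)err;
}

/* True when the pointer value falls in the reserved error range. */
static int sketch_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Recover the negative errno from an error pointer. */
static intptr_t sketch_ptr_err(const void *p)
{
    return (intptr_t)p;
}

/* The combined form: negative errno for error pointers, 0 otherwise. */
static int sketch_ptr_err_or_zero(const void *p)
{
    return sketch_is_err(p) ? (int)sketch_ptr_err(p) : 0;
}

/* Usage example: one error pointer and one valid pointer. */
static void sketch_ptr_err_example(void)
{
    int object = 0;

    assert(sketch_ptr_err_or_zero(sketch_err_ptr(-ENOMEM)) == -ENOMEM);
    assert(sketch_ptr_err_or_zero(&object) == 0);
}
```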
Eric Dumazet	feec0cb3f2	ipv6: gro: support sit protocol Tom Herbert added SIT support to GRO with commit `19424e052f` ("sit: Add gro callbacks to sit_offload"), later reverted by Herbert Xu. The problem came because Tom's patch was building GRO packets without proper metadata: If packets were locally delivered, we would not care. But if packets needed to be forwarded, GSO engine was not able to segment individual segments. With the following patch, we correctly set skb->encapsulation and inner network header. We also update gso_type. Tested: Server : netserver modprobe dummy ifconfig dummy0 8.0.0.1 netmask 255.255.255.0 up arp -s 8.0.0.100 4e:32:51:04:47:e5 iptables -I INPUT -s 10.246.7.151 -j TEE --gateway 8.0.0.100 ifconfig sixtofour0 sixtofour0 Link encap:IPv6-in-IPv4 inet6 addr: 2002:af6:798::1/128 Scope:Global inet6 addr: 2002:af6:798::/128 Scope:Global UP RUNNING NOARP MTU:1480 Metric:1 RX packets:411169 errors:0 dropped:0 overruns:0 frame:0 TX packets:409414 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:20319631739 (20.3 GB) TX bytes:29529556 (29.5 MB) Client : netperf -H 2002:af6:798::1 -l 1000 & Checked on server traffic copied on dummy0 and verified segments were properly rebuilt, with proper IP headers, TCP checksums... tcpdump on eth0 shows proper GRO aggregation takes place. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-21 19:36:11 -07:00
Eric Dumazet	8f3af27786	net: dummy: add more features While testing my SIT/GRO patch using netfilter TEE module and a dummy device, I found some features were missing : TSO IPv6, UFO, and encapsulated traffic. ethtool -k dummy0 now gives : ... tcp-segmentation-offload: on tx-tcp-segmentation: on tx-tcp-ecn-segmentation: on tx-tcp6-segmentation: on udp-fragmentation-offload: on ... tx-gre-segmentation: on tx-ipip-segmentation: on tx-sit-segmentation: on tx-udp_tnl-segmentation: on Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-21 19:36:10 -07:00
Arad, Ronen	b1974ed05e	netlink: Rightsize IFLA_AF_SPEC size calculation if_nlmsg_size() overestimates the minimum allocation size of netlink dump request (when called from rtnl_calcit()) or the size of the message (when called from rtnl_getlink()). This is because ext_filter_mask is not supported by rtnl_link_get_af_size() and rtnl_link_get_size(). The over-estimation is significant when at least one netdev has many VLANs configured (8 bytes for each configured VLAN). This patch-set "rightsizes" the protocol specific attribute size calculation by propagating ext_filter_mask to rtnl_link_get_af_size() and adding it as an argument to the get_link_af_size op in rtnl_af_ops. Bridge module already used filtering aware sizing for notifications. br_get_link_af_size_filtered() is consistent with the modified get_link_af_size op so it replaces br_get_link_af_size() in br_af_ops. br_get_link_af_size() becomes unused and is thus removed. Signed-off-by: Ronen Arad <ronen.arad@intel.com> Acked-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-21 19:15:20 -07:00
Elad Raz	6ac311ae8b	Adding switchdev ageing notification on port bridged Configure ageing time to the HW for newly bridged device CC: Scott Feldman <sfeldma@gmail.com> CC: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Elad Raz <eladr@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-21 07:50:57 -07:00
David S. Miller	eb9fae328f	Merge branch 'tcp-rack' Yuchung Cheng says: ==================== RACK loss detection RACK (Recent ACK) loss recovery uses the notion of time instead of packet sequence (FACK) or counts (dupthresh). It's inspired by the FACK heuristic in tcp_mark_lost_retrans(): when a limited transmit (new data packet) is sacked in recovery, then any retransmission sent before that newly sacked packet was sent must have been lost, since at least one round trip time has elapsed. But that existing heuristic from tcp_mark_lost_retrans() has several limitations: 1) it can't detect tail drops since it depends on limited transmit 2) it's disabled upon reordering (assumes no reordering) 3) it's only enabled in fast recovery but not timeout recovery RACK addresses these limitations with a core idea: an unacknowledged packet P1 is deemed lost if a packet P2 that was sent later is s/acked, since at least one round trip has passed. Since RACK cares about the time sequence instead of the data sequence of packets, it can detect tail drops when a later retransmission is s/acked, while FACK or dupthresh can't. For reordering RACK uses a dynamically adjusted reordering window ("reo_wnd") to reduce false positives on every (small) degree of reordering, similar to the delayed Early Retransmit. In the current patch set RACK is only a supplemental loss detection and does not trigger fast recovery. However we are developing RACK to replace or consolidate FACK/dupthresh, early retransmit, and thin-dupack. These heuristics all implicitly bear the time notion. For example, the delayed Early Retransmit is simply applying RACK to trigger the fast recovery with small inflight. RACK requires measuring the minimum RTT. Tracking a global min is less robust due to traffic engineering pathing changes. Therefore it uses a windowed filter by Kathleen Nichols. The min RTT can also be useful for various other purposes like congestion control or stat monitoring.
This patch has been used on Google servers for well over 1 year. RACK has also been implemented in the QUIC protocol. We are submitting an IETF draft as well. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-21 07:00:59 -07:00
Yuchung Cheng	4f41b1c58a	tcp: use RACK to detect losses This patch implements the second half of RACK that uses the most recent transmit time among all delivered packets to detect losses. tcp_rack_mark_lost() is called upon receiving a dubious ACK. It then checks if a not-yet-sacked packet was sent at least "reo_wnd" prior to the sent time of the most recently delivered. If so the packet is deemed lost. The "reo_wnd" reordering window starts with 1msec for fast loss detection and changes to min-RTT/4 when reordering is observed. We found 1msec accommodates well on tiny degree of reordering (<3 pkts) on faster links. We use min-RTT instead of SRTT because reordering is more of a path property but SRTT can be inflated by self-inflicted congestion. The factor of 4 is borrowed from the delayed early retransmit and seems to work reasonably well. Since RACK is still experimental, it is now used as a supplemental loss detection on top of existing algorithms. It is only effective after the fast recovery starts or after the timeout occurs. The fast recovery is still triggered by FACK and/or dupack threshold instead of RACK. We introduce a new sysctl net.ipv4.tcp_recovery for future experiments of loss recoveries. For now RACK can be disabled by setting it to 0. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-21 07:00:53 -07:00
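The loss-marking rule in the commit above can be illustrated with a small userspace sketch. It assumes microsecond timestamps and a flat array of packets; the struct, function names, and the inclusive comparison are illustrative assumptions drawn from the commit text, not the kernel's tcp_rack_mark_lost() implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-packet state for the sketch. */
struct sketch_pkt {
    uint64_t sent_us;   /* transmit time in microseconds */
    bool sacked;        /* already delivered (s/acked) */
    bool lost;          /* already marked lost */
};

/* Reordering window: 1 ms by default, min-RTT/4 once reordering was seen. */
static uint64_t sketch_reo_wnd_us(bool reordering_seen, uint64_t min_rtt_us)
{
    return reordering_seen ? min_rtt_us / 4 : 1000;
}

/* Mark every not-yet-sacked packet that was sent at least reo_wnd_us before
 * the transmit time of the most recently delivered packet (rack_mstamp_us).
 * Returns how many packets were newly marked lost. */
static unsigned int sketch_rack_mark_lost(struct sketch_pkt *pkts, size_t n,
                                          uint64_t rack_mstamp_us,
                                          uint64_t reo_wnd_us)
{
    unsigned int newly_lost = 0;

    for (size_t i = 0; i < n; i++) {
        struct sketch_pkt *p = &pkts[i];

        if (p->sacked || p->lost)
            continue;
        if (p->sent_us + reo_wnd_us <= rack_mstamp_us) {
            p->lost = true;
            newly_lost++;
        }
    }
    return newly_lost;
}

/* Usage example: the oldest unsacked packet is lost, the recent one is not. */
static void sketch_rack_mark_lost_example(void)
{
    struct sketch_pkt pkts[] = {
        { 1000, false, false },   /* sent long before the delivered packet */
        { 2000, true,  false },   /* already sacked, skipped */
        { 5500, false, false },   /* still inside the reordering window */
    };
    uint64_t reo_wnd = sketch_reo_wnd_us(false, 40000);

    assert(reo_wnd == 1000);
    assert(sketch_rack_mark_lost(pkts, 3, 6000, reo_wnd) == 1);
    assert(pkts[0].lost && !pkts[1].lost && !pkts[2].lost);
}
```

Because the comparison is on transmit time rather than sequence number, the same check would apply to a retransmission at the tail of the window.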
Yuchung Cheng	659a8ad56f	tcp: track the packet timings in RACK This patch is the first half of the RACK loss recovery. RACK loss recovery uses the notion of time instead of packet sequence (FACK) or counts (dupthresh). It's inspired by the previous FACK heuristic in tcp_mark_lost_retrans(): when a limited transmit (new data packet) is sacked, then current retransmitted sequence below the newly sacked sequence must have been lost, since at least one round trip time has elapsed. But it has several limitations: 1) can't detect tail drops since it depends on limited transmit 2) is disabled upon reordering (assumes no reordering) 3) only enabled in fast recovery but not timeout recovery RACK (Recent ACK) addresses these limitations with the notion of time instead: a packet P1 is lost if a later packet P2 is s/acked, as at least one round trip has passed. Since RACK cares about the time sequence instead of the data sequence of packets, it can detect tail drops when later retransmission is s/acked while FACK or dupthresh can't. For reordering RACK uses a dynamically adjusted reordering window ("reo_wnd") to reduce false positives on every (small) degree of reordering. This patch implements tcp_advanced_rack() which tracks the most recent transmission time among the packets that have been delivered (ACKed or SACKed) in tp->rack.mstamp. This timestamp is the key to determine which packet has been lost. Consider an example that the sender sends six packets: T1: P1 (lost) T2: P2 T3: P3 T4: P4 T100: sack of P2. rack.mstamp = T2 T101: retransmit P1 T102: sack of P2,P3,P4. rack.mstamp = T4 T205: ACK of P4 since the hole is repaired. rack.mstamp = T101 We need to be careful about spurious retransmission because it may falsely advance tp->rack.mstamp by an RTT or an RTO, causing RACK to falsely mark all packets lost, just like a spurious timeout. We identify spurious retransmission by the ACK's TS echo value.
If TS option is not applicable but the retransmission is acknowledged less than min-RTT ago, it is likely to be spurious. We refrain from using the transmission time of these spurious retransmissions. The second half is implemented in the next patch that marks packet lost using RACK timestamp. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-21 07:00:48 -07:00
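The timeline in the commit above can be replayed with a short userspace sketch of the timestamp tracking. It models only the min-RTT check for spurious retransmissions and leaves out the TS echo comparison; the struct, function names, and abstract time units are assumptions for illustration, not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tracker for the most recent transmit time among delivered
 * packets. A value of 0 means nothing has been delivered yet. */
struct sketch_rack {
    uint64_t mstamp;
};

/* Called for each newly delivered (ACKed or SACKed) packet. */
static void sketch_rack_advance(struct sketch_rack *rack, uint64_t xmit_time,
                                bool retransmitted, uint64_t now,
                                uint64_t min_rtt)
{
    /* Only a later transmit time can advance the timestamp. */
    if (xmit_time <= rack->mstamp)
        return;

    /* A retransmission acknowledged sooner than min-RTT after it was sent
     * is likely spurious, so its transmit time is not used. */
    if (retransmitted && now - xmit_time < min_rtt)
        return;

    rack->mstamp = xmit_time;
}

/* Usage example: replay the T1..T205 timeline with min-RTT assumed to be 50. */
static void sketch_rack_advance_example(void)
{
    struct sketch_rack rack = { 0 };
    const uint64_t min_rtt = 50;

    sketch_rack_advance(&rack, 2, false, 100, min_rtt);   /* T100: sack of P2 */
    assert(rack.mstamp == 2);

    sketch_rack_advance(&rack, 2, false, 102, min_rtt);   /* T102: sack of P2 */
    sketch_rack_advance(&rack, 3, false, 102, min_rtt);   /* ... P3 */
    sketch_rack_advance(&rack, 4, false, 102, min_rtt);   /* ... P4 */
    assert(rack.mstamp == 4);

    sketch_rack_advance(&rack, 101, true, 205, min_rtt);  /* T205: P1 rexmit acked */
    assert(rack.mstamp == 101);
}
```

Had the retransmission from T101 been acknowledged at T110, the min-RTT check would have left the timestamp at T4.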
Yuchung Cheng	625a5e109a	tcp: skb_mstamp_after helper a helper to prepare the first main RACK patch. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-21 07:00:46 -07:00


549398 Commits