linux

Author	SHA1	Message	Date
Moni Shoua	92e584fe44	net/bonding: Fix potential bad memory access during bonding events When queuing work to send the NETDEV_BONDING_INFO netdev event, it's possible that when the work is executed, the pointer to the slave becomes invalid. This can happen if between queuing the event and the execution of the work, the net-device was un-enslaved and re-enslaved. Fix that by queuing a work with the data of the slave instead of the slave structure. Fixes: `69e6113343` ('net/bonding: Notify state change on slaves') Reported-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 14:03:53 -08:00
David S. Miller	9dce285b70	Merge branch 'tipc-next' Richard Alpe says: ==================== tipc: new compat layer for the legacy NL API This is a compatibility / transcoding layer for the old netlink API. It relies on the new netlink API to collect data or perform actions (dumpit / doit). The main benefit of this compat layer is that it removes a lot of complex code from the tipc core as only the new API needs to be able to harness data or perform actions. I.e. the compat layer isn't concerned with locking or how the internal data-structures look. As long as the new API stays relatively intact the compat layer should be fine. The main challenge in this compat layer is the randomness of the legacy API. Some commands send binary data and some send ASCII data, some are very picky in optimizing their buffer sizes and some just don't care. Most legacy commands put their data in a single TLV (data container) but some segment the data into multiple TLVs. This list of randomness goes on and on. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:53 -08:00
Richard Alpe	941787b829	tipc: remove tipc_snprintf tipc_snprintf() was heavily utilized by the old netlink API which no longer exists (now netlink compat). In this patch we swap tipc_snprintf() to the identical scnprintf() in the only remaining occurrence. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:49 -08:00
Richard Alpe	22ae7cff50	tipc: nl compat add noop and remove legacy nl framework Add TIPC_CMD_NOOP to compat layer and remove the old framework. All legacy nl commands are now converted to the compat layer in netlink_compat.c. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:49 -08:00
Richard Alpe	5a81a6377b	tipc: convert legacy nl stats show to nl compat Convert TIPC_CMD_SHOW_STATS to compat layer. This command does not have any counterpart in the new API, meaning it now solely exists as a function in the compat layer. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:49 -08:00
Richard Alpe	3c26181c5b	tipc: convert legacy nl net id get to nl compat Convert TIPC_CMD_GET_NETID to compat dumpit. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:49 -08:00
Richard Alpe	964f9501c1	tipc: convert legacy nl net id set to nl compat Convert TIPC_CMD_SET_NETID to compat doit. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:49 -08:00
Richard Alpe	d7cc75d3cb	tipc: convert legacy nl node addr set to nl compat Convert TIPC_CMD_SET_NODE_ADDR to compat doit. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:49 -08:00
Richard Alpe	4b28cb581d	tipc: convert legacy nl node dump to nl compat Convert TIPC_CMD_GET_NODES to compat dumpit and remove global node counter solely used by the legacy API. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:49 -08:00
Richard Alpe	5bfc335a63	tipc: convert legacy nl media dump to nl compat Convert TIPC_CMD_GET_MEDIA_NAMES to compat dumpit. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:48 -08:00
Richard Alpe	487d2a3a13	tipc: convert legacy nl socket dump to nl compat Convert socket (port) listing to compat dumpit call. If a socket (port) has publications a second dumpit call is issued to collect them and format them into the legacy buffer before continuing to process the sockets (ports). Command converted in this patch: TIPC_CMD_SHOW_PORTS Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:48 -08:00
Richard Alpe	44a8ae94fd	tipc: convert legacy nl name table dump to nl compat Add functionality for printing a dump header and convert TIPC_CMD_SHOW_NAME_TABLE to compat dumpit. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:48 -08:00
Richard Alpe	1817877b3c	tipc: convert legacy nl link stat reset to nl compat Convert TIPC_CMD_RESET_LINK_STATS to compat doit. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:48 -08:00
Richard Alpe	37e2d4843f	tipc: convert legacy nl link prop set to nl compat Convert setting of link properties to compat doit calls. Commands converted in this patch: TIPC_CMD_SET_LINK_TOL TIPC_CMD_SET_LINK_PRI TIPC_CMD_SET_LINK_WINDOW Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:48 -08:00
Richard Alpe	357ebdbfca	tipc: convert legacy nl link dump to nl compat Convert TIPC_CMD_GET_LINKS to compat dumpit and remove global link counter solely used by the legacy API. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:48 -08:00
Richard Alpe	f2b3b2d4cc	tipc: convert legacy nl link stat to nl compat Add functionality for safely appending string data to a TLV without keeping write count in the caller. Convert TIPC_CMD_SHOW_LINK_STATS to compat dumpit. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:47 -08:00
Richard Alpe	9ab154658a	tipc: convert legacy nl bearer enable/disable to nl compat Introduce a framework for transcoding legacy nl actions into action (.doit) calls from the new nl API. This is done by converting the incoming TLV data into netlink data with nested netlink attributes. Unfortunately, due to the randomness of the legacy API, we can't do this generically, so each legacy netlink command requires a specific transcoding recipe. In this case for bearer enable and bearer disable. Convert TIPC_CMD_ENABLE_BEARER and TIPC_CMD_DISABLE_BEARER into doit compat calls. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:47 -08:00
Richard Alpe	d0796d1ef6	tipc: convert legacy nl bearer dump to nl compat Introduce a framework for dumping netlink data from the new netlink API and formatting it to the old legacy API format. This is done by looping the dump data and calling a format handler for each entity, in this case a bearer. We dump until either all data is dumped or we reach the limited buffer size of the legacy API. Remember, the legacy API doesn't scale. In this commit we convert TIPC_CMD_GET_BEARER_NAMES to use the compat layer. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:47 -08:00
Richard Alpe	bfb3e5dd8d	tipc: move and rename the legacy nl api to "nl compat" The new netlink API is no longer "v2" but rather the standard API and the legacy API is now "nl compat". We split them into separate start/stop and put them in different files in order to further distinguish them. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:47 -08:00
David S. Miller	c8ac18f200	Merge tag 'wireless-drivers-next-for-davem-2015-02-07' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next Major changes: iwlwifi: * more work for new devices (4165 / 8260) * cleanups / improvements in rate control * fixes for TDLS * major statistics work from Johannes - more to come * improvements for the fw error dump infrastructure * usual amount of small fixes here and there (scan, D0i3 etc...) * add support for beamforming * enable stuck queue detection for iwlmvm * a few fixes for EBS scan * fixes for various failure paths * improvements for TDLS Offchannel wil6210: * performance tuning * some AP features brcm80211: * rework some code in SDIO part of the brcmfmac driver related to suspend/resume that were found doing stress testing * in PCIe part scheduling of worker thread needed to be relaxed * minor fixes and exposing firmware revision information to user-space, i.e. ethtool. mwifiex: * enhancements for change virtual interface handling * remove coupling between netdev and FW supported interface combination, now conversion from any type of supported interface types to any other type is possible * DFS support in AP mode ath9k: * fix calibration issues on some boards * Wake-on-WLAN improvements ath10k: * add support for qca6174 hardware * enable RX batching to reduce CPU load Conflicts: drivers/net/wireless/rtlwifi/pci.c Conflict resolution is to get rid of the 'end' label and keep the rest. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 12:13:58 -08:00
Eric Dumazet	93c1af6ca9	net:rfs: adjust table size checking Make sure root user does not try something stupid. Also make sure mask field in struct rps_sock_flow_table does not share a cache line with the potentially often dirtied flow table. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: `567e4b7973` ("net: rfs: add hash collision detection") Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-08 21:54:09 -08:00
Hariprasad Shenai	acde2c2d28	cxgb4: Fix trace observed while dumping clip_tbl Handle clip_tbl debugfs entry, when clip_tbl isn't allocated. In commit `b5a02f503c` ("cxgb4: Update ipv6 address handling api") wrong argument was passed for single_open for clip_tbl debugfs entry, which led to below trace. Fixing it. ====== call Trace: [<ffffffffa073c606>] clip_tbl_open+0x16/0x30 [cxgb4] [<ffffffff8119e2fa>] do_dentry_open+0x21a/0x370 [<ffffffff8119e499>] vfs_open+0x49/0x50 [<ffffffff811b0d0e>] do_last+0x21e/0x800 [<ffffffff811b1382>] path_openat+0x92/0x470 [<ffffffff8110569f>] ? rb_reserve_next_event+0xaf/0x380 [<ffffffff8110569f>] ? rb_reserve_next_event+0xaf/0x380 [<ffffffff811b189a>] do_filp_open+0x4a/0xa0 [<ffffffff811bdc5d>] ? __alloc_fd+0xcd/0x140 [<ffffffff8119fa4a>] do_sys_open+0x11a/0x230 [<ffffffff8101219f>] ? syscall_trace_enter_phase2+0xaf/0x1b0 [<ffffffff8119fb9e>] SyS_open+0x1e/0x20 [<ffffffff815bf6f0>] tracesys_phase2+0xd4/0xd9 Code: 89 e5 66 66 66 66 90 48 8b 47 e0 48 8b 40 30 48 8b 40 58 c9 c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 66 66 66 90 48 8b 47 e0 <48> 8b 40 58 c9 c3 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 RIP [<ffffffff8120898d>] PDE_DATA+0xd/0x20 RSP <ffff8800b08c3c48> CR2: 0000000000000058 ===== Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-08 21:53:27 -08:00
Stephen Rothwell	61d7b09773	rhashtable: using ERR_PTR requires linux/err.h Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-08 21:52:24 -08:00
Shrikrishna Khare	dd83829ed9	Driver: Vmxnet3: Change the hex constant to its decimal equivalent The hex constant chosen for VMXNET3_REV1_MAGIC is offensive, replace it with its decimal equivalent. Signed-off-by: Shrikrishna Khare <skhare@vmware.com> Reviewed-by: Shreyas Bhatewara <sbhatewara@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-08 16:55:01 -08:00
Eric Dumazet	567e4b7973	net: rfs: add hash collision detection Receive Flow Steering is a nice solution but suffers from hash collisions when a mix of connected and unconnected traffic is received on the host, when flow hash table is populated. Also, clearing flow in inet_release() makes RFS not very good for short lived flows, as many packets can follow close(). (FIN, ACK packets, ...) This patch extends the information stored into global hash table to not only include cpu number, but upper part of the hash value. I use a 32bit value, and dynamically split it in two parts. For host with less than 64 possible cpus, this gives 6 bits for the cpu number, and 26 (32-6) bits for the upper part of the hash. Since hash bucket selection uses low order bits of the hash, we have a full hash match, if /proc/sys/net/core/rps_sock_flow_entries is big enough. If the hash found in flow table does not match, we fallback to RPS (if it is enabled for the rxqueue). This means that a packet for a non-connected flow can avoid the IPI through an unrelated/victim CPU. This also means we no longer have to clear the table at socket close time, and this helps short lived flows performance. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-08 16:53:57 -08:00
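The 32-bit split described in `567e4b7973` can be illustrated with a small user-space sketch. It assumes the fixed 6-bit CPU field from the "less than 64 possible cpus" case; the names `CPU_BITS`, `flow_entry_pack`, `flow_entry_matches` and `flow_entry_cpu` are hypothetical and are not taken from the kernel source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: fewer than 64 possible CPUs, so 6 bits hold the CPU number
 * and the remaining 26 bits hold the upper part of the flow hash. */
#define CPU_BITS 6u
#define CPU_MASK ((1u << CPU_BITS) - 1u)

/* Build a table entry: upper hash bits in the high 26 bits, CPU in the low 6. */
static uint32_t flow_entry_pack(uint32_t hash, uint32_t cpu)
{
    return (hash & ~CPU_MASK) | (cpu & CPU_MASK);
}

/* An entry belongs to a flow only if the stored upper hash bits agree. */
static bool flow_entry_matches(uint32_t entry, uint32_t hash)
{
    return ((entry ^ hash) & ~CPU_MASK) == 0;
}

/* Extract the CPU number recorded in the entry. */
static uint32_t flow_entry_cpu(uint32_t entry)
{
    return entry & CPU_MASK;
}
```

When `flow_entry_matches()` returns false, the entry was written by a different flow that happened to land in the same bucket, which is the case where the commit falls back to RPS.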
Sabrina Dubroca	096a4cfa58	net: fix a typo in skb_checksum_validate_zero_check Remove trailing underscore. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-08 16:28:26 -08:00
Sabrina Dubroca	3e97fa7059	gre/ipip: use be16 variants of netlink functions encap.sport and encap.dport are __be16, use nla_{get,put}_be16 instead of nla_{get,put}_u16. Fixes the sparse warnings: warning: incorrect type in assignment (different base types) expected restricted __be32 [addressable] [usertype] o_key got restricted __be16 [addressable] [usertype] i_flags warning: incorrect type in assignment (different base types) expected restricted __be16 [usertype] sport got unsigned short warning: incorrect type in assignment (different base types) expected restricted __be16 [usertype] dport got unsigned short warning: incorrect type in argument 3 (different base types) expected unsigned short [unsigned] [usertype] value got restricted __be16 [usertype] sport warning: incorrect type in argument 3 (different base types) expected unsigned short [unsigned] [usertype] value got restricted __be16 [usertype] dport Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-08 16:28:06 -08:00
Jon Paul Maloy	51a00daf73	tipc: fix bug in socket reception function In commit `c637c10355` ("tipc: resolve race problem at unicast message reception") we introduced a time limit for how long the function tipc_sk_enqueue() would be allowed to execute its loop. Unfortunately, the test for when this limit is passed was put in the wrong place, resulting in a lost message when the test is true. We fix this by moving the test to before we dequeue the next buffer from the input queue. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-08 13:09:25 -08:00
Michael Büsch	662f5533c4	rt6_probe_deferred: Do not depend on struct ordering rt6_probe allocates a struct __rt6_probe_work and schedules a work handler rt6_probe_deferred. But rt6_probe_deferred kfree's the struct work_struct instead of struct __rt6_probe_work. This works, because struct work_struct is the first element of struct __rt6_probe_work. Change it to kfree struct __rt6_probe_work to not implicitly depend on struct work_struct being the first element. This does not affect the generated code. Signed-off-by: Michael Buesch <m@bues.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-08 13:00:43 -08:00
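The pattern behind `662f5533c4` is to recover the enclosing allocation from the embedded work item and free that, rather than freeing the member pointer. Below is a minimal user-space sketch; `struct work`, `struct probe_work`, `probe()` and `last_target` are hypothetical stand-ins (not the kernel's `struct work_struct` or `struct __rt6_probe_work`), and `free()` stands in for `kfree()`.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the embedded work item. */
struct work {
    void (*fn)(struct work *w);
};

/* Stand-in for the containing structure that is actually allocated. */
struct probe_work {
    struct work work;
    int target;
};

/* Same idea as the kernel macro: step back from a member to its container. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static int last_target;   /* records what the handler saw, for illustration */

static void probe_deferred(struct work *w)
{
    struct probe_work *pw = container_of(w, struct probe_work, work);

    last_target = pw->target;
    free(pw);   /* free the allocation itself, wherever 'work' sits in it */
}

/* Allocate the container and run the handler immediately (no real queue). */
static int probe(int target)
{
    struct probe_work *pw = malloc(sizeof(*pw));

    if (!pw)
        return -1;
    pw->work.fn = probe_deferred;
    pw->target = target;
    pw->work.fn(&pw->work);
    return 0;
}
```

With this shape, moving `work` to a different position inside `struct probe_work` keeps the handler correct, which is the dependency the commit removes.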
David S. Miller	f06535c599	Merge branch 'tcp_ack_loops' Neal Cardwell says: ==================== tcp: mitigate TCP ACK loops due to out-of-window validation dupacks This patch series mitigates "ack loop" DoS scenarios by rate-limiting outgoing duplicate ACKs sent in response to incoming "out of window" segments. Background ----------- There are several cases in which the TCP RFCs specify that a TCP endpoint should send a pure duplicate ACK in response to a pure duplicate ACK that appears to be invalid due to being "out of window": (1) RFC 793 (section 3.9, page 69) specifies that endpoints should send a duplicate ACK in response to an ACK when the incoming sequence number is invalid due to being outside the receive window: "If an incoming segment is not acceptable, an acknowledgment should be sent in reply". (2) RFC 793 (section 3.9, page 72) says: "If the ACK acknowledges something not yet sent (SEG.ACK > SND.NXT) then send an ACK". (3) RFC 1323 (section 4.2.1, page 18) specifies that endpoints should send a duplicate ACK in response to an ACK when the PAWS check for the incoming timestamp value fails: "If .... SEG.TSval < TS.Recent and if TS.Recent is valid ... Send an acknowledgement in reply" The problem ------------ Normally, this is not a problem. However, a buggy middlebox or malicious man-in-the-middle can inject a few packets into the conversation that advance each endpoint's notion of the current window (sequence, ACK, or timestamp), without either side noticing. In this case, from then on each side can think the other is sending invalid segments. Thus an infinite feedback loop of duplicate ACKs can ensue, as each endpoint receives a duplicate ACK, decides that it is invalid (due to sequence number, ACK number, or timestamp), and then sends a dupack in reply, which the other side decides is invalid, responding with a dupack... ad infinitum. This ping-pong feedback loop can happen at a very high rate. This phenomenon can and does happen in practice. It has been seen in datacenter and Internet contexts at Google, and has been documented by Anil Agarwal in the Nov 2013 tcpm thread "TCP mismatched sequence numbers issue", and Avery Fay in the Feb 2015 Linux netdev thread "Invalid timestamp? causing tight ack loop (hundreds of thousands of packets / sec)". This patch series ------------------ This patch series mitigates such ack loops by rate-limiting outgoing duplicate ACKs sent in response to incoming TCP packets that are for an existing connection but that are invalid due to any of the reasons mentioned above: sequence number (1), ACK field (2), or timestamp value (3). The rate limit for such duplicate ACKs is specified by a new sysctl, tcp_invalid_ratelimit, which specifies the minimal space between such outbound duplicate ACKs, in milliseconds. The default is 500 (500ms), and 0 disables the mechanism. We rate-limit these duplicate ACK responses rather than blocking them entirely or resetting the connection, because legitimate connections can rely on dupacks in response to some out-of-window segments. For example, zero window probes are typically sent with a sequence number that is below the current window, and ZWPs thus expect to elicit a dupack in response. Testing: this approach has been in use at Google for a while. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-08 01:03:26 -08:00
Neal Cardwell	4fb17a6091	tcp: mitigate ACK loops for connections as tcp_timewait_sock Ensure that in state FIN_WAIT2 or TIME_WAIT, where the connection is represented by a tcp_timewait_sock, we rate limit dupacks in response to incoming packets (a) with TCP timestamps that fail PAWS checks, or (b) with sequence numbers that are out of the acceptable window. We do not send a dupack in response to out-of-window packets if it has been less than sysctl_tcp_invalid_ratelimit (default 500ms) since we last sent a dupack in response to an out-of-window packet. Reported-by: Avery Fay <avery@mixpanel.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-08 01:03:13 -08:00
Neal Cardwell	f2b2c582e8	tcp: mitigate ACK loops for connections as tcp_sock Ensure that in state ESTABLISHED, where the connection is represented by a tcp_sock, we rate limit dupacks in response to incoming packets (a) with TCP timestamps that fail PAWS checks, or (b) with sequence numbers or ACK numbers that are out of the acceptable window. We do not send a dupack in response to out-of-window packets if it has been less than sysctl_tcp_invalid_ratelimit (default 500ms) since we last sent a dupack in response to an out-of-window packet. There is already a similar (although global) rate-limiting mechanism for "challenge ACKs". When deciding whether to send a challenge ACK, we first consult the new per-connection rate limit, and then the global rate limit. Reported-by: Avery Fay <avery@mixpanel.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-08 01:03:12 -08:00
Neal Cardwell	a9b2c06dbe	tcp: mitigate ACK loops for connections as tcp_request_sock In the SYN_RECV state, where the TCP connection is represented by tcp_request_sock, we now rate-limit SYNACKs in response to a client's retransmitted SYNs: we do not send a SYNACK in response to client SYN if it has been less than sysctl_tcp_invalid_ratelimit (default 500ms) since we last sent a SYNACK in response to a client's retransmitted SYN. This allows the vast majority of legitimate client connections to proceed unimpeded, even for the most aggressive platforms, iOS and MacOS, which actually retransmit SYNs at 1-second intervals several times in a row. They use SYN RTO timeouts following the progression: 1,1,1,1,1,2,4,8,16,32. Reported-by: Avery Fay <avery@mixpanel.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-08 01:03:12 -08:00
Neal Cardwell	032ee42369	tcp: helpers to mitigate ACK loops by rate-limiting out-of-window dupacks Helpers for mitigating ACK loops by rate-limiting dupacks sent in response to incoming out-of-window packets. This patch includes: - rate-limiting logic - sysctl to control how often we allow dupacks to out-of-window packets - SNMP counter for cases where we rate-limited our dupack sending The rate-limiting logic in this patch decides to not send dupacks in response to out-of-window segments if (a) they are SYNs or pure ACKs and (b) the remote endpoint is sending them faster than the configured rate limit. We rate-limit our responses rather than blocking them entirely or resetting the connection, because legitimate connections can rely on dupacks in response to some out-of-window segments. For example, zero window probes are typically sent with a sequence number that is below the current window, and ZWPs thus expect to elicit a dupack in response. We allow dupacks in response to TCP segments with data, because these may be spurious retransmissions for which the remote endpoint wants to receive DSACKs. This is safe because segments with data can't realistically be part of ACK loops, which by their nature consist of each side sending pure/data-less ACKs to each other. The dupack interval is controlled by a new sysctl knob, tcp_invalid_ratelimit, given in milliseconds, in case an administrator needs to dial this upward in the face of a high-rate DoS attack. The name and units are chosen to be analogous to the existing knob for ICMP, icmp_ratelimit. The default value for tcp_invalid_ratelimit is 500ms, which allows at most one such dupack per 500ms. This is chosen to be 2x faster than the 1-second minimum RTO interval allowed by RFC 6298 (section 2, rule 2.4). We allow the extra 2x factor because network delay variations can cause packets sent at 1 second intervals to be compressed and arrive much closer. Reported-by: Avery Fay <avery@mixpanel.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-08 01:03:12 -08:00
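The rate-limiting decision described in `032ee42369` can be sketched in user space as follows. This is an illustration under stated assumptions, not the kernel helper: `struct oow_state`, `oow_rate_limited()` and the millisecond clock passed in as `now_ms` are hypothetical, and `invalid_ratelimit_ms` merely mirrors the semantics given for the `tcp_invalid_ratelimit` sysctl (milliseconds, default 500, 0 disables).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-connection state. */
struct oow_state {
    uint32_t last_ack_ms;     /* when we last answered an out-of-window packet */
    bool     has_last;        /* false until the first answer has been sent */
    unsigned long limited;    /* how many answers were suppressed */
};

/* Mirrors the sysctl semantics from the commit text: ms, 0 disables. */
static uint32_t invalid_ratelimit_ms = 500;

/* Return true if the dupack for an out-of-window segment should be skipped. */
static bool oow_rate_limited(struct oow_state *st, uint32_t now_ms, bool has_data)
{
    /* Segments carrying data are always answered; only SYNs and pure ACKs
     * are candidates for suppression. */
    if (!has_data && invalid_ratelimit_ms != 0 && st->has_last &&
        (uint32_t)(now_ms - st->last_ack_ms) < invalid_ratelimit_ms) {
        st->limited++;
        return true;
    }

    /* We are going to answer: remember when. */
    st->last_ack_ms = now_ms;
    st->has_last = true;
    return false;
}
```

The unsigned subtraction keeps the interval check correct when the millisecond counter wraps; the `limited` field plays the role of the SNMP counter mentioned in the commit.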
Pravin B Shelar	ca539345f8	openvswitch: Initialize unmasked key and uid len Flow alloc needs to initialize unmasked key pointer. Otherwise it can crash kernel trying to free random unmasked-key pointer. general protection fault: 0000 [#1] SMP 3.19.0-rc6-net-next+ #457 Hardware name: Supermicro X7DWU/X7DWU, BIOS 1.1 04/30/2008 RIP: 0010:[<ffffffff8111df0e>] [<ffffffff8111df0e>] kfree+0xac/0x196 Call Trace: [<ffffffffa060bd87>] flow_free+0x21/0x59 [openvswitch] [<ffffffffa060bde0>] ovs_flow_free+0x21/0x23 [openvswitch] [<ffffffffa0605b4a>] ovs_packet_cmd_execute+0x2f3/0x35f [openvswitch] [<ffffffffa0605995>] ? ovs_packet_cmd_execute+0x13e/0x35f [openvswitch] [<ffffffff811fe6fb>] ? nla_parse+0x4f/0xec [<ffffffff8139a2fc>] genl_family_rcv_msg+0x26d/0x2c9 [<ffffffff8107620f>] ? __lock_acquire+0x90e/0x9aa [<ffffffff8139a3be>] genl_rcv_msg+0x66/0x89 [<ffffffff8139a358>] ? genl_family_rcv_msg+0x2c9/0x2c9 [<ffffffff81399591>] netlink_rcv_skb+0x3e/0x95 [<ffffffff81399898>] ? genl_rcv+0x18/0x37 [<ffffffff813998a7>] genl_rcv+0x27/0x37 [<ffffffff81399033>] netlink_unicast+0x103/0x191 [<ffffffff81399382>] netlink_sendmsg+0x2c1/0x310 [<ffffffff811007ad>] ? might_fault+0x50/0xa0 [<ffffffff8135c773>] do_sock_sendmsg+0x5f/0x7a [<ffffffff8135c799>] sock_sendmsg+0xb/0xd [<ffffffff8135cacf>] ___sys_sendmsg+0x1a3/0x218 [<ffffffff8113e54b>] ? get_close_on_exec+0x86/0x86 [<ffffffff8115a9d0>] ? fsnotify+0x32c/0x348 [<ffffffff8115a720>] ? fsnotify+0x7c/0x348 [<ffffffff8113e5f5>] ? __fget+0xaa/0xbf [<ffffffff8113e54b>] ? get_close_on_exec+0x86/0x86 [<ffffffff8135cccd>] __sys_sendmsg+0x3d/0x5e [<ffffffff8135cd02>] SyS_sendmsg+0x14/0x16 [<ffffffff81411852>] system_call_fastpath+0x12/0x17 Fixes: 74ed7ab9264("openvswitch: Add support for unique flow IDs.") CC: Joe Stringer <joestringer@nicira.com> Reported-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-08 00:51:14 -08:00
David S. Miller	34afb4eb03	Merge branch 'cxgb4' Hariprasad Shenai says: ==================== Add support to dump some hw debug info This patch series adds support to dump sensor info, dump Transport Processor event trace, dump Upper Layer Protocol RX module command trace, dump mailbox contents and dump Transport Processor congestion control configuration. Will send a separate patch series for all the hw stats patches, by moving them to ethtool. The patch series is created against 'net-next' tree. And includes patches on cxgb4 driver. We have included all the maintainers of respective drivers. Kindly review the change and let us know in case of any review comments. V2: Dropped all hw stats related patches. Added a new patch which adds support to dump congestion control table. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-07 22:53:03 -08:00
Hariprasad Shenai	bad4379263	cxgb4: Add support in debugfs to dump the congestion control table Dump Transport Processor modules congestion control configuration Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-07 22:52:39 -08:00
Hariprasad Shenai	bf7c781d57	cxgb4: Add support to dump mailbox content in debugfs Adds support to dump the current contents of mailbox and the driver which owns it. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-07 22:52:39 -08:00
Hariprasad Shenai	797ff0f573	cxgb4: Add support for ULP RX logic analyzer output in debugfs Dump Upper Layer Protocol RX module command trace Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-07 22:52:39 -08:00
Hariprasad Shenai	2d277b3b44	cxgb4: Added support in debugfs to display TP logic analyzer output Dump Transport Processor event trace. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-07 22:52:39 -08:00
Hariprasad Shenai	70a5f3bb5f	cxgb4: Add support in debugfs to display sensor information Dump out various chip sensor information. Currently Chip Temperature and Core Voltage. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-07 22:52:39 -08:00
David S. Miller	bdb2748202	Merge branch 'be2net' Sathya Perla says: ==================== be2net: patch set Hi Dave, pls consider applying the following patch-set to the net-next tree. It has 5 code/style cleanup patches and 4 patches that add functionality to the driver. Patch 1 moves routines that were not needed to be in be.h to the respective src files, to avoid unnecessary compilation. Patch 2 replaces (1 << x) with BIT(x) macro Patch 3 refactors code that checks if a FW flash file is compatible with the adapter. The code is now refactored into 2 routines, the first one gets the file type from the image file and the 2nd routine checks if the file type is compatible with the adapter. Patch 4 adds compatibility checks for flashing a FW image on the new Skyhawk P2 HW revision. Patch 5 adds support for a new "offset based" flashing scheme, wherein the driver informs the FW of the offset at which each component in the flash file is to be flashed at. This helps flashing components that were previously not recognized by the running FW. Patch 6 simplifies the be_cmd_rx_filter() routine, by passing to it the filter flags already used in the FW cmd, instead of the netdev flags that were converted to the FW-cmd flags. Patch 7 introduces helper routines in be_set_rx_mode() and be_vid_config() to improve code readability. Patch 8 adds processing of port-misconfig async event sent by the FW. Patch 9 removes unnecessary swapping of a field in the TX desc. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-07 22:51:02 -08:00
Sathya Perla	f986afcbe0	be2net: avoid unnecessary swapping of fields in eth_tx_wrb The 32-bit fields of a tx-wrb are little endian. The driver is currently using be_dws_le_to_cpu() routine to swap (cpu to le) all the fields of a tx-wrb. So, the rsvd field is also unnecessarily swapped. This patch fixes this by individually swapping the required fields. Also, the type of the fields in eth_tx_wrb{} is now changed to __le32 from u32 to avoid sparse warnings. Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-07 22:50:59 -08:00
Vasundhara Volam	21252377bb	be2net: process port misconfig async event This patch adds support for processing the port misconfigure async event generated by the FW. This event is generated typically when an optical module is incorrectly installed or is faulty. This patch also moves the port_name field to the adapter struct for logging the event. As the be_cmd_query_port_name() call is now moved to be_get_config(), it is modified to use the mailbox instead of MCCQ Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-07 22:50:59 -08:00
Sathya Perla	f66b7cfd95	be2net: refactor be_set_rx_mode() and be_vid_config() for readability This patch re-factors the filter setting (uc-list, mc-list, promisc, vlan) code in be_set_rx_mode() and be_vid_config() to make it more readable and reduce code duplication. This patch adds a separate field to track the state/mode of filtering, along with moving all the filtering related fields to one place in the be_adapter structure. Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-07 22:50:59 -08:00
Sathya Perla	ac34b74378	be2net: remove duplicate code in be_cmd_rx_filter() This patch passes BE_IF_FLAGS_XXX flags to be_cmd_rx_filter() routine instead of the IFF_XXX flags. Doing this gets rid of the code to convert the IFF_XXX flags to the BE_IF_FLAGS_XXX used by the FW cmd. The patch also removes code for setting if_flags_mask that was duplicated for each filter mode. Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-07 22:50:58 -08:00
Vasundhara Volam	70a7b52570	be2net: use offset based FW flashing for Skyhawk chip While sending FW update cmds to the FW, the driver specifies the "type" of each component that needs to be flashed. The FW then picks the offset in the flash area at which the component is to be flashed. This doesn't work when new components that the current FW doesn't recognize need to be flashed. Recent FWs (10.2 and above) support a scheme of FW-update wherein the "offset" of the component in the flash area can be specified instead of the "type". This patch uses the "offset" based FW-update mechanism and only when it fails, it falls back to the old "type" based update. Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com> Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-07 22:50:58 -08:00
Vasundhara Volam	81a9e226ff	be2net: avoid flashing SH-B0 UFI image on SH-P2 chip Skyhawk-B0 FW UFI is not compatible to flash on Skyhawk-P2 ASIC. But, Skyhawk-P2 FW UFI is compatible with both B0 and P2 chips. Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com> Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-07 22:50:58 -08:00
Vasundhara Volam	5d3acd0d16	be2net: refactor code that checks flash file compatibility This patch re-factors the code that checks for flash file compatibility with the chip type, for better readability, as follows: - be_get_ufi_type() returns the UFI type from the flash file - be_check_ufi_compatibility() checks if the UFI type is compatible with the adapter/chip that is being flashed Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com> Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-07 22:50:58 -08:00
Vasundhara Volam	83b0611699	be2net: replace (1 << x) with BIT(x) BIT(x) is the preferred usage. Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com> Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-07 22:50:57 -08:00

497463 Commits