linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-12 22:23:55 +00:00

Author	SHA1	Message	Date
Changli Gao	210d6de78c	act_mirred: don't clone skb when skb isn't shared don't clone skb when skb isn't shared When the tcf_action is TC_ACT_STOLEN, and the skb isn't shared, we don't need to clone a new skb. As the skb will be freed after this function returns, we can use it freely once we get a reference to it. Signed-off-by: Changli Gao <xiaosuo@gmail.com> ---- include/net/sch_generic.h \| 11 +++++++++-- net/sched/act_mirred.c \| 6 +++--- 2 files changed, 12 insertions(+), 5 deletions(-) Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-28 23:24:32 -07:00
Eric Dumazet	c4ead4c595	tcp: tso_fragment() might avoid GFP_ATOMIC We can pass a gfp argument to tso_fragment() and avoid GFP_ATOMIC allocations sometimes. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-28 23:24:31 -07:00
Eric Dumazet	9618e2ffd7	vlan: 64 bit rx counters Use u64_stats_sync infrastructure to implement 64bit rx stats. (tx stats are addressed later) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-28 23:24:31 -07:00
Eric Dumazet	bc66154efe	macvlan: 64 bit rx counters Use u64_stats_sync infrastructure to implement 64bit stats. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-28 23:24:30 -07:00
Eric Dumazet	33d91f00c7	net: u64_stats_fetch_begin_bh() and u64_stats_fetch_retry_bh() - Must disable preemption in case of 32bit UP in u64_stats_fetch_begin() and u64_stats_fetch_retry() - Add new u64_stats_fetch_begin_bh() and u64_stats_fetch_retry_bh() for network usage, disabling BH on 32bit UP only. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-28 23:24:30 -07:00
Eric Dumazet	7a9b2d5950	net: use this_cpu_ptr() use this_cpu_ptr(p) instead of per_cpu_ptr(p, smp_processor_id()) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-28 23:24:29 -07:00
Eric Dumazet	b6b3ecc71a	net: u64_stats_sync improvements - Add a comment about interrupts: 6) If counter might be written by an interrupt, readers should block interrupts. - Fix a typo in sample of use. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-28 23:24:29 -07:00
Ben Hutchings	a095cfc40e	3c59x: Specify window explicitly for access to windowed registers Currently much of the code assumes that a specific window has been selected, while a few functions save and restore the window. This makes it impossible to introduce fine-grained locking. Make those assumptions explicit by introducing wrapper functions to set the window and read/write a register. Use these everywhere except vortex_interrupt(), vortex_start_xmit() and vortex_rx(). These set the window just once, or not at all in the case of vortex_rx() as it should always be called from vortex_interrupt(). Cache the current window in struct vortex_private to avoid unnecessary hardware writes. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Tested-by: Arne Nordmark <nordmark@mech.kth.se> [against 2.6.32] Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-28 23:19:18 -07:00
Amerigo Wang	d2ef859034	mlx4: add dynamic LRO disable support This patch adds dynamic LRO diable support for mlx4 net driver. It also fixes a bug of mlx4, which checks NETIF_F_LRO flag in rx path without rtnl lock. (I don't have mlx4 card, so only did compiling test. Anyone who wants to test this is more than welcome.) This is based on Neil's initial work too, and heavily modified based on Stanislaw's suggestions. Signed-off-by: WANG Cong <amwang@redhat.com> Signed-off-by: Neil Horman <nhorman@redhat.com> Acked-by: Neil Horman <nhorman@redhat.com> Reviewed-by: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-28 23:04:10 -07:00
Jon Mason	958de1931c	s2io: add dynamic LRO disable support This patch adds dynamic LRO disable support for s2io net driver, enables LRO by default, increases the driver version number, and corrects the name of the LRO modparm. This is mostly Wang's patch based on Neil's initial work, heavily modified based on Ramkrishna's suggestions. This has been tested on a Neterion Xframe adapter and verified via adapter LRO statistics. Signed-off-by: Jon Mason <jon.mason@exar.com> Signed-off-by: WANG Cong <amwang@redhat.com> Signed-off-by: Neil Horman <nhorman@redhat.com> Acked-by: Neil Horman <nhorman@redhat.com> Reviewed-by: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Ramkrishna Vepa <Ramkrishna.Vepa@exar.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-28 23:04:10 -07:00
Florian Westphal	172d69e63c	syncookies: add support for ECN Allows use of ECN when syncookies are in effect by encoding ecn_ok into the syn-ack tcp timestamp. While at it, remove a uneeded #ifdef CONFIG_SYN_COOKIES. With CONFIG_SYN_COOKIES=nm want_cookie is ifdef'd to 0 and gcc removes the "if (0)". Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-26 22:00:03 -07:00
Florian Westphal	734f614bc1	syncookies: do not store rcv_wscale in tcp timestamp As pointed out by Fernando Gont there is no need to encode rcv_wscale into the cookie. We did not use the restored rcv_wscale anyway; it is recomputed via tcp_select_initial_window(). Thus we can save 4 bits in the ts option space by removing rcv_wscale. In case window scaling was not supported, we set the (invalid) wscale value 0xf. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-26 22:00:03 -07:00
Eric Dumazet	9587c6ddd4	ipv6: remove ipv6_statistics commit `9261e53701` (ipv6: making ip and icmp statistics per/namespace) forgot to remove ipv6_statistics variable. commit `bc417d99bf` (ipv6: remove stale MIB definitions) took care of icmpv6_statistics & icmpv6msg_statistics Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Denis V. Lunev <den@openvz.org> CC: Alexey Dobriyan <adobriyan@gmail.com> CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-25 21:33:17 -07:00
Eric Dumazet	1823e4c80e	snmp: add align parameter to snmp_mib_init() In preparation for 64bit snmp counters for some mibs, add an 'align' parameter to snmp_mib_init(), instead of assuming mibs only contain 'unsigned long' fields. Callers can use __alignof__(type) to provide correct alignment. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> CC: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-25 21:33:17 -07:00
Eric Dumazet	5eaa0bd81f	loopback: use u64_stats_sync infrastructure Commit `6b10de38f0` (loopback: Implement 64bit stats on 32bit arches) introduced 64bit stats in loopback driver, using a private seqcount and private helpers. David suggested to introduce a generic infrastructure, added in (net: Introduce u64_stats_sync infrastructure) This patch reimplements loopback 64bit stats using the u64_stats_sync infrastructure. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-25 21:33:16 -07:00
Eric Dumazet	4b4194c40f	arp: RCU change in arp_solicit() Avoid two atomic ops in arp_solicit() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-25 21:33:16 -07:00
Gerrit Renker	59b80802a8	dccp: make implementation of Syn-RTT symmetric This patch is thanks to Andre Noll who reported the issue and helped testing. The Syn-RTT sampled during the initial handshake currently only works for the client sending the DCCP-Request. TFRC penalizes the absence of an RTT sample with a very slow initial speed (1 packet per second), which delays slow-start significantly, resulting in sluggish performance. This patch mirrors the "Syn RTT" principle by adding a timestamp also onto the DCCP-Response, producing an RTT sample when the (Data)Ack completing the handshake arrives. Also changed the documentation to 'TFRC' since Syn RTTs are also used by CCID-4. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-25 21:33:15 -07:00
Gerrit Renker	a7d13fbf85	dccp: remove unused function argument This removes an unused 'sk' argument from several option-inserting functions. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-25 21:33:14 -07:00
Eric Benard	87cad5c385	net/fec: clean suspend/resume Commit `59d4289b83` converted fec to dev_pm_ops but didn't update the suspend/resume functions thus leading to the following warning : "initialization from incompatible pointer type" when CONFIG_PM is set. This patch also fixe a few indentation and style around CONFIG_PM area. Signed-off-by: Eric Bénard <eric@eukrea.com> Cc: netdev@vger.kernel.org Cc: davem@davemloft.net Cc: amit.kucheria@canonical.com Cc: s.hauer@pengutronix.de Cc: linux-arm-kernel@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-25 21:33:14 -07:00
Divy Le Ray	d05f6cf01c	cxgb3: request 7.10 firmware The driver requests FW 7.10 Bump up driver version. Signed-off-by: Divy Le Ray <divy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-25 21:33:13 -07:00
Divy Le Ray	17acad30bf	cxgb3: update FW to 7.10 Update FW to 7.10 Signed-off-by: Divy Le Ray <divy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-25 21:33:13 -07:00
Joe Perches	f9467eaec3	net/core/pktgen.c: Use pr_<level> Add pr_fmt(fmt) KBUILD_MODNAME ": " fmt Remove "pktgen: " from formats Convert printks to pr_<level> Added func_enter() for debugging Moved version to end of string at module_init Coalesced long formats Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-25 21:33:12 -07:00
Hagen Paul Pfeifer	01f2f3f6ef	net: optimize Berkeley Packet Filter (BPF) processing Gcc is currenlty not in the ability to optimize the switch statement in sk_run_filter() because of dense case labels. This patch replace the OR'd labels with ordered sequenced case labels. The sk_chk_filter() function is modified to patch/replace the original OPCODES in a ordered but equivalent form. gcc is now in the ability to transform the switch statement in sk_run_filter into a jump table of complexity O(1). Until this patch gcc generates a sequence of conditional branches (O(n) of 567 byte .text segment size (arch x86_64): 7ff: 8b 06 mov (%rsi),%eax 801: 66 83 f8 35 cmp $0x35,%ax 805: 0f 84 d0 02 00 00 je adb <sk_run_filter+0x31d> 80b: 0f 87 07 01 00 00 ja 918 <sk_run_filter+0x15a> 811: 66 83 f8 15 cmp $0x15,%ax 815: 0f 84 c5 02 00 00 je ae0 <sk_run_filter+0x322> 81b: 77 73 ja 890 <sk_run_filter+0xd2> 81d: 66 83 f8 04 cmp $0x4,%ax 821: 0f 84 17 02 00 00 je a3e <sk_run_filter+0x280> 827: 77 29 ja 852 <sk_run_filter+0x94> 829: 66 83 f8 01 cmp $0x1,%ax [...] With the modification the compiler translate the switch statement into the following jump table fragment: 7ff: 66 83 3e 2c cmpw $0x2c,(%rsi) 803: 0f 87 1f 02 00 00 ja a28 <sk_run_filter+0x26a> 809: 0f b7 06 movzwl (%rsi),%eax 80c: ff 24 c5 00 00 00 00 jmpq *0x0(,%rax,8) 813: 44 89 e3 mov %r12d,%ebx 816: e9 43 03 00 00 jmpq b5e <sk_run_filter+0x3a0> 81b: 41 89 dc mov %ebx,%r12d 81e: e9 3b 03 00 00 jmpq b5e <sk_run_filter+0x3a0> Furthermore, I reordered the instructions to reduce cache line misses by order the most common instruction to the start. Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-25 21:33:12 -07:00
Ben Hutchings	bd97a63f7d	sfc: Log clearer error messages for hardware monitor Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-25 21:03:32 -07:00
Ben Hutchings	477e54eba4	sfc: Use Toeplitz IPv4 hash for RSS and hash insertion Insertion of the Falcon hash is unreliable. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-25 21:03:32 -07:00
Ben Hutchings	5d3a6fca95	sfc: Move siena_nic_data::ipv6_rss_key to efx_nic::rx_hash_key We will use this hash key for Toeplitz IPv4 hashing too. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-25 21:03:31 -07:00
Ben Hutchings	604f6049ba	sfc: Fix reading of inserted hash The hash appears immediately before the packet data, not at the beginning of the buffer. This means we can easily use negative offsets from the start of packet data, so adjust the data and length at the top of __efx_rx_packet() instead of wherever we consume the hash. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-25 21:03:30 -07:00
Vasanthy Kolluri	29046f9b1e	enic: Clean ups 1) Update copyright 2) Fix hardware queue descriptor field size CQ_ENET_RQ_DESC_FCOE_SOF_BITS 3) Include rtnetlink.h instead of if_link.h 4) Selectively flush writes to interrupt mask register 5) Use pci_enable_device_mem 6) Remove unused variables and header files 7) Fix size mismatch between memory alloc and free operations of a variable 8) Check for non null arguments to vic_provinfo_alloc Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com> Signed-off-by: Roopa Prabhu <roprabhu@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-25 20:50:30 -07:00
Vasanthy Kolluri	506e119841	enic: Bug Fix: Handle surprise hardware removals Handle surprise hardware removals gracefully during devcmd issue and init, cleanup of queues. Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com> Signed-off-by: Roopa Prabhu <roprabhu@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-25 20:50:25 -07:00
Vasanthy Kolluri	1825aca667	enic: Feature Add: Add loopback capability to enic devices Hardware has the loopback capability to queue the packets transmitted from a device to the receive queue of the same device. enic now supports the loopback capability. Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com> Signed-off-by: Roopa Prabhu <roprabhu@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-25 20:50:24 -07:00
Vasanthy Kolluri	b5bab85c15	enic: Use receive queue buffer blocks of 32/64 entries Change the receive queue buffer allocations into blocks of 32 entries when ring size is less than 64, otherwise use 64 entries per block. Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com> Signed-off-by: Roopa Prabhu <roprabhu@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-25 20:50:24 -07:00
Vasanthy Kolluri	70feadf36d	enic: Add new firmware devcmds Add new firmware devcmds - CMD_PROXY_BY_BDF, CMD_PACKET_FILTER_ALL, CMD_ENABLE_WAIT. Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com> Signed-off-by: Roopa Prabhu <roprabhu@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-25 20:50:23 -07:00
Vasanthy Kolluri	a7a79debcc	enic: Use (netdev\|dev\|pr)_<level> macro helpers for logging Replace all printk routines with the (netdev\|dev\|pr)_<level> macros that provide verbose logs. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com> Signed-off-by: Roopa Prabhu <roprabhu@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-25 20:50:23 -07:00
Vasanthy Kolluri	383ab92f11	enic: Clean up: Add wrapper routines for firmware devcmd calls Add wrapper routines that issue devcmds to firmware and ensure that a devcmd lock is held for each devcmd call. Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com> Signed-off-by: Roopa Prabhu <roprabhu@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-25 20:46:40 -07:00
Vasanthy Kolluri	99ef563901	enic: Use a lighter reset operation for enic devices The port profile information for a dynamic enic device is set by the upper layers, that are oblivious to the device reset operation. We do not want a reset operation erase the network state of a dynamic enic device as there is no way to set up the port profile information again. Hence a lighter reset operation called hang reset is used. Hang reset, unlike soft reset does not reset the network state and resets the host side state only. Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com> Signed-off-by: Roopa Prabhu <roprabhu@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-25 20:46:01 -07:00
Vasanthy Kolluri	f8cac14acf	enic: Bug Fix: Change hardware ingress vlan rewrite mode The current ingress vlan rewrite mode setting lets the hardware strip off the tag control information of a packet received on native vlan. As a result, the priority bits are also lost. The fix is to change the ingress vlan rewrite mode setting such that the complete tag control information is retained for packets that belong to native vlan. Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com> Signed-off-by: Roopa Prabhu <roprabhu@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-25 20:45:22 -07:00
Vasanthy Kolluri	88132f55d7	enic: Feature Add: Replace LRO with GRO enic now uses the GRO mechanism instead of LRO to pass skbs to upper layers. Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com> Signed-off-by: Roopa Prabhu <roprabhu@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-25 20:45:00 -07:00
Michael Chan	72b8a169db	cnic: Update version to 2.1.3. Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-25 20:37:21 -07:00
Michael Chan	b177a5d5d8	cnic: Further unify kcq handling code. This eliminates some of the duplicate code for the various devices that require the same basic kcq handling. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-25 20:37:20 -07:00
Michael Chan	644b9d4f8b	cnic: Restructure kcq processing. By doing more work in the common function cnic_get_kcqes(), and making full use of the kcq_info structure. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-25 20:37:20 -07:00
Michael Chan	e6c2889478	cnic: Unify kcq allocation for all devices. By creating a common data stucture kcq_info for all devices, the kcq (kernel completion queue) for all devices can be allocated by common code. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-25 20:37:19 -07:00
Michael Chan	66fee9ed03	cnic: Unify IRQ code for all hardware types. By creating a common cnic_doirq(). Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-25 20:37:19 -07:00
Michael Chan	520efdf44f	cnic: Fine-tune CID memory space calculation. The current code makes assumptions about the CID (context ID) memory space and starting CID that may not be always correct when firmware changes. In particular, BNX2_ISCSI_START_CID may not always be fixed. We now calculate cp->max_cid_space and cp->iscsi_start_cid dynamically instead of using fixed constants. The unused cp->max_iscsi_conn is also eliminated. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-25 20:37:18 -07:00
Ben Hutchings	39c9cf0707	sfc: Record hardware RX hash on each skb where possible Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-24 22:13:24 -07:00
Ben Hutchings	2822235278	sfc: Disable setting feature flags that are not implemented Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-24 22:13:23 -07:00
Ben Hutchings	c5d5f5fdc7	sfc: Replace EFX_DRIVER_NAME with KBUILD_MODNAME Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-24 22:13:23 -07:00
Ben Hutchings	62776d034c	sfc: Implement message level control Replace EFX_ERR() with netif_err(), EFX_INFO() with netif_info(), EFX_LOG() with netif_dbg() and EFX_TRACE() and EFX_REGDUMP() with netif_vdbg(). Replace EFX_ERR_RL(), EFX_INFO_RL() and EFX_LOG_RL() using explicit calls to net_ratelimit(). Implement the ethtool operations to get and set message level flags, and add a 'debug' module parameter for the initial value. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-24 22:13:22 -07:00
Ben Hutchings	0c605a2061	sfc: Log MTD errors using partition name, not just net device name Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-24 22:13:22 -07:00
Ben Hutchings	5b98c1bfcf	sfc: Implement ethtool register dump operation Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Acked-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-24 22:13:14 -07:00
Konstantin Khorenko	565b7b2d2e	tcp: do not send reset to already closed sockets i've found that tcp_close() can be called for an already closed socket, but still sends reset in this case (tcp_send_active_reset()) which seems to be incorrect. Moreover, a packet with reset is sent with different source port as original port number has been already cleared on socket. Besides that incrementing stat counter for LINUX_MIB_TCPABORTONCLOSE also does not look correct in this case. Initially this issue was found on 2.6.18-x RHEL5 kernel, but the same seems to be true for the current mainstream kernel (checked on 2.6.35-rc3). Please, correct me if i missed something. How that happens: 1) the server receives a packet for socket in TCP_CLOSE_WAIT state that triggers a tcp_reset(): Call Trace: <IRQ> [<ffffffff8025b9b9>] tcp_reset+0x12f/0x1e8 [<ffffffff80046125>] tcp_rcv_state_process+0x1c0/0xa08 [<ffffffff8003eb22>] tcp_v4_do_rcv+0x310/0x37a [<ffffffff80028bea>] tcp_v4_rcv+0x74d/0xb43 [<ffffffff8024ef4c>] ip_local_deliver_finish+0x0/0x259 [<ffffffff80037131>] ip_local_deliver+0x200/0x2f4 [<ffffffff8003843c>] ip_rcv+0x64c/0x69f [<ffffffff80021d89>] netif_receive_skb+0x4c4/0x4fa [<ffffffff80032eca>] process_backlog+0x90/0xec [<ffffffff8000cc50>] net_rx_action+0xbb/0x1f1 [<ffffffff80012d3a>] __do_softirq+0xf5/0x1ce [<ffffffff8001147a>] handle_IRQ_event+0x56/0xb0 [<ffffffff8006334c>] call_softirq+0x1c/0x28 [<ffffffff80070476>] do_softirq+0x2c/0x85 [<ffffffff80070441>] do_IRQ+0x149/0x152 [<ffffffff80062665>] ret_from_intr+0x0/0xa <EOI> [<ffffffff80008a2e>] __handle_mm_fault+0x6cd/0x1303 [<ffffffff80008903>] __handle_mm_fault+0x5a2/0x1303 [<ffffffff80033a9d>] cache_free_debugcheck+0x21f/0x22e [<ffffffff8006a263>] do_page_fault+0x49a/0x7dc [<ffffffff80066487>] thread_return+0x89/0x174 [<ffffffff800c5aee>] audit_syscall_exit+0x341/0x35c [<ffffffff80062e39>] error_exit+0x0/0x84 tcp_rcv_state_process() ... // (sk_state == TCP_CLOSE_WAIT here) ... /* step 2: check RST bit / if(th->rst) { tcp_reset(sk); goto discard; } ... --------------------------------- tcp_rcv_state_process tcp_reset tcp_done tcp_set_state(sk, TCP_CLOSE); inet_put_port __inet_put_port inet_sk(sk)->num = 0; sk->sk_shutdown = SHUTDOWN_MASK; 2) After that the process (socket owner) tries to write something to that socket and "inet_autobind" sets a _new_ (which differs from the original!) port number for the socket: Call Trace: [<ffffffff80255a12>] inet_bind_hash+0x33/0x5f [<ffffffff80257180>] inet_csk_get_port+0x216/0x268 [<ffffffff8026bcc9>] inet_autobind+0x22/0x8f [<ffffffff80049140>] inet_sendmsg+0x27/0x57 [<ffffffff8003a9d9>] do_sock_write+0xae/0xea [<ffffffff80226ac7>] sock_writev+0xdc/0xf6 [<ffffffff800680c7>] _spin_lock_irqsave+0x9/0xe [<ffffffff8001fb49>] __pollwait+0x0/0xdd [<ffffffff8008d533>] default_wake_function+0x0/0xe [<ffffffff800a4f10>] autoremove_wake_function+0x0/0x2e [<ffffffff800f0b49>] do_readv_writev+0x163/0x274 [<ffffffff80066538>] thread_return+0x13a/0x174 [<ffffffff800145d8>] tcp_poll+0x0/0x1c9 [<ffffffff800c56d3>] audit_syscall_entry+0x180/0x1b3 [<ffffffff800f0dd0>] sys_writev+0x49/0xe4 [<ffffffff800622dd>] tracesys+0xd5/0xe0 3) sendmsg fails at last with -EPIPE (=> 'write' returns -EPIPE in userspace): F: tcp_sendmsg1 -EPIPE: sk=ffff81000bda00d0, sport=49847, old_state=7, new_state=7, sk_err=0, sk_shutdown=3 Call Trace: [<ffffffff80027557>] tcp_sendmsg+0xcb/0xe87 [<ffffffff80033300>] release_sock+0x10/0xae [<ffffffff8016f20f>] vgacon_cursor+0x0/0x1a7 [<ffffffff8026bd32>] inet_autobind+0x8b/0x8f [<ffffffff8003a9d9>] do_sock_write+0xae/0xea [<ffffffff80226ac7>] sock_writev+0xdc/0xf6 [<ffffffff800680c7>] _spin_lock_irqsave+0x9/0xe [<ffffffff8001fb49>] __pollwait+0x0/0xdd [<ffffffff8008d533>] default_wake_function+0x0/0xe [<ffffffff800a4f10>] autoremove_wake_function+0x0/0x2e [<ffffffff800f0b49>] do_readv_writev+0x163/0x274 [<ffffffff80066538>] thread_return+0x13a/0x174 [<ffffffff800145d8>] tcp_poll+0x0/0x1c9 [<ffffffff800c56d3>] audit_syscall_entry+0x180/0x1b3 [<ffffffff800f0dd0>] sys_writev+0x49/0xe4 [<ffffffff800622dd>] tracesys+0xd5/0xe0 tcp_sendmsg() ... / Wait for a connection to finish. / if ((1 << sk->sk_state) & ~(TCPF_ESTABLISHED \| TCPF_CLOSE_WAIT)) { int old_state = sk->sk_state; if ((err = sk_stream_wait_connect(sk, &timeo)) != 0) { if (f_d && (err == -EPIPE)) { printk("F: tcp_sendmsg1 -EPIPE: sk=%p, sport=%u, old_state=%d, new_state=%d, " "sk_err=%d, sk_shutdown=%d\n", sk, ntohs(inet_sk(sk)->sport), old_state, sk->sk_state, sk->sk_err, sk->sk_shutdown); dump_stack(); } goto out_err; } } ... 4) Then the process (socket owner) understands that it's time to close that socket and does that (and thus triggers sending reset packet): Call Trace: ... [<ffffffff80032077>] dev_queue_xmit+0x343/0x3d6 [<ffffffff80034698>] ip_output+0x351/0x384 [<ffffffff80251ae9>] dst_output+0x0/0xe [<ffffffff80036ec6>] ip_queue_xmit+0x567/0x5d2 [<ffffffff80095700>] vprintk+0x21/0x33 [<ffffffff800070f0>] check_poison_obj+0x2e/0x206 [<ffffffff80013587>] poison_obj+0x36/0x45 [<ffffffff8025dea6>] tcp_send_active_reset+0x15/0x14d [<ffffffff80023481>] dbg_redzone1+0x1c/0x25 [<ffffffff8025dea6>] tcp_send_active_reset+0x15/0x14d [<ffffffff8000ca94>] cache_alloc_debugcheck_after+0x189/0x1c8 [<ffffffff80023405>] tcp_transmit_skb+0x764/0x786 [<ffffffff8025df8a>] tcp_send_active_reset+0xf9/0x14d [<ffffffff80258ff1>] tcp_close+0x39a/0x960 [<ffffffff8026be12>] inet_release+0x69/0x80 [<ffffffff80059b31>] sock_release+0x4f/0xcf [<ffffffff80059d4c>] sock_close+0x2c/0x30 [<ffffffff800133c9>] __fput+0xac/0x197 [<ffffffff800252bc>] filp_close+0x59/0x61 [<ffffffff8001eff6>] sys_close+0x85/0xc7 [<ffffffff800622dd>] tracesys+0xd5/0xe0 So, in brief: a received packet for socket in TCP_CLOSE_WAIT state triggers tcp_reset() which clears inet_sk(sk)->num and put socket into TCP_CLOSE state * an attempt to write to that socket forces inet_autobind() to get a new port (but the write itself fails with -EPIPE) * tcp_close() called for socket in TCP_CLOSE state sends an active reset via socket with newly allocated port This adds an additional check in tcp_close() for already closed sockets. We do not want to send anything to closed sockets. Signed-off-by: Konstantin Khorenko <khorenko@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-24 21:54:58 -07:00

1 2 3 4 5 ...

200211 Commits