linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-08 05:01:48 +00:00

Author	SHA1	Message	Date
Tom Marshall	a4d258036e	tcp: Fix race in tcp_poll If a RST comes in immediately after checking sk->sk_err, tcp_poll will return POLLIN but not POLLOUT. Fix this by checking sk->sk_err at the end of tcp_poll. Additionally, ensure the correct order of operations on SMP machines with memory barriers. Signed-off-by: Tom Marshall <tdm.code@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-20 15:42:05 -07:00
Thomas Egerer	8444cf712c	xfrm: Allow different selector family in temporary state The family parameter xfrm_state_find is used to find a state matching a certain policy. This value is set to the template's family (encap_family) right before xfrm_state_find is called. The family parameter is however also used to construct a temporary state in xfrm_state_find itself which is wrong for inter-family scenarios because it produces a selector for the wrong family. Since this selector is included in the xfrm_user_acquire structure, user space programs misinterpret IPv6 addresses as IPv4 and vice versa. This patch splits up the original init_tempsel function into a part that initializes the selector respectively the props and id of the temporary state, to allow for differing ip address families whithin the state. Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-20 11:11:38 -07:00
Eric Dumazet	842c74bffc	ip_gre: CONFIG_IPV6_MODULE support ipv6 can be a module, we should test CONFIG_IPV6 and CONFIG_IPV6_MODULE to enable ipv6 bits in ip_gre. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-20 10:06:12 -07:00
Michael Kerrisk	a89b47639f	ipv4: enable getsockopt() for IP_NODEFRAG While integrating your man-pages patch for IP_NODEFRAG, I noticed that this option is settable by setsockopt(), but not gettable by getsockopt(). I suppose this is not intended. The (untested, trivial) patch below adds getsockopt() support. Signed-off-by: Michael kerrisk <mtk.manpages@gmail.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-13 19:57:23 -07:00
Bob Arendt	7998156344	ipv4: force_igmp_version ignored when a IGMPv3 query received After all these years, it turns out that the /proc/sys/net/ipv4/conf//force_igmp_version parameter isn't fully implemented. Symptom: When set force_igmp_version to a value of 2, the kernel should only perform multicast IGMPv2 operations (IETF rfc2236). An host-initiated Join message will be sent as a IGMPv2 Join message. But if a IGMPv3 query message is received, the host responds with a IGMPv3 join message. Per rfc3376 and rfc2236, a IGMPv2 host should treat a IGMPv3 query as a IGMPv2 query and respond with an IGMPv2 Join message. Consequences: This is an issue when a IGMPv3 capable switch is the querier and will only issue IGMPv3 queries (which double as IGMPv2 querys) and there's an intermediate switch that is only IGMPv2 capable. The intermediate switch processes the initial v2 Join, but fails to recognize the IGMPv3 Join responses to the Query, resulting in a dropped connection when the intermediate v2-only switch times it out. Identifying issue in the kernel source: The issue is in this section of code (in net/ipv4/igmp.c), which is called when an IGMP query is received (from mainline 2.6.36-rc3 gitweb): ... A IGMPv3 query has a length >= 12 and no sources. This routine will exit after line 880, setting the general query timer (random timeout between 0 and query response time). This calls igmp_gq_timer_expire(): ... .. which only sends a v3 response. So if a v3 query is received, the kernel always sends a v3 response. IGMP queries happen once every 60 sec (per vlan), so the traffic is low. A IGMPv3 query is* a strict superset of a IGMPv2 query, so this patch properly short circuit's the v3 behaviour. One issue is that this does not address force_igmp_version=1. Then again, I've never seen any IGMPv1 multicast equipment in the wild. However there is a lot of v2-only equipment. If it's necessary to support the IGMPv1 case as well: 837 if (len == 8 \|\| IGMP_V2_SEEN(in_dev) \|\| IGMP_V1_SEEN(in_dev)) { Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-13 12:56:51 -07:00
Eric Dumazet	719f835853	udp: add rehash on connect() commit `30fff923` introduced in linux-2.6.33 (udp: bind() optimisation) added a secondary hash on UDP, hashed on (local addr, local port). Problem is that following sequence : fd = socket(...) connect(fd, &remote, ...) not only selects remote end point (address and port), but also sets local address, while UDP stack stored in secondary hash table the socket while its local address was INADDR_ANY (or ipv6 equivalent) Sequence is : - autobind() : choose a random local port, insert socket in hash tables [while local address is INADDR_ANY] - connect() : set remote address and port, change local address to IP given by a route lookup. When an incoming UDP frame comes, if more than 10 sockets are found in primary hash table, we switch to secondary table, and fail to find socket because its local address changed. One solution to this problem is to rehash datagram socket if needed. We add a new rehash(struct socket *) method in "struct proto", and implement this method for UDP v4 & v6, using a common helper. This rehashing only takes care of secondary hash table, since primary hash (based on local port only) is not changed. Reported-by: Krzysztof Piotr Oledzki <ole@ans.pl> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-08 21:45:01 -07:00
Jianzhao Wang	ae2688d59b	net: blackhole route should always be recalculated Blackhole routes are used when xfrm_lookup() returns -EREMOTE (error triggered by IKE for example), hence this kind of route is always temporary and so we should check if a better route exists for next packets. Bug has been introduced by commit `d11a4dc18b`. Signed-off-by: Jianzhao Wang <jianzhao.wang@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-08 14:35:43 -07:00
Jarek Poplawski	f6b085b69d	ipv4: Suppress lockdep-RCU false positive in FIB trie (3) Hi, Here is one more of these warnings and a patch below: Sep 5 23:52:33 del kernel: [46044.244833] =================================================== Sep 5 23:52:33 del kernel: [46044.269681] [ INFO: suspicious rcu_dereference_check() usage. ] Sep 5 23:52:33 del kernel: [46044.277000] --------------------------------------------------- Sep 5 23:52:33 del kernel: [46044.285185] net/ipv4/fib_trie.c:1756 invoked rcu_dereference_check() without protection! Sep 5 23:52:33 del kernel: [46044.293627] Sep 5 23:52:33 del kernel: [46044.293632] other info that might help us debug this: Sep 5 23:52:33 del kernel: [46044.293634] Sep 5 23:52:33 del kernel: [46044.325333] Sep 5 23:52:33 del kernel: [46044.325335] rcu_scheduler_active = 1, debug_locks = 0 Sep 5 23:52:33 del kernel: [46044.348013] 1 lock held by pppd/1717: Sep 5 23:52:33 del kernel: [46044.357548] #0: (rtnl_mutex){+.+.+.}, at: [<c125dc1f>] rtnl_lock+0xf/0x20 Sep 5 23:52:33 del kernel: [46044.367647] Sep 5 23:52:33 del kernel: [46044.367652] stack backtrace: Sep 5 23:52:33 del kernel: [46044.387429] Pid: 1717, comm: pppd Not tainted 2.6.35.4.4a #3 Sep 5 23:52:33 del kernel: [46044.398764] Call Trace: Sep 5 23:52:33 del kernel: [46044.409596] [<c12f9aba>] ? printk+0x18/0x1e Sep 5 23:52:33 del kernel: [46044.420761] [<c1053969>] lockdep_rcu_dereference+0xa9/0xb0 Sep 5 23:52:33 del kernel: [46044.432229] [<c12b7235>] trie_firstleaf+0x65/0x70 Sep 5 23:52:33 del kernel: [46044.443941] [<c12b74d4>] fib_table_flush+0x14/0x170 Sep 5 23:52:33 del kernel: [46044.455823] [<c1033e92>] ? local_bh_enable_ip+0x62/0xd0 Sep 5 23:52:33 del kernel: [46044.467995] [<c12fc39f>] ? _raw_spin_unlock_bh+0x2f/0x40 Sep 5 23:52:33 del kernel: [46044.480404] [<c12b24d0>] ? fib_sync_down_dev+0x120/0x180 Sep 5 23:52:33 del kernel: [46044.493025] [<c12b069d>] fib_flush+0x2d/0x60 Sep 5 23:52:33 del kernel: [46044.505796] [<c12b06f5>] fib_disable_ip+0x25/0x50 Sep 5 23:52:33 del kernel: [46044.518772] [<c12b10d3>] fib_netdev_event+0x73/0xd0 Sep 5 23:52:33 del kernel: [46044.531918] [<c1048dfd>] notifier_call_chain+0x2d/0x70 Sep 5 23:52:33 del kernel: [46044.545358] [<c1048f0a>] raw_notifier_call_chain+0x1a/0x20 Sep 5 23:52:33 del kernel: [46044.559092] [<c124f687>] call_netdevice_notifiers+0x27/0x60 Sep 5 23:52:33 del kernel: [46044.573037] [<c124faec>] __dev_notify_flags+0x5c/0x80 Sep 5 23:52:33 del kernel: [46044.586489] [<c124fb47>] dev_change_flags+0x37/0x60 Sep 5 23:52:33 del kernel: [46044.599394] [<c12a8a8d>] devinet_ioctl+0x54d/0x630 Sep 5 23:52:33 del kernel: [46044.612277] [<c12aabb7>] inet_ioctl+0x97/0xc0 Sep 5 23:52:34 del kernel: [46044.625208] [<c123f6af>] sock_ioctl+0x6f/0x270 Sep 5 23:52:34 del kernel: [46044.638046] [<c109d2b0>] ? handle_mm_fault+0x420/0x6c0 Sep 5 23:52:34 del kernel: [46044.650968] [<c123f640>] ? sock_ioctl+0x0/0x270 Sep 5 23:52:34 del kernel: [46044.663865] [<c10c3188>] vfs_ioctl+0x28/0xa0 Sep 5 23:52:34 del kernel: [46044.676556] [<c10c38fa>] do_vfs_ioctl+0x6a/0x5c0 Sep 5 23:52:34 del kernel: [46044.688989] [<c1048676>] ? up_read+0x16/0x30 Sep 5 23:52:34 del kernel: [46044.701411] [<c1021376>] ? do_page_fault+0x1d6/0x3a0 Sep 5 23:52:34 del kernel: [46044.714223] [<c10b6588>] ? fget_light+0xf8/0x2f0 Sep 5 23:52:34 del kernel: [46044.726601] [<c1241f98>] ? sys_socketcall+0x208/0x2c0 Sep 5 23:52:34 del kernel: [46044.739140] [<c10c3eb3>] sys_ioctl+0x63/0x70 Sep 5 23:52:34 del kernel: [46044.751967] [<c12fca3d>] syscall_call+0x7/0xb Sep 5 23:52:34 del kernel: [46044.764734] [<c12f0000>] ? cookie_v6_check+0x3d0/0x630 --------------> This patch fixes the warning: =================================================== [ INFO: suspicious rcu_dereference_check() usage. ] --------------------------------------------------- net/ipv4/fib_trie.c:1756 invoked rcu_dereference_check() without protection! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 1 lock held by pppd/1717: #0: (rtnl_mutex){+.+.+.}, at: [<c125dc1f>] rtnl_lock+0xf/0x20 stack backtrace: Pid: 1717, comm: pppd Not tainted 2.6.35.4a #3 Call Trace: [<c12f9aba>] ? printk+0x18/0x1e [<c1053969>] lockdep_rcu_dereference+0xa9/0xb0 [<c12b7235>] trie_firstleaf+0x65/0x70 [<c12b74d4>] fib_table_flush+0x14/0x170 ... Allow trie_firstleaf() to be called either under rcu_read_lock() protection or with RTNL held. The same annotation is added to node_parent_rcu() to prevent a similar warning a bit later. Followup of commits `634a4b20` and `4eaa0e3c`. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-08 14:14:20 -07:00
David S. Miller	6f86b32518	ipv4: Fix reverse path filtering with multipath routing. Actually iterate over the next-hops to make sure we have a device match. Otherwise RP filtering is always elided when the route matched has multiple next-hops. Reported-by: Igor M Podlesny <for.poige@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-07 13:57:24 -07:00
Nicolas Dichtel	750e9fad8c	ipv4: minor fix about RPF in help of Kconfig Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-01 14:29:36 -07:00
Julia Lawall	c34186ed00	net/ipv4: Eliminate kstrdup memory leak The string clone is only used as a temporary copy of the argument val within the while loop, and so it should be freed before leaving the function. The call to strsep, however, modifies clone, so a pointer to the front of the string is kept in saved_clone, to make it possible to free it. The sematic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @r exists@ local idexpression x; expression E; identifier l; statement S; @@ x= $kasprintf\\|kstrdup$(...); ... if (x == NULL) S ... when != kfree(x) when != E = x if (...) { <... when != kfree(x) goto l; ...> * return ...; } // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-08-27 19:31:56 -07:00
KOSAKI Motohiro	d84ba638e4	tcp: select(writefds) don't hang up when a peer close connection This issue come from ruby language community. Below test program hang up when only run on Linux. % uname -mrsv Linux 2.6.26-2-486 #1 Sat Dec 26 08:37:39 UTC 2009 i686 % ruby -rsocket -ve ' BasicSocket.do_not_reverse_lookup = true serv = TCPServer.open("127.0.0.1", 0) s1 = TCPSocket.open("127.0.0.1", serv.addr[1]) s2 = serv.accept s2.close s1.write("a") rescue p $! s1.write("a") rescue p $! Thread.new { s1.write("a") }.join' ruby 1.9.3dev (2010-07-06 trunk 28554) [i686-linux] #<Errno::EPIPE: Broken pipe> [Hang Here] FreeBSD, Solaris, Mac doesn't. because Ruby's write() method call select() internally. and tcp_poll has a bug. SUS defined 'ready for writing' of select() as following. \| A descriptor shall be considered ready for writing when a call to an output \| function with O_NONBLOCK clear would not block, whether or not the function \| would transfer data successfully. That said, EPIPE situation is clearly one of 'ready for writing'. We don't have read-side issue because tcp_poll() already has read side shutdown care. \| if (sk->sk_shutdown & RCV_SHUTDOWN) \| mask \|= POLLIN \| POLLRDNORM \| POLLRDHUP; So, Let's insert same logic in write side. - reference url http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/31065 http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/31068 Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-08-25 23:02:48 -07:00
Eric Dumazet	c5ed63d66f	tcp: fix three tcp sysctls tuning As discovered by Anton Blanchard, current code to autotune tcp_death_row.sysctl_max_tw_buckets, sysctl_tcp_max_orphans and sysctl_max_syn_backlog makes little sense. The bigger a page is, the less tcp_max_orphans is : 4096 on a 512GB machine in Anton's case. (tcp_hashinfo.bhash_size * sizeof(struct inet_bind_hashbucket)) is much bigger if spinlock debugging is on. Its wrong to select bigger limits in this case (where kernel structures are also bigger) bhash_size max is 65536, and we get this value even for small machines. A better ground is to use size of ehash table, this also makes code shorter and more obvious. Based on a patch from Anton, and another from David. Reported-and-tested-by: Anton Blanchard <anton@samba.org> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-08-25 23:02:17 -07:00
David S. Miller	ad1af0fedb	tcp: Combat per-cpu skew in orphan tests. As reported by Anton Blanchard when we use percpu_counter_read_positive() to make our orphan socket limit checks, the check can be off by up to num_cpus_online() * batch (which is 32 by default) which on a 128 cpu machine can be as large as the default orphan limit itself. Fix this by doing the full expensive sum check if the optimized check triggers. Reported-by: Anton Blanchard <anton@samba.org> Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>	2010-08-25 02:27:49 -07:00
Florian Westphal	cca77b7c81	netfilter: fix CONFIG_COMPAT support commit `f3c5c1bfd4` (netfilter: xtables: make ip_tables reentrant) forgot to also compute the jumpstack size in the compat handlers. Result is that "iptables -I INPUT -j userchain" turns into -j DROP. Reported by Sebastian Roesner on #netfilter, closes http://bugzilla.netfilter.org/show_bug.cgi?id=669. Note: arptables change is compile-tested only. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Mikael Pettersson <mikpe@it.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-08-23 14:41:22 -07:00
Eric Dumazet	001389b958	netfilter: {ip,ip6,arp}_tables: avoid lockdep false positive After commit `24b36f019` (netfilter: {ip,ip6,arp}_tables: dont block bottom half more than necessary), lockdep can raise a warning because we attempt to lock a spinlock with BH enabled, while the same lock is usually locked by another cpu in a softirq context. Disable again BH to avoid these lockdep warnings. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Diagnosed-by: David S. Miller <davem@davemloft.net> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-08-17 15:12:14 -07:00
Dmitry Popov	ba78e2ddca	tcp: no md5sig option size check bug tcp_parse_md5sig_option doesn't check md5sig option (TCPOPT_MD5SIG) length, but tcp_v[46]_inbound_md5_hash assume that it's at least 16 bytes long. Signed-off-by: Dmitry Popov <dp@highloadlab.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-08-07 20:24:28 -07:00
David S. Miller	00dad5e479	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/e1000e/hw.h net/bridge/br_device.c net/bridge/br_input.c	2010-08-02 22:22:46 -07:00
Changli Gao	c893b8066c	ip_fragment: fix subtracting PPPOE_SES_HLEN from mtu twice `6c79bf0f24` subtracts PPPOE_SES_HLEN from mtu at the front of ip_fragment(). So the later subtraction should be removed. The MTU of 802.1q is also 1500, so MTU should not be changed. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Bart De Schuymer <bdschuym@pandora.bo> ---- net/ipv4/ip_output.c \| 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) Signed-off-by: Bart De Schuymer <bdschuym@pandora.bo> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-08-02 17:25:07 -07:00
Josh Hunt	3c0fef0b7d	net: Add getsockopt support for TCP thin-streams Initial TCP thin-stream commit did not add getsockopt support for the new socket options: TCP_THIN_LINEAR_TIMEOUTS and TCP_THIN_DUPACK. This adds support for them. Signed-off-by: Josh Hunt <johunt@akamai.com> Tested-by: Andreas Petlund <apetlund@simula.no> Acked-by: Andreas Petlund <apetlund@simula.no> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-08-02 17:25:06 -07:00
David S. Miller	83bf2e4089	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6	2010-08-02 15:07:58 -07:00
Changli Gao	2452a99dc0	netfilter: nf_nat: don't check if the tuple is unique when there isn't any other choice The tuple got from unique_tuple() doesn't need to be really unique, so the check for the unique tuple isn't necessary, when there isn't any other choice. Eliminating the unnecessary nf_nat_used_tuple() can save some CPU cycles too. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-08-02 17:35:49 +02:00
Changli Gao	f43dc98b3b	netfilter: nf_nat: make unique_tuple return void The only user of unique_tuple() get_unique_tuple() doesn't care about the return value of unique_tuple(), so make unique_tuple() return void (nothing). Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-08-02 17:20:54 +02:00
Changli Gao	794dbc1d71	netfilter: nf_nat: use local variable hdrlen Use local variable hdrlen instead of ip_hdrlen(skb). Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-08-02 17:15:30 +02:00
Eric Dumazet	24b36f0193	netfilter: {ip,ip6,arp}_tables: dont block bottom half more than necessary We currently disable BH for the whole duration of get_counters() On machines with a lot of cpus and large tables, this might be too long. We can disable preemption during the whole function, and disable BH only while fetching counters for the current cpu. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-08-02 16:49:01 +02:00
Dmitry Popov	a3bdb549e3	tcp: cookie transactions setsockopt memory leak There is a bug in do_tcp_setsockopt(net/ipv4/tcp.c), TCP_COOKIE_TRANSACTIONS case. In some cases (when tp->cookie_values == NULL) new tcp_cookie_values structure can be allocated (at cvp), but not bound to tp->cookie_values. So a memory leak occurs. Signed-off-by: Dmitry Popov <dp@highloadlab.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-07-30 23:04:07 -07:00
Changli Gao	7df0884ce1	netfilter: iptables: use skb->len for accounting Use skb->len for accounting as xt_quota does. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-07-23 16:25:11 +02:00
Changli Gao	f667009ecc	netfilter: arptables: use arp_hdr_len() use arp_hdr_len(). Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-07-23 13:40:53 +02:00
Changli Gao	c36952e524	netfilter: nf_nat_core: merge the same lines proto->unique_tuple() will be called finally, if the previous calls fail. This patch checks the false condition of (range->flags &IP_NAT_RANGE_PROTO_RANDOM) instead to avoid duplicate line of code: proto->unique_tuple(). Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-07-23 13:27:08 +02:00
Eric Dumazet	963bfeeeec	net: RTA_MARK addition Add a new rt attribute, RTA_MARK, and use it in rt_fill_info()/inet_rtm_getroute() to support following commands : ip route get 192.168.20.110 mark NUMBER ip route get 192.168.20.108 from 192.168.20.110 iif eth1 mark NUMBER ip route list cache [192.168.20.110] mark NUMBER Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-07-22 13:46:21 -07:00
Gustavo F. Padovan	3f30fc1570	net: remove last uses of __attribute__((packed)) Network code uses the __packed macro instead of __attribute__((packed)). Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-07-21 14:44:29 -07:00
David S. Miller	11fe883936	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/vhost/net.c net/bridge/br_device.c Fix merge conflict in drivers/vhost/net.c with guidance from Stephen Rothwell. Revert the effects of net-2.6 commit `573201f36f` since net-next-2.6 has fixes that make bridge netpoll work properly thus we don't need it disabled. Signed-off-by: David S. Miller <davem@davemloft.net>	2010-07-20 18:25:24 -07:00
Ilpo Järvinen	45e77d3145	tcp: fix crash in tcp_xmit_retransmit_queue It can happen that there are no packets in queue while calling tcp_xmit_retransmit_queue(). tcp_write_queue_head() then returns NULL and that gets deref'ed to get sacked into a local var. There is no work to do if no packets are outstanding so we just exit early. This oops was introduced by `08ebd1721a` (tcp: remove tp->lost_out guard to make joining diff nicer). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Reported-by: Lennart Schulte <lennart.schulte@nets.rwth-aachen.de> Tested-by: Lennart Schulte <lennart.schulte@nets.rwth-aachen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-07-19 12:43:49 -07:00
Ben Greear	e40dbc51fb	ipmr: Don't leak memory if fib lookup fails. This was detected using two mcast router tables. The pimreg for the second interface did not have a specific mrule, so packets received by it were handled by the default table, which had nothing configured. This caused the ipmr_fib_lookup to fail, causing the memory leak. Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-07-15 22:38:43 -07:00
Changli Gao	3a047bf87b	rfs: call sock_rps_record_flow() in tcp_splice_read() rfs: call sock_rps_record_flow() in tcp_splice_read() call sock_rps_record_flow() in tcp_splice_read(), so the applications using splice(2) or sendfile(2) can utilize RFS. Signed-off-by: Changli Gao <xiaosuo@gmail.com> ---- net/ipv4/tcp.c \| 1 + 1 file changed, 1 insertion(+) Signed-off-by: David S. Miller <davem@davemloft.net>	2010-07-14 14:45:15 -07:00
Changli Gao	7ba4291007	inet, inet6: make tcp_sendmsg() and tcp_sendpage() through inet_sendmsg() and inet_sendpage() a new boolean flag no_autobind is added to structure proto to avoid the autobind calls when the protocol is TCP. Then sock_rps_record_flow() is called int the TCP's sendmsg() and sendpage() pathes. Signed-off-by: Changli Gao <xiaosuo@gmail.com> ---- include/net/inet_common.h \| 4 ++++ include/net/sock.h \| 1 + include/net/tcp.h \| 8 ++++---- net/ipv4/af_inet.c \| 15 +++++++++------ net/ipv4/tcp.c \| 11 +++++------ net/ipv4/tcp_ipv4.c \| 3 +++ net/ipv6/af_inet6.c \| 8 ++++---- net/ipv6/tcp_ipv6.c \| 3 +++ 8 files changed, 33 insertions(+), 20 deletions(-) Signed-off-by: David S. Miller <davem@davemloft.net>	2010-07-12 20:21:46 -07:00
Eric Dumazet	4bc2f18ba4	net/ipv4: EXPORT_SYMBOL cleanups CodingStyle cleanups EXPORT_SYMBOL should immediately follow the symbol declaration. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-07-12 12:57:54 -07:00
Stephen Hemminger	dd4ba83dc1	gre: propagate ipv6 transport class This patch makes IPV6 over IPv4 GRE tunnel propagate the transport class field from the underlying IPV6 header to the IPV4 Type Of Service field. Without the patch, all IPV6 packets in tunnel look the same to QoS. This assumes that IPV6 transport class is exactly the same as IPv4 TOS. Not sure if that is always the case? Maybe need to mask off some bits. The mask and shift to get tclass is copied from ipv6/datagram.c Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-07-08 21:35:58 -07:00
David S. Miller	597e608a84	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	2010-07-07 15:59:38 -07:00
George Kadianakis	49085bd7d4	net/ipv4/ip_output.c: Removal of unused variable in ip_fragment() Removal of unused integer variable in ip_fragment(). Signed-off-by: George Kadianakis <desnacked@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-07-07 15:44:59 -07:00
Eric Dumazet	fe76cda308	ipv4: use skb_dst_copy() in ip_copy_metadata() Avoid touching dst refcount in ip_fragment(). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-07-05 18:50:56 -07:00
Eric Dumazet	b13b7125e4	netfilter: ipt_REJECT: avoid touching dst ref We can avoid a pair of atomic ops in ipt_REJECT send_reset() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-07-05 10:40:09 +02:00
Changli Gao	98b0e84aaa	netfilter: ipt_REJECT: postpone the checksum calculation. postpone the checksum calculation, then if the output NIC supports checksum offloading, we can utlize it. And though the output NIC doesn't support checksum offloading, but we'll mangle this packet, this can free us from updating the checksum, as the checksum calculation occurs later. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-07-05 10:39:17 +02:00
Peter Kosyh	44b451f163	xfrm: fix xfrm by MARK logic While using xfrm by MARK feature in 2.6.34 - 2.6.35 kernels, the mark is always cleared in flowi structure via memset in _decode_session4 (net/ipv4/xfrm4_policy.c), so the policy lookup fails. IPv6 code is affected by this bug too. Signed-off-by: Peter Kosyh <p.kosyh@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-07-04 11:46:07 -07:00
David S. Miller	e490c1defe	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6	2010-07-02 22:42:06 -07:00
Changli Gao	d6bebca92c	fragment: add fast path for in-order fragments add fast path for in-order fragments As the fragments are sent in order in most of OSes, such as Windows, Darwin and FreeBSD, it is likely the new fragments are at the end of the inet_frag_queue. In the fast path, we check if the skb at the end of the inet_frag_queue is the prev we expect. Signed-off-by: Changli Gao <xiaosuo@gmail.com> ---- include/net/inet_frag.h \| 1 + net/ipv4/ip_fragment.c \| 12 ++++++++++++ net/ipv6/reassembly.c \| 11 +++++++++++ 3 files changed, 24 insertions(+) Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-30 13:44:29 -07:00
Eric Dumazet	4ce3c183fc	snmp: 64bit ipstats_mib for all arches /proc/net/snmp and /proc/net/netstat expose SNMP counters. Width of these counters is either 32 or 64 bits, depending on the size of "unsigned long" in kernel. This means user program parsing these files must already be prepared to deal with 64bit values, regardless of user program being 32 or 64 bit. This patch introduces 64bit snmp values for IPSTAT mib, where some counters can wrap pretty fast if they are 32bit wide. # netstat -s\|egrep "InOctets\|OutOctets" InOctets: 244068329096 OutOctets: 244069348848 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-30 13:31:19 -07:00
Eric Dumazet	c4ead4c595	tcp: tso_fragment() might avoid GFP_ATOMIC We can pass a gfp argument to tso_fragment() and avoid GFP_ATOMIC allocations sometimes. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-28 23:24:31 -07:00
Eric Dumazet	7a9b2d5950	net: use this_cpu_ptr() use this_cpu_ptr(p) instead of per_cpu_ptr(p, smp_processor_id()) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-28 23:24:29 -07:00
Patrick McHardy	7eb9282cd0	netfilter: ipt_LOG/ip6t_LOG: add option to print decoded MAC header The LOG targets print the entire MAC header as one long string, which is not readable very well: IN=eth0 OUT= MAC=00:15:f2:24:91:f8:00:1b:24:dc:61:e6:08:00 ... Add an option to decode known header formats (currently just ARPHRD_ETHER devices) in their individual fields: IN=eth0 OUT= MACSRC=00:1b:24:dc:61:e6 MACDST=00:15:f2:24:91:f8 MACPROTO=0800 ... IN=eth0 OUT= MACSRC=00:1b:24:dc:61:e6 MACDST=00:15:f2:24:91:f8 MACPROTO=86dd ... The option needs to be explicitly enabled by userspace to avoid breaking existing parsers. Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-06-28 14:16:08 +02:00

1 2 3 4 5 ...

3788 Commits