linux

Author	SHA1	Message	Date
Denis V. Lunev	ce25999078	[IPV4]: sk parameter is unused in ipv4_dst_blackhole. Just remove it. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-22 17:42:37 -07:00
Pavel Emelyanov	fc8717baa8	[RAW]: Add raw_hashinfo member on struct proto. Sorry for the patch sequence confusion :| but I found that the similar thing can be done for raw sockets easily too late. Expand the proto.h union with the raw_hashinfo member and use it in raw_prot and rawv6_prot. This allows to drop the protocol specific versions of hash and unhash callbacks. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-22 16:56:51 -07:00
Pavel Emelyanov	6ba5a3c52d	[UDP]: Make full use of proto.h.udp_hash innovation. After this we have only udp_lib_get_port to get the port and two stubs for ipv4 and ipv6. No difference in udp and udplite except for initialized h.udp_hash member. I tried to find a graceful way to drop the only difference between udp_v4_get_port and udp_v6_get_port (i.e. the rcv_saddr comparison routine), but adding one more callback on the struct proto didn't appear such :( Maybe later. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-22 16:51:21 -07:00
Pavel Emelyanov	39d8cda76c	[SOCK]: Add udp_hash member to struct proto. Inspired by the commit `ab1e0a13` ([SOCK] proto: Add hashinfo member to struct proto) from Arnaldo, I made similar thing for UDP/-Lite IPv4 and -v6 protocols. The result is not that exciting, but it removes some levels of indirection in udpxxx_get_port and saves some space in code and text. The first step is to union existing hashinfo and new udp_hash on the struct proto and give a name to this union, since future initialization of tcpxxx_prot, dccp_vx_protinfo and udpxxx_protinfo will cause gcc warning about inability to initialize anonymous member this way. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-22 16:50:58 -07:00
Denis V. Lunev	22aba383ce	[IPV4]: Always pass ip_options pointer into ip_options_compile. This makes code a bit more uniform and straightforward. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-22 16:36:20 -07:00
Denis V. Lunev	ef722495c8	[IPV4]: Remove unused ip_options->is_data. ip_options->is_data is assigned only and never checked. The structure is not a part of kernel interface to the userspace. So, it is safe to remove this field. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-22 16:35:29 -07:00
Denis V. Lunev	10fe7d85e2	[IPV4]: Remove unnecessary check for opt->is_data in ip_options_compile. There is the only way to reach ip_options compile with opt != NULL: ip_options_get_finish opt->is_data = 1; ip_options_compile(opt, NULL) So, checking for is_data inside opt != NULL branch is not needed. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-22 16:35:00 -07:00
Herbert Xu	69d1506731	[TCP]: Let skbs grow over a page on fast peers While testing the virtio-net driver on KVM with TSO I noticed that TSO performance with a 1500 MTU is significantly worse compared to the performance of non-TSO with a 16436 MTU. The packet dump shows that most of the packets sent are smaller than a page. Looking at the code this actually is quite obvious as it always stop extending the packet if it's the first packet yet to be sent and if it's larger than the MSS. Since each extension is bound by the page size, this means that (given a 1500 MTU) we're very unlikely to construct packets greater than a page, provided that the receiver and the path is fast enough so that packets can always be sent immediately. The fix is also quite obvious. The push calls inside the loop is just an optimisation so that we don't end up doing all the sending at the end of the loop. Therefore there is no specific reason why it has to do so at MSS boundaries. For TSO, the most natural extension of this optimisation is to do the pushing once the skb exceeds the TSO size goal. This is what the patch does and testing with KVM shows that the TSO performance with a 1500 MTU easily surpasses that of a 16436 MTU and indeed the packet sizes sent are generally larger than 16436. I don't see any obvious downsides for slower peers or connections, but it would be prudent to test this extensively to ensure that those cases don't regress. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-22 15:47:05 -07:00
Patrick McManus	ec3c0982a2	[TCP]: TCP_DEFER_ACCEPT updates - process as established Change TCP_DEFER_ACCEPT implementation so that it transitions a connection to ESTABLISHED after handshake is complete instead of leaving it in SYN-RECV until some data arrives. Place connection in accept queue when first data packet arrives from slow path. Benefits: - established connection is now reset if it never makes it to the accept queue - diagnostic state of established matches with the packet traces showing completed handshake - TCP_DEFER_ACCEPT timeouts are expressed in seconds and can now be enforced with reasonable accuracy instead of rounding up to next exponential back-off of syn-ack retry. Signed-off-by: Patrick McManus <mcmanus@ducksong.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-21 16:33:01 -07:00
Patrick McManus	e4c7884028	[TCP]: TCP_DEFER_ACCEPT updates - dont retxmt synack a socket in LISTEN that had completed its 3 way handshake, but not notified userspace because of SO_DEFER_ACCEPT, would retransmit the already acked syn-ack during the time it was waiting for the first data byte from the peer. Signed-off-by: Patrick McManus <mcmanus@ducksong.com> Acked-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-21 16:29:22 -07:00
Patrick McManus	539fae89be	[TCP]: TCP_DEFER_ACCEPT updates - defer timeout conflicts with max_thresh timeout associated with SO_DEFER_ACCEPT wasn't being honored if it was less than the timeout allowed by the maximum syn-recv queue size algorithm. Fix by using the SO_DEFER_ACCEPT value if the ack has arrived. Signed-off-by: Patrick McManus <mcmanus@ducksong.com> Acked-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-21 16:27:38 -07:00
Pavel Emelyanov	28518fc170	[NET]: NULL pointer dereference and other nasty things in /proc/net/(tcp|udp)[6] Commits f40c81 ([NETNS][IPV4] tcp - make proc handle the network namespaces) and a91275 ([NETNS][IPV6] udp - make proc handle the network namespace) both introduced bad checks on sockets and tw buckets to belong to proper net namespace. I.e. when checking for socket to belong to given net and family the do { sk = sk_next(sk); } while (sk && sk->sk_net != net && sk->sk_family != family); constructions were used. This is wrong, since as soon as the sk->sk_net fits the net the socket is immediately returned, even if it belongs to other family. As the result four /proc/net/(udp|tcp)[6] entries show wrong info. The udp6 entry even oopses when dereferencing inet6_sk(sk) pointer: static void udp6_sock_seq_show(struct seq_file *seq, struct sock *sp, int bucket) { ... struct ipv6_pinfo *np = inet6_sk(sp); ... dest = &np->daddr; /* will be NULL for AF_INET sockets */ ... seq_printf(... dest->s6_addr32[0], dest->s6_addr32[1], dest->s6_addr32[2], dest->s6_addr32[3], ... Fix it by converting && to ||. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-21 15:52:00 -07:00
Phil Oester	12b101555f	[IPV4]: Fix null dereference in ip_defrag Been seeing occasional panics in my testing of 2.6.25-rc in ip_defrag. Offending line in ip_defrag is here: net = skb->dev->nd_net where dev is NULL. Bisected the problem down to commit `ac18e7509e` ([NETNS][FRAGS]: Make the inet_frag_queue lookup work in namespaces). Below patch (idea from Patrick McHardy) fixes the problem for me. Signed-off-by: Phil Oester <kernel@linuxace.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-21 15:01:50 -07:00
Daniel Lezcano	6f8b13bcb3	[NETNS][IPV6] tcp6 - make proc per namespace Make the proc for tcp6 to be per namespace. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-21 04:14:45 -07:00
Daniel Lezcano	0c96d8c50b	[NETNS][IPV6] udp6 - make proc per namespace The proc init/exit functions take a new network namespace parameter in order to register/unregister /proc/net/udp6 for a namespace. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-21 04:14:17 -07:00
Daniel Lezcano	f40c8174d3	[NETNS][IPV4] tcp - make proc handle the network namespaces This patch, like udp proc, makes the proc functions to take care of which namespace the socket belongs. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-21 04:13:54 -07:00
Daniel Lezcano	8d9f1744ca	[NETNS][IPV6] tcp - assign the netns for timewait sockets Copy the network namespace from the socket to the timewait socket. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-21 04:12:54 -07:00
Daniel Lezcano	a91275eff4	[NETNS][IPV6] udp - make proc handle the network namespace This patch makes the common udp proc functions to take care of which socket they should show taking into account the namespace it belongs. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-21 04:11:58 -07:00
Peter P Waskiewicz Jr	82cc1a7a56	[NET]: Add per-connection option to set max TSO frame size Update: My mailer ate one of Jarek's feedback mails... Fixed the parameter in netif_set_gso_max_size() to be u32, not u16. Fixed the whitespace issue due to a patch import botch. Changed the types from u32 to unsigned int to be more consistent with other variables in the area. Also brought the patch up to the latest net-2.6.26 tree. Update: Made gso_max_size container 32 bits, not 16. Moved the location of gso_max_size within netdev to be less hotpath. Made more consistent names between the sock and netdev layers, and added a define for the max GSO size. Update: Respun for net-2.6.26 tree. Update: changed max_gso_frame_size and sk_gso_max_size from signed to unsigned - thanks Stephen! This patch adds the ability for device drivers to control the size of the TSO frames being sent to them, per TCP connection. By setting the netdevice's gso_max_size value, the socket layer will set the GSO frame size based on that value. This will propagate into the TCP layer, and send TSO's of that size to the hardware. This can be desirable to help tune the bursty nature of TSO on a per-adapter basis, where one may have 1 GbE and 10 GbE devices coexisting in a system, one running multiqueue and the other not, etc. This can also be desirable for devices that cannot support full 64 KB TSO's, but still want to benefit from some level of segmentation offloading. Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-21 03:43:19 -07:00
David S. Miller	a25606c845	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6	2008-03-21 03:42:24 -07:00
Patrick McHardy	607bfbf2d5	[TCP]: Fix shrinking windows with window scaling When selecting a new window, tcp_select_window() tries not to shrink the offered window by using the maximum of the remaining offered window size and the newly calculated window size. The newly calculated window size is always a multiple of the window scaling factor, the remaining window size however might not be since it depends on rcv_wup/rcv_nxt. This means we're effectively shrinking the window when scaling it down. The dump below shows the problem (scaling factor 2^7): - Window size of 557 (71296) is advertised, up to 3111907257: IP 172.2.2.3.33000 > 172.2.2.2.33000: . ack 3111835961 win 557 <...> - New window size of 514 (65792) is advertised, up to 3111907217, 40 bytes below the last end: IP 172.2.2.3.33000 > 172.2.2.2.33000: . 3113575668:3113577116(1448) ack 3111841425 win 514 <...> The number 40 results from downscaling the remaining window: 3111907257 - 3111841425 = 65832 65832 / 2^7 = 514 65832 % 2^7 = 40 If the sender uses up the entire window before it is shrunk, this can have chaotic effects on the connection. When sending ACKs, tcp_acceptable_seq() will notice that the window has been shrunk since tcp_wnd_end() is before tp->snd_nxt, which makes it choose tcp_wnd_end() as sequence number. This will fail the receiver's checks in tcp_sequence() however since it is before its tp->rcv_wup, making it respond with a dupack. If both sides are in this condition, this leads to a constant flood of ACKs until the connection times out. Make sure the window is never shrunk by aligning the remaining window to the window scaling factor. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-20 16:11:27 -07:00
Daniel Hokka Zakrisson	d0ebf13359	[NETFILTER]: ipt_recent: sanity check hit count If a rule using ipt_recent is created with a hit count greater than ip_pkt_list_tot, the rule will never match as it cannot keep track of enough timestamps. This patch makes ipt_recent refuse to create such rules. With ip_pkt_list_tot's default value of 20, the following can be used to reproduce the problem. nc -u -l 0.0.0.0 1234 & for i in `seq 1 100`; do echo $i | nc -w 1 -u 127.0.0.1 1234; done This limits it to 20 packets: iptables -A OUTPUT -p udp --dport 1234 -m recent --set --name test \ --rsource iptables -A OUTPUT -p udp --dport 1234 -m recent --update --seconds \ 60 --hitcount 20 --name test --rsource -j DROP While this is unlimited: iptables -A OUTPUT -p udp --dport 1234 -m recent --set --name test \ --rsource iptables -A OUTPUT -p udp --dport 1234 -m recent --update --seconds \ 60 --hitcount 21 --name test --rsource -j DROP With the patch the second rule-set will throw an EINVAL. Reported-by: Sean Kennedy <skennedy@vcn.com> Signed-off-by: Daniel Hokka Zakrisson <daniel@hozac.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-20 15:07:10 -07:00
Robert P. J. Day	938b93adb2	[NET]: Add debugging names to __RW_LOCK_UNLOCKED macros. Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-18 00:59:23 -07:00
David S. Miller	577f99c1d0	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/rt2x00/rt2x00dev.c net/8021q/vlan_dev.c	2008-03-18 00:37:55 -07:00
Al Viro	5e226e4d90	[IPV4]: esp_output() misannotations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-17 22:50:23 -07:00
Al Viro	e6f1cebf71	[NET] endianness noise: INADDR_ANY Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-17 22:44:53 -07:00
Ilpo Järvinen	5ea3a74806	[TCP]: Prevent sending past receiver window with TSO (at last skb) With TSO it was possible to send past the receiver window when the skb to be sent was the last in the write queue while the receiver window is the limiting factor. One can notice that there's a loophole in the tcp_mss_split_point that lacked a receiver window check for the tcp_write_queue_tail() if also cwnd was smaller than the full skb. Noticed by Thomas Gleixner <tglx@linutronix.de> in form of "Treason uncloaked! Peer ... shrinks window .... Repaired." messages (the peer didn't actually shrink its window as the message suggests, we had just sent something past it without a permission to do so). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Tested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-11 17:55:27 -07:00
David S. Miller	db8dac20d5	[UDP]: Revert udplite and code split. This reverts commit `db1ed684f6` ("[IPV6] UDP: Rename IPv6 UDP files."), commit `8be8af8fa4` ("[IPV4] UDP: Move IPv4-specific bits to other file.") and commit `e898d4db27` ("[UDP]: Allow users to configure UDP-Lite."). First, udplite is of such small cost, and it is a core protocol just like TCP and normal UDP are. We spent enormous amounts of effort to make udplite share as much code with core UDP as possible. All of that work is less valuable if we're just going to slap a config option on udplite support. It is also causing build failures, as reported on linux-next, showing that the changeset was not tested very well. In fact, this is the second build failure resulting from the udplite change. Finally, the config options provided was a bool, instead of a modular option. Meaning the udplite code does not even get build tested by allmodconfig builds, and furthermore the user is not presented with a reasonable modular build option which is particularly needed by distribution vendors. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-06 16:22:02 -08:00
Harvey Harrison	0dc47877a3	net: replace remaining __FUNCTION__ occurrences __FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-05 20:47:47 -08:00
Eric Dumazet	ee6b967301	[IPV4]: Add 'rtable' field in struct sk_buff to alias 'dst' and avoid casts (Anonymous) unions can help us to avoid ugly casts. A common cast is the (struct rtable *)skb->dst one. Defining a union like: union { struct dst_entry *dst; struct rtable *rtable; }; permits using skb->rtable in its place. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-05 18:30:47 -08:00
David S. Miller	255333c1db	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/mac80211/rc80211_pid_algo.c	2008-03-05 12:26:41 -08:00
Stephen Hemminger	dea75bdfa5	[IPCONFIG]: The kernel gets no IP from some DHCP servers From: Stephen Hemminger <shemminger@linux-foundation.org> Based upon a patch by Marcel Wappler: This patch fixes a DHCP issue of the kernel: some DHCP servers (i.e. in the Linksys WRT54Gv5) are very strict about the contents of the DHCPDISCOVER packet they receive from clients. Table 5 in RFC2131 page 36 requests the fields 'ciaddr' and 'siaddr' MUST be set to '0'. These DHCP servers ignore Linux kernel's DHCP discovery packets with these two fields set to '255.255.255.255' (in contrast to popular DHCP clients, such as 'dhclient' or 'udhcpc'). This leads to a not booting system. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-04 17:03:49 -08:00
Herbert Xu	ed58dd41f3	[ESP]: Add select on AUTHENC Now the ESP uses the AEAD interface even for algorithms which are not combined mode, we need to select CONFIG_CRYPTO_AUTHENC as otherwise only combined mode algorithms will work. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-04 14:29:21 -08:00
Sangtae Ha	6b3d626321	[TCP]: TCP cubic v2.2 We have updated CUBIC to fix some issues with slow increase in large BDP networks. We also improved its convergence speed. The fix is in fact very simple -- the window increase limit of smax during the window probing phase (i.e., convex growth phase) is removed. We found that this does not affect TCP friendliness, but only improves its scalability. We have run some tests in our lab and also over the Internet path from NCSU to Japan. These results can be seen from the following page: http://netsrv.csc.ncsu.edu/wiki/index.php/Intra_protocol_fairness_testing_with_linux-2.6.23.9 http://netsrv.csc.ncsu.edu/wiki/index.php/RTT_fairness_testing_with_linux-2.6.23.9 http://netsrv.csc.ncsu.edu/wiki/index.php/TCP_friendliness_testing_with_linux-2.6.23.9 Signed-off-by: Sangtae Ha <sha2@ncsu.edu> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-04 14:17:41 -08:00
YOSHIFUJI Hideaki	8be8af8fa4	[IPV4] UDP: Move IPv4-specific bits to other file. Move IPv4-specific UDP bits from net/ipv4/udp.c into (new) net/ipv4/udp_ipv4.c. Rename net/ipv4/udplite.c to net/ipv4/udplite_ipv4.c. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-03-04 15:18:22 +09:00
YOSHIFUJI Hideaki	e898d4db27	[UDP]: Allow users to configure UDP-Lite. Let's give users an option for disabling UDP-Lite (~4K). old: | text data bss dec hex filename | 286498 12432 6072 305002 4a76a net/ipv4/built-in.o | 193830 8192 3204 205226 321aa net/ipv6/ipv6.o new (without UDP-Lite): | text data bss dec hex filename | 284086 12136 5432 301654 49a56 net/ipv4/built-in.o | 191835 7832 3076 202743 317f7 net/ipv6/ipv6.o Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-03-04 15:18:22 +09:00
Glenn Griffin	c6aefafb7e	[TCP]: Add IPv6 support to TCP SYN cookies Updated to incorporate Eric's suggestion of using a per cpu buffer rather than allocating on the stack. Just a two line change, but will resend in its entirety. Signed-off-by: Glenn Griffin <ggriffin.kernel@gmail.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-03-04 15:18:21 +09:00
Eric Dumazet	11baab7ac3	[TCP]: lower stack usage in cookie_hash() function 400 bytes allocated on stack might be a little bit too much. Using a per_cpu var is more friendly. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-03-04 15:18:21 +09:00
Pavel Emelyanov	988b705077	[ARP]: Introduce the arp_hdr_len helper. There are some places that calculate the ARP header length. These calculations are correct, but a) some operate with "magic" constants, b) enlarge the code length (sometimes at the cost of coding style), c) are not informative at first glance. The proposal is to introduce a helper that includes all the good sides of these calculations. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-03 12:20:57 -08:00
Ilpo Järvinen	d152a7d88a	[TCP]: Must count fack_count also when skipping It makes fackets_out to grow too slowly compared with the real write queue. This shouldn't cause those BUG_TRAP(packets <= tp->packets_out) to trigger but who knows how such inconsistent fackets_out affects here and there around TCP when everything is nowadays assuming accurate fackets_out. So lets see if this silences them all. Reported by Guillaume Chazarain <guichaz@gmail.com>. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-03 12:10:16 -08:00
Denis V. Lunev	7cd04fa7e3	[TCP]: Merge exit paths in tcp_v4_conn_request. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-03 11:59:32 -08:00
Denis V. Lunev	da7ef338a2	[IPV4]: skb->dst can't be NULL in ip_options_echo. ip_options_echo is called on the packet input path after the initial routing. The dst entry on the packet is cleared only in several very specific places and immediately assigned back (maybe to a new one). Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-03 11:50:10 -08:00
Denis V. Lunev	1d1c8d13c4	[ICMP]: Section conflict between icmp_sk_init/icmp_sk_exit. Functions from __exit section should not be called from ones in __init section. Fix this conflict. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-29 14:15:19 -08:00
Denis V. Lunev	fd80eb942a	[INET]: Remove struct dst_entry *dst from request_sock_ops.rtx_syn_ack. It looks like dst parameter is used in this API due to historical reasons. Actually, it is really used in the direct call to tcp_v4_send_synack only. So, create a wrapper for tcp_v4_send_synack and remove dst from rtx_syn_ack. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-29 11:43:03 -08:00
Pavel Emelyanov	665bba1087	[NETFILTER/RXRPC]: Don't use seq_release_private where inappropriate. Some netfilter code and rxrpc one use seq_open() to open a proc file, but seq_release_private to release one. This is harmless, but ambiguous. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-29 11:39:17 -08:00
Denis V. Lunev	4a6ad7a141	[NETNS]: Make icmp_sk per namespace. All preparations are done. Now just add a hook to perform an initialization on namespace startup and replace icmp_sk macro with proper inline call. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-29 11:19:58 -08:00
Denis V. Lunev	5c8cafd65e	[NETNS]: icmp(v6)_sk should not pin a namespace. So, change icmp(v6)_sk creation/disposal to the scheme used in the netlink for rtnl, i.e. create a socket in the context of the init_net and assign the namespace without getting a reference later. Also use sk_release_kernel instead of sock_release to properly destroy such sockets. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-29 11:19:22 -08:00
Denis V. Lunev	79c9115953	[ICMP]: Allocate data for __icmp(v6)_sk dynamically. Own __icmp(v6)_sk should be present in each namespace. So, it should be allocated dynamically. Though, alloc_percpu does not fit the case as it implies additional dereference for no bonus. Allocate data for pointers just like __percpu_alloc_mask does and place pointers to struct sock into this array. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-29 11:17:11 -08:00
Denis V. Lunev	405666db84	[ICMP]: Pass proper ICMP socket into icmp(v6)_xmit_(un)lock. We have to get socket lock inside icmp(v6)_xmit_lock/unlock. The socket is get from global variable now. When this code became namespaces, one should pass a namespace and get socket from it. Though, above is useless. Socket is available in the caller, just pass it inside. This saves a bit of code now and saves more later. add/remove: 0/0 grow/shrink: 1/3 up/down: 1/-169 (-168) function old new delta icmp_rcv 718 719 +1 icmpv6_rcv 2343 2303 -40 icmp_send 1566 1518 -48 icmp_reply 549 468 -81 Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-29 11:16:46 -08:00
Denis V. Lunev	b7e729c4b4	[ICMP]: Store sock rather than socket for ICMP flow control. Basically, there is no difference, what to store: socket or sock. Though, sock looks better as there will be 1 less dereference on the fast path. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-29 11:16:08 -08:00
Denis V. Lunev	1e3cf6834e	[ICMP]: Optimize icmp_socket usage. Use this macro only once in a function to save a bit of space. add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-98 (-98) function old new delta icmp_reply 562 561 -1 icmp_push_reply 305 258 -47 icmp_init 273 223 -50 Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-29 11:15:42 -08:00
Denis V. Lunev	a5710d6582	[ICMP]: Add return code to icmp_init. icmp_init could fail and this is normal for namespace other than initial. So, the panic should be triggered only on init_net initialization path. Additionally create rollback path for icmp_init as a separate function. It will also be used later during namespace destruction. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-29 11:14:50 -08:00
Denis V. Lunev	9b0f976f27	[INET]: Remove struct net_proto_family* from _init calls. struct net_proto_family* is not used in icmp[v6]_init, ndisc_init, igmp_init and tcp_v4_init. Remove it. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-29 11:13:15 -08:00
Sangtae Ha	0bc8c7bf9e	[TCP]: BIC web page link is corrected. Signed-off-by: Sangtae Ha <sha2@ncsu.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-28 22:14:32 -08:00
Denis V. Lunev	c4544c7243	[NETNS]: Process inet_select_addr inside a namespace. The context is available from a network device passed in. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-28 20:52:54 -08:00
Denis V. Lunev	3776c8891a	[NETNS]: Enable IPv4 address manipulations inside namespace. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-28 20:52:25 -08:00
Denis V. Lunev	1937504dd1	[NETNS]: Enable all routing manipulation via netlink inside namespace. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-28 20:52:04 -08:00
Denis V. Lunev	e5b13cb10d	[NETNS]: Process devinet ioctl in the correct namespace. Add namespace parameter to devinet_ioctl and locate device inside it for state changes. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-28 20:51:43 -08:00
Denis V. Lunev	73b3871165	[NETNS]: Register /proc/net/rt_cache for each namespace. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-28 20:51:18 -08:00
Denis V. Lunev	a75e936f2f	[NETNS]: Process /proc/net/rt_cache inside a namespace. Show routing cache for a particular namespace only. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-28 20:50:55 -08:00
Denis V. Lunev	642d631811	[IPV4]: rt_cache_get_next should take rt_genid into account. In the other case /proc/net/rt_cache will look inconsistent in respect to genid. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-28 20:50:33 -08:00
Denis V. Lunev	317805b8f8	[NETNS]: Process ip_rt_redirect in the correct namespace. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-28 20:50:06 -08:00
Denis V. Lunev	be162d6288	[NETNS]: Enable inetdev_event notifier. After all these preparations it is time to enable main IPv4 device initialization routine inside namespace. It is safe do this now. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-28 20:49:13 -08:00
Denis V. Lunev	2430aa85de	[NETNS]: Disable multicasting configuration inside non-initial namespace. Do not call hooks from device notifiers and disallow configuration from ioctl/netlink layer. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-28 20:48:49 -08:00
Denis V. Lunev	6fc68624e5	[NETFILTER]: Consolidate masq_inet_event and masq_device_event. They do exactly the same job. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-28 20:45:41 -08:00
Wang Chen	770207208e	[IPV4]: Use proc_create() to setup ->proc_fops first. Use proc_create() to make sure that ->proc_fops is set up before gluing PDE to main tree. Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-28 14:14:25 -08:00
Herbert Xu	21e43188f2	[IPCOMP]: Disable BH on output when using shared tfm Because we use shared tfm objects in order to conserve memory, (each tfm requires 128K of vmalloc memory), BH needs to be turned off on output as that can occur in process context. Previously this was done implicitly by the xfrm output code. That was lost when it became lockless. So we need to add the BH disabling to IPComp directly. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-28 11:23:17 -08:00
Pavel Emelyanov	b37d428b24	[INET]: Don't create tunnels with '%' in name. Four tunnel drivers (ip_gre, ipip, ip6_tunnel and sit) can receive a pre-defined name for a device from the userspace. Since these drivers call the register_netdevice() (rtnl_lock, is held), which does _not_ generate the device's name, this name may contain a '%' character. Not sure how bad is this to have a device with a '%' in its name, but all the other places either use the register_netdev(), which call the dev_alloc_name(), or explicitly call the dev_alloc_name() before registering, i.e. do not allow for such names. This had to be prior to the commit 34cc7b, but I forgot to number the patches and this one got lost, sorry. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-26 23:51:04 -08:00
Bjorn Mork	148f97292e	[IPV4]: Reset scope when changing address. This bug did bite at least one user, who did have to resort to rebooting the system after an "ifconfig eth0 127.0.0.1" typo. Deleting the address and adding a new one is a less intrusive workaround. But I still believe this is a bug that should be fixed. Some way or another. Another possibility would be to remove the scope mangling based on address. This will always be incomplete (are 127/8 the only address space with host scope requirements?) We set the scope to RT_SCOPE_HOST if an IPv4 interface is configured with a loopback address (127/8). The scope is never reset, and will remain set to RT_SCOPE_HOST after changing the address. This patch resets the scope if the address is changed again, to restore normal functionality. Signed-off-by: Bjorn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-26 18:42:41 -08:00
Pavel Emelyanov	34cc7ba639	[IP_TUNNEL]: Don't limit the number of tunnels with generic name explicitly. Use the added dev_alloc_name() call to create tunnel device name, rather than iterate in a hand-made loop with an artificial limit. Thanks Patrick for noticing this. [ The way this works is, when the device is actually registered, the generic code noticed the '%' in the name and invokes dev_alloc_name() to fully resolve the name. -DaveM ] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-23 20:19:20 -08:00
Joonwoo Park	eb1197bc0e	[NETFILTER]: Fix incorrect use of skb_make_writable http://bugzilla.kernel.org/show_bug.cgi?id=9920 The function skb_make_writable returns true or false. Signed-off-by: Joonwoo Park <joonwpark81@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-19 17:18:47 -08:00
Patrick McHardy	e2b58a67b9	[NETFILTER]: {ip,ip6,nfnetlink}_queue: fix SKB_LINEAR_ASSERT when mangling packet data As reported by Tomas Simonaitis <tomas.simonaitis@gmail.com>, inserting new data in skbs queued over {ip,ip6,nfnetlink}_queue triggers a SKB_LINEAR_ASSERT in skb_put(). Going back through the git history, it seems this bug is present since at least 2.6.12-rc2, probably even since the removal of skb_linearize() for netfilter. Linearize non-linear skbs through skb_copy_expand() when enlarging them. Tested by Thomas, fixes bugzilla #9933. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-19 17:17:52 -08:00
Adrian Bunk	94cb1503c7	ipv4/fib_hash.c: fix NULL dereference. Unless I miss a guaranteed relation between "f" and "new_fa->fa_info" this patch is required for fixing a NULL dereference introduced by commit `a6501e080c` ("[IPV4] FIB_HASH: Reduce memory needs and speedup lookups") and spotted by the Coverity checker. Eric Dumazet says: Hum, you are right, kmem_cache_free() doesnt allow a NULL object, like kfree() does. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-19 16:28:54 -08:00
Kris Katterjohn	9bf1d83e7e	[TCP]: Fix tcp_v4_send_synack() comment Signed-off-by: Kris Katterjohn <katterjohn@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-17 22:29:19 -08:00
Uwe Kleine-Koenig	9c00409a2a	[IPV4]: fix alignment of IP-Config output Make the indented lines aligned in the output (not in the code). Signed-off-by: Uwe Kleine-Koenig <Uwe.Kleine-Koenig@digi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-17 22:28:32 -08:00
David S. Miller	9ff5660746	Revert "[NDISC]: Fix race in generic address resolution". This reverts commit `69cc64d8d9`. It causes recursive locking in IPV6 because unlike other neighbour layer clients, it even needs neighbour cache entries to send neighbour solicitation messages :-( We'll have to find another way to fix this race. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-17 18:39:54 -08:00
Adrian Bunk	324b57619b	[INET]: Unexport inet_listen_wlock This patch removes the no longer used EXPORT_SYMBOL(inet_listen_wlock). Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-13 17:40:25 -08:00
Adrian Bunk	74da4d34e4	[INET]: Unexport __inet_hash_connect This patch removes the unused EXPORT_SYMBOL_GPL(__inet_hash_connect). Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-13 17:39:34 -08:00
Herbert Xu	b318e0e4ef	[IPSEC]: Fix bogus usage of u64 on input sequence number Al Viro spotted a bogus use of u64 on the input sequence number which is big-endian. This patch fixes it by giving the input sequence number its own member in the xfrm_skb_cb structure. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-12 22:50:35 -08:00
David S. Miller	69cc64d8d9	[NDISC]: Fix race in generic address resolution. Frank Blaschka provided the bug report and the initial suggested fix for this bug. He also validated this version of this fix. The problem is that the access to neigh->arp_queue is inconsistent, we grab references when dropping the lock to call neigh->ops->solicit() but this does not prevent other threads of control from trying to send out that packet at the same time causing corruptions because both code paths believe they have exclusive access to the skb. The best option seems to be to hold the write lock on neigh->lock during the ->solicit() call. I looked at all of the ndisc_ops implementations and this seems workable. The only case that needs special care is the IPV4 ARP implementation of arp_solicit(). It wants to take neigh->lock as a reader to protect the header entry in neigh->ha during the emission of the solicitation. We can simply remove the read lock calls to take care of that since holding the lock as a writer at the caller provides a superset of the protection afforded by the existing read locking. The rest of the ->solicit() implementations don't care whether the neigh is locked or not. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-12 17:54:17 -08:00
Stephen Hemminger	8315f5d80a	fib_trie: /proc/net/route performance improvement. Use key/offset caching to change /proc/net/route (used by iputils route) from O(n^2) to O(n). This improves performance from 30sec with 160,000 routes to 1sec. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-12 17:53:31 -08:00
Stephen Hemminger	ec28cf738d	fib_trie: handle empty tree This fixes possible problems when trie_firstleaf() returns NULL to trie_leafindex(). Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-12 17:53:30 -08:00
David S. Miller	e4f8b5d4ed	[IPV4]: Remove IP_TOS setting privilege checks. Various RFCs have all sorts of things to say about the CS field of the DSCP value. In particular they try to make the distinction between values that should be used by "user applications" and things like routing daemons. This seems to have influenced the CAP_NET_ADMIN check which exists for IP_TOS socket option settings, but in fact it has an off-by-one error so it wasn't allowing CS5 which is meant for "user applications" as well. Further adding to the inconsistency and brokenness here, IPV6 does not validate the DSCP values specified for the IPV6_TCLASS socket option. The real actual uses of these TOS values are system specific in the final analysis, and these RFC recommendations are just that, "a recommendation". In fact the standards very purposefully use "SHOULD" and "SHOULD NOT" when describing how these values can be used. In the final analysis the only clean way to provide consistency here is to remove the CAP_NET_ADMIN check. The alternatives just don't work out: 1) If we add the CAP_NET_ADMIN check to ipv6, this can break existing setups. 2) If we just fix the off-by-one error in the class comparison in IPV4, certain DSCP values can be used in IPV6 but not IPV4 by default. So people will just ask for a sysctl asking to override that. I checked several other freely available kernel trees and they do not make any privilege checks in this area like we do. For the BSD stacks, this goes back all the way to Stevens Volume 2 and beyond. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-12 17:53:29 -08:00
Denis V. Lunev	cd557bc1c1	[IGMP]: Optimize kfree_skb in igmp_rcv. Merge error paths inside igmp_rcv. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-09 23:22:26 -08:00
Patrick McHardy	4136cd523e	[IPV4]: route: fix crash ip_route_input ip_route_me_harder() may call ip_route_input() with skbs that don't have skb->dev set for skbs rerouted in LOCAL_OUT and TCP resets generated by the REJECT target, resulting in a crash when dereferencing skb->dev->nd_net. Since ip_route_input() has an input device argument, it seems correct to use that one anyway. Bug introduced in `b5921910a1` (Routing cache virtualization). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-07 17:58:20 -08:00
Patrick McHardy	86577c661b	[NETFILTER]: nf_conntrack: fix ct_extend ->move operation The ->move operation has two bugs: - It is called with the same extension as source and destination, so it doesn't update the new extension. - The address of the old extension is calculated incorrectly, instead of (void *)ct->ext + ct->ext->offset[i] it uses ct->ext + ct->ext->offset[i]. Fixes a crash on x86_64 reported by Chuck Ebbert <cebbert@redhat.com> and Thomas Woerner <twoerner@redhat.com>. Tested-by: Thomas Woerner <twoerner@redhat.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-07 17:56:34 -08:00
Sven Wegener	9c1ca6e68a	ipvs: Make wrr "no available servers" error message rate-limited No available servers is more an error message than something informational. It should also be rate-limited, else we're going to flood our logs on a busy director, if all real servers are out of order with a weight of zero. Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-05 20:00:10 -08:00
Linus Torvalds	3d412f60b7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (21 commits) [PKT_SCHED]: vlan tag match [NET]: Add if_addrlabel.h to sanitized headers. [NET] rtnetlink.c: remove no longer used functions [ICMP]: Restore pskb_pull calls in receive function [INET]: Fix accidentally broken inet(6)_hash_connect's port offset calculations. [NET]: Remove further references to net-modules.txt bluetooth rfcomm tty: destroy before tty_close() bluetooth: blacklist another Broadcom BCM2035 device drivers/bluetooth/btsdio.c: fix double-free drivers/bluetooth/bpa10x.c: fix memleak bluetooth: uninlining bluetooth: hidp_process_hid_control remove unnecessary parameter dealing tun: impossible to deassert IFF_ONE_QUEUE or IFF_NO_PI hamradio: fix dmascc section mismatch [SCTP]: Fix kernel panic while received AUTH chunk with BAD shared key identifier [SCTP]: Fix kernel panic while received AUTH chunk while enabled auth [IPV4]: Formatting fix for /proc/net/fib_trie. [IPV6]: Fix sysctl compilation error. [NET_SCHED]: Add #ifdef CONFIG_NET_EMATCH in net/sched/cls_flow.c (latest git broken build) [IPV4]: Fix compile error building without CONFIG_FS_PROC ...	2008-02-05 10:09:07 -08:00
Paul Moore	eda61d32e8	NetLabel: introduce a new kernel configuration API for NetLabel Add a new set of configuration functions to the NetLabel/LSM API so that LSMs can perform their own configuration of the NetLabel subsystem without relying on assistance from userspace. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Reviewed-by: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-05 09:44:20 -08:00
Herbert Xu	8cf229437f	[ICMP]: Restore pskb_pull calls in receive function Somewhere along the development of my ICMP relookup patch the header length check went AWOL on the non-IPsec path. This patch restores the check. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-05 03:15:50 -08:00
Pavel Emelyanov	5d8c0aa943	[INET]: Fix accidentally broken inet(6)_hash_connect's port offset calculations. The port offset calculations depend on the protocol family, but, as Adrian noticed, I broke this logic with the commit `5ee31fc1ec` [INET]: Consolidate inet(6)_hash_connect. Return this logic back, by passing the port offset directly into the consolidated function. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Noticed-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-05 03:14:44 -08:00
Denis V. Lunev	b9c4d82a85	[IPV4]: Formatting fix for /proc/net/fib_trie. The line in the /proc/net/fib_trie for route with TOS specified - has extra \n at the end - does not have a space after route scope like below. \|-- 1.1.1.1 /32 universe UNICASTtos =1 Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-05 02:58:45 -08:00
Adrian Bunk	322c8a3c36	[IPSEC] xfrm4_beet_input(): fix an if() A bug every C programmer makes at some point in time... Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-05 02:51:39 -08:00
Arnaldo Carvalho de Melo	ab1e0a13d7	[SOCK] proto: Add hashinfo member to struct proto This way we can remove TCP and DCCP specific versions of sk->sk_prot->get_port: both v4 and v6 use inet_csk_get_port sk->sk_prot->hash: inet_hash is directly used, only v6 need a specific version to deal with mapped sockets sk->sk_prot->unhash: both v4 and v6 use inet_hash directly struct inet_connection_sock_af_ops also gets a new member, bind_conflict, so that inet_csk_get_port can find the per family routine. Now only the lookup routines receive as a parameter a struct inet_hashtable. With this we further reuse code, reducing the difference among INET transport protocols. Eventually work has to be done on UDP and SCTP to make them share this infrastructure and get as a bonus inet_diag interfaces so that iproute can be used with these protocols. net-2.6/net/ipv4/inet_hashtables.c: struct proto \| +8 struct inet_connection_sock_af_ops \| +8 2 structs changed __inet_hash_nolisten \| +18 __inet_hash \| -210 inet_put_port \| +8 inet_bind_bucket_create \| +1 __inet_hash_connect \| -8 5 functions changed, 27 bytes added, 218 bytes removed, diff: -191 net-2.6/net/core/sock.c: proto_seq_show \| +3 1 function changed, 3 bytes added, diff: +3 net-2.6/net/ipv4/inet_connection_sock.c: inet_csk_get_port \| +15 1 function changed, 15 bytes added, diff: +15 net-2.6/net/ipv4/tcp.c: tcp_set_state \| -7 1 function changed, 7 bytes removed, diff: -7 net-2.6/net/ipv4/tcp_ipv4.c: tcp_v4_get_port \| -31 tcp_v4_hash \| -48 tcp_v4_destroy_sock \| -7 tcp_v4_syn_recv_sock \| -2 tcp_unhash \| -179 5 functions changed, 267 bytes removed, diff: -267 net-2.6/net/ipv6/inet6_hashtables.c: __inet6_hash \| +8 1 function changed, 8 bytes added, diff: +8 net-2.6/net/ipv4/inet_hashtables.c: inet_unhash \| +190 inet_hash \| +242 2 functions changed, 432 bytes added, diff: +432 vmlinux: 16 functions changed, 485 bytes added, 492 bytes removed, diff: -7 /home/acme/git/net-2.6/net/ipv6/tcp_ipv6.c: tcp_v6_get_port \| -31 tcp_v6_hash \| -7 tcp_v6_syn_recv_sock \| -9 3 functions changed, 47 bytes removed, diff: -47 /home/acme/git/net-2.6/net/dccp/proto.c: dccp_destroy_sock \| -7 dccp_unhash \| -179 dccp_hash \| -49 dccp_set_state \| -7 dccp_done \| +1 5 functions changed, 1 bytes added, 242 bytes removed, diff: -241 /home/acme/git/net-2.6/net/dccp/ipv4.c: dccp_v4_get_port \| -31 dccp_v4_request_recv_sock \| -2 2 functions changed, 33 bytes removed, diff: -33 /home/acme/git/net-2.6/net/dccp/ipv6.c: dccp_v6_get_port \| -31 dccp_v6_hash \| -7 dccp_v6_request_recv_sock \| +5 3 functions changed, 5 bytes added, 38 bytes removed, diff: -33 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-03 04:28:52 -08:00
Denis V. Lunev	4814bdbd59	[NETNS]: Lookup in FIB semantic hashes taking into account the namespace. The namespace is not available in the fib_sync_down_addr, add it as a parameter. Looking up a device by the pointer to it is OK. Looking up using a result from fib_trie/fib_hash table lookup is also safe. No need to fix that at all. So, just fix lookup by address and insertion to the hash table path. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:41 -08:00
Denis V. Lunev	7462bd744e	[NETNS]: Add a namespace mark to fib_info. This is required to make fib_info lookups namespace aware. In the other case initial namespace devices are marked as dead in the local routing table during other namespace stop. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:40 -08:00
Denis V. Lunev	85326fa54b	[IPV4]: fib_sync_down rework. fib_sync_down can be called with an address and with a device. In reality it is called either with address OR with a device. The codepath inside is completely different, so let's separate it into two calls for these two cases. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:39 -08:00
Denis V. Lunev	4b8aa9abee	[NETNS]: Process interface address manipulation routines in the namespace. The namespace is available when required except rtm_to_ifaddr. Add namespace argument to it. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:39 -08:00
Denis V. Lunev	7b2185747c	[IPV4]: Small style cleanup of the error path in rtm_to_ifaddr. Remove error code assignment inside brackets on failure. The code looks better if the error is assigned before condition check. Also, the compiler treats this better. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:38 -08:00
Denis V. Lunev	dce5cbeec3	[IPV4]: Fix memory leak on error path during FIB initialization. net->ipv4.fib_table_hash is not freed when fib4_rules_init failed. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:37 -08:00
Adrian Bunk	30a50cc566	[TCP]: Unexport sysctl_tcp_tso_win_divisor This patch removes the no longer used EXPORT_SYMBOL(sysctl_tcp_tso_win_divisor). Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:32 -08:00
Adrian Bunk	0027ba8434	[IPV4]: Make struct ipv4_devconf static. struct ipv4_devconf can now become static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:31 -08:00
Eric Dumazet	29e75252da	[IPV4] route cache: Introduce rt_genid for smooth cache invalidation. Current ip route cache implementation is not suited to large caches. We can consume a lot of CPU when cache must be invalidated, since we currently need to evict all cache entries, and this eviction is sometimes asynchronous. min_delay & max_delay can somewhat control this asynchronism behavior, but whole thing is a kludge, regularly triggering infamous soft lockup messages. When entries are still in use, this also consumes a lot of ram, filling dst_garbage.list. A better scheme is to use a generation identifier on each entry, so that cache invalidation can be performed by changing the table identifier, without having to scan all entries. No more delayed flushing, no more stalling when secret_interval expires. Invalidated entries will then be freed at GC time (controlled by ip_rt_gc_timeout or stress), or when an invalidated entry is found in a chain when an insert is done. Thus we keep a normal equilibrium. This patch : - renames rt_hash_rnd to rt_genid (and makes it an atomic_t) - Adds a new rt_genid field to 'struct rtable' (filling a hole on 64bit) - Checks entry->rt_genid at appropriate places :	2008-01-31 19:28:27 -08:00
Shan Wei	16ca3f9130	[TCP]: Fix a bug in strategy_allowed_congestion_control. In strategy_allowed_congestion_control of the 2.6.24 kernel, when sysctl_string returns 1 on success, it should call tcp_set_allowed_congestion_control to set the allowed congestion control. But it doesn't. sysctl_string returns 1 on success and a negative value otherwise; it never returns 0. The patch fixes the problem. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:23 -08:00
Stephen Hemminger	71d67e666e	[IPV4] fib_trie: rescan if key is lost during dump Normally during a dump the key of the last dumped entry is used for continuation, but since lock is dropped it might be lost. In that case fallback to the old counter based N^2 behaviour. This means the dump will end up skipping some routes which matches what FIB_HASH does. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:23 -08:00
Pavel Emelyanov	fa4d3c6210	[NETNS]: Udp sockets per-net lookup. Add the net parameter to udp_get_port family of calls and udp_lookup one and use it to filter sockets. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:21 -08:00
Pavel Emelyanov	d86e0dac2c	[NETNS]: Tcp-v6 sockets per-net lookup. Add a net argument to inet6_lookup and propagate it further. Actually, this is tcp-v6 implementation of what was done for tcp-v4 sockets in a previous patch. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:20 -08:00
Pavel Emelyanov	c67499c0e7	[NETNS]: Tcp-v4 sockets per-net lookup. Add a net argument to inet_lookup and propagate it further into lookup calls. Plus tune the __inet_check_established. The dccp and inet_diag, which use that lookup functions pass the init_net into them. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:19 -08:00
Pavel Emelyanov	941b1d22cc	[NETNS]: Make bind buckets live in net namespaces. This tags the inet_bind_bucket struct with net pointer, initializes it during creation and makes a filtering during lookup. A better hashfn, that takes the net into account is to be done in the future, but currently all bind buckets with similar port will be in one hash chain. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:18 -08:00
Pavel Emelyanov	5ee31fc1ec	[INET]: Consolidate inet(6)_hash_connect. These two functions are the same except for what they call to "check_established" and "hash" for a socket. This saves half-a-kilo for ipv4 and ipv6. add/remove: 1/0 grow/shrink: 1/4 up/down: 582/-1128 (-546) function old new delta __inet_hash_connect - 577 +577 arp_ignore 108 113 +5 static.hint 8 4 -4 rt_worker_func 376 372 -4 inet6_hash_connect 584 25 -559 inet_hash_connect 586 25 -561 Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:17 -08:00
Patrick McHardy	969d71089f	[NETFILTER]: nf_nat: fix sparse warning Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:15 -08:00
Patrick McHardy	c392a74018	[NETFILTER]: {ip,ip6}_queue: fix build error Reported by Ingo Molnar: net/built-in.o: In function `ip_queue_init': ip_queue.c:(.init.text+0x322c): undefined reference to `net_ipv4_ctl_path' Fix the build error and also handle CONFIG_PROC_FS=n properly. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:14 -08:00
Jan Engelhardt	32948588ac	[NETFILTER]: nf_conntrack: annotate l3protos with const Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:13 -08:00
Jan Engelhardt	7cc3864d39	[NETFILTER]: nf_{conntrack,nat}_icmp: constify and annotate Constify a few data tables use const qualifiers on variables where possible in the nf_conntrack_icmp* sources. Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:12 -08:00
Jan Engelhardt	dc35dc5a4c	[NETFILTER]: nf_{conntrack,nat}_proto_gre: annotate with const Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:12 -08:00
Jan Engelhardt	da3f13c95a	[NETFILTER]: nf_{conntrack,nat}_proto_udp{,lite}: annotate with const Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:11 -08:00
Jan Engelhardt	82f568fc2f	[NETFILTER]: nf_{conntrack,nat}_proto_tcp: constify and annotate TCP modules Constify a few data tables use const qualifiers on variables where possible in the nf_*_proto_tcp sources. Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:10 -08:00
Jan Engelhardt	9ddd0ed050	[NETFILTER]: nf_{conntrack,nat}_pptp: annotate PPtP helper with const Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:09 -08:00
Jan Engelhardt	de24b4ebb8	[NETFILTER]: nf_{conntrack,nat}_tftp: annotate TFTP helper with const Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:08 -08:00
Jan Engelhardt	13f7d63c29	[NETFILTER]: nf_{conntrack,nat}_sip: annotate SIP helper with const Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:08 -08:00
Jan Engelhardt	905e3e8ec5	[NETFILTER]: nf_conntrack_h323: constify and annotate H.323 helper Constify data tables (predominantly in nf_conntrack_h323_types.c, but also a few in nf_conntrack_h323_asn1.c) and use const qualifiers on variables where possible in the h323 sources. Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:07 -08:00
Alexey Dobriyan	3cb609d57c	[NETFILTER]: x_tables: create per-netns /proc/net/*_tables_* Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:06 -08:00
Ilpo Järvinen	a38201e3c9	[NETFILTER]: ipt_CLUSTERIP: kill clusterip_config_entry_get It's unused static inline. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:02 -08:00
Patrick McHardy	02502f6224	[NETFILTER]: nf_nat: switch rwlock to spinlock Since we're using RCU, all users of nf_nat_lock take a write_lock. Switch it to a spinlock. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:00 -08:00
Patrick McHardy	4d354c5782	[NETFILTER]: nf_nat: use RCU for bysource hash Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:00 -08:00
Patrick McHardy	c88130bcd5	[NETFILTER]: nf_conntrack: naming unification Rename all "conntrack" variables to "ct" for more consistency and avoiding some overly long lines. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:59 -08:00
Patrick McHardy	76507f69c4	[NETFILTER]: nf_conntrack: use RCU for conntrack hash Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:54 -08:00
Patrick McHardy	7d0742da1c	[NETFILTER]: nf_conntrack_expect: use RCU for expectation hash Use RCU for expectation hash. This doesn't buy much for conntrack runtime performance, but allows to reduce the use of nf_conntrack_lock for /proc and nf_netlink_conntrack. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:53 -08:00
Patrick McHardy	b0a6363c24	[NETFILTER]: {ip,arp,ip6}_tables: fix sparse warnings in compat code CHECK net/ipv4/netfilter/ip_tables.c net/ipv4/netfilter/ip_tables.c:1453:8: warning: incorrect type in argument 3 (different signedness) net/ipv4/netfilter/ip_tables.c:1453:8: expected int size net/ipv4/netfilter/ip_tables.c:1453:8: got unsigned int [usertype] size net/ipv4/netfilter/ip_tables.c:1458:44: warning: incorrect type in argument 3 (different signedness) net/ipv4/netfilter/ip_tables.c:1458:44: expected int size net/ipv4/netfilter/ip_tables.c:1458:44: got unsigned int [usertype] size net/ipv4/netfilter/ip_tables.c:1603:2: warning: incorrect type in argument 2 (different signedness) net/ipv4/netfilter/ip_tables.c:1603:2: expected unsigned int i net/ipv4/netfilter/ip_tables.c:1603:2: got int <noident> net/ipv4/netfilter/ip_tables.c:1627:8: warning: incorrect type in argument 3 (different signedness) net/ipv4/netfilter/ip_tables.c:1627:8: expected int size net/ipv4/netfilter/ip_tables.c:1627:8: got unsigned int size net/ipv4/netfilter/ip_tables.c:1634:40: warning: incorrect type in argument 3 (different signedness) net/ipv4/netfilter/ip_tables.c:1634:40: expected int size net/ipv4/netfilter/ip_tables.c:1634:40: got unsigned int size net/ipv4/netfilter/ip_tables.c:1653:8: warning: incorrect type in argument 5 (different signedness) net/ipv4/netfilter/ip_tables.c:1653:8: expected unsigned int i net/ipv4/netfilter/ip_tables.c:1653:8: got int <noident> net/ipv4/netfilter/ip_tables.c:1666:2: warning: incorrect type in argument 2 (different signedness) net/ipv4/netfilter/ip_tables.c:1666:2: expected unsigned int i net/ipv4/netfilter/ip_tables.c:1666:2: got int <noident> CHECK net/ipv4/netfilter/arp_tables.c net/ipv4/netfilter/arp_tables.c:1285:40: warning: incorrect type in argument 3 (different signedness) net/ipv4/netfilter/arp_tables.c:1285:40: expected int size net/ipv4/netfilter/arp_tables.c:1285:40: got unsigned int size net/ipv4/netfilter/arp_tables.c:1543:44: warning: incorrect type in argument 3 (different signedness) net/ipv4/netfilter/arp_tables.c:1543:44: expected int size net/ipv4/netfilter/arp_tables.c:1543:44: got unsigned int [usertype] size CHECK net/ipv6/netfilter/ip6_tables.c net/ipv6/netfilter/ip6_tables.c:1481:8: warning: incorrect type in argument 3 (different signedness) net/ipv6/netfilter/ip6_tables.c:1481:8: expected int size net/ipv6/netfilter/ip6_tables.c:1481:8: got unsigned int [usertype] size net/ipv6/netfilter/ip6_tables.c:1486:44: warning: incorrect type in argument 3 (different signedness) net/ipv6/netfilter/ip6_tables.c:1486:44: expected int size net/ipv6/netfilter/ip6_tables.c:1486:44: got unsigned int [usertype] size net/ipv6/netfilter/ip6_tables.c:1631:2: warning: incorrect type in argument 2 (different signedness) net/ipv6/netfilter/ip6_tables.c:1631:2: expected unsigned int i net/ipv6/netfilter/ip6_tables.c:1631:2: got int <noident> net/ipv6/netfilter/ip6_tables.c:1655:8: warning: incorrect type in argument 3 (different signedness) net/ipv6/netfilter/ip6_tables.c:1655:8: expected int size net/ipv6/netfilter/ip6_tables.c:1655:8: got unsigned int size net/ipv6/netfilter/ip6_tables.c:1662:40: warning: incorrect type in argument 3 (different signedness) net/ipv6/netfilter/ip6_tables.c:1662:40: expected int size net/ipv6/netfilter/ip6_tables.c:1662:40: got unsigned int size net/ipv6/netfilter/ip6_tables.c:1680:8: warning: incorrect type in argument 5 (different signedness) net/ipv6/netfilter/ip6_tables.c:1680:8: expected unsigned int i net/ipv6/netfilter/ip6_tables.c:1680:8: got int <noident> net/ipv6/netfilter/ip6_tables.c:1693:2: warning: incorrect type in argument 2 (different signedness) net/ipv6/netfilter/ip6_tables.c:1693:2: expected unsigned int i net/ipv6/netfilter/ip6_tables.c:1693:2: got int <noident> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:49 -08:00
Patrick McHardy	855304af29	[NETFILTER]: ipt_recent: fix sparse warnings net/ipv4/netfilter/ipt_recent.c:215:17: warning: symbol 't' shadows an earlier one net/ipv4/netfilter/ipt_recent.c:179:22: originally declared here net/ipv4/netfilter/ipt_recent.c:322:13: warning: context imbalance in 'recent_seq_start' - wrong count at exit net/ipv4/netfilter/ipt_recent.c:354:13: warning: context imbalance in 'recent_seq_stop' - unexpected unlock Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:48 -08:00
Stephen Hemminger	f4f6fb714f	[NETFILTER]: more sparse fixes Some lock annotations, and make initializers static. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:46 -08:00
Stephen Hemminger	06aa10728e	[NETFILTER]: nf_nat_snmp: sparse warning Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:44 -08:00
Alexey Dobriyan	df200969b1	[NETFILTER]: netns: put table module on netns stop When number of entries exceeds number of initial entries, foo-tables code will pin table module. But during table unregister on netns stop, that additional pin was forgotten. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:41 -08:00
Alexey Dobriyan	9ea0cb2601	[NETFILTER]: arp_tables: per-netns arp_tables FILTER Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:41 -08:00
Alexey Dobriyan	79df341ab6	[NETFILTER]: arp_tables: netns preparation * Propagate netns from userspace. * arpt_register_table() registers table in supplied netns. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:40 -08:00
Alexey Dobriyan	9335f047fe	[NETFILTER]: ip_tables: per-netns FILTER, MANGLE, RAW Now iptables shows and configures a different set of rules in each netns. Filtering decisions are still made by consulting only init_net's set. Changes are identical except naming so no splitting. P.S.: one needs to remove init_net checks in nf_sockopt.c and inet_create() to see the effect. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:38 -08:00
Alexey Dobriyan	34bd137ba7	[NETFILTER]: ip_tables: propagate netns from userspace .. all the way down to table searching functions. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:37 -08:00
Alexey Dobriyan	44d34e721e	[NETFILTER]: x_tables: return new table from {arp,ip,ip6}t_register_table() A typical table module registers an xt_table structure (i.e. packet_filter) and links it into a list while doing so. We can't use one template for it because the corresponding list_head will become corrupted. We also can't unregister with the template because it wasn't changed at all and thus doesn't know which list it is in. So, we duplicate the template at the very first step of table registration. Table modules will save it for use during unregistration and actual filtering. Do it at once to not screw bisection. P.S.: renaming i.e. packet_filter => __packet_filter is temporary until full netnsization of table modules is done. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:36 -08:00
Alexey Dobriyan	8d87005207	[NETFILTER]: x_tables: per-netns xt_tables In fact all we want is a per-netns set of rules, however doing that will unnecessarily complicate routines such as ipt_hook()/ipt_do_table, so make the full xt_table array per-netns. Every user is stubbed with init_net for a while. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:35 -08:00
Alexey Dobriyan	a98da11d88	[NETFILTER]: x_tables: change xt_table_register() return value convention Switch from 0/-E to ptr/PTR_ERR convention. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:35 -08:00
Jan Engelhardt	abfdf1c489	[NETFILTER]: ebtables: remove casts, use consts Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:33 -08:00
Patrick McHardy	d44caf88e8	[NETFILTER]: nf_nat: remove double bysource hash initialization The hash table is already initialized by nf_ct_alloc_hashtable(). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:28 -08:00
Jan Engelhardt	ecb6f85e11	[NETFILTER]: Use const in struct xt_match, xt_target, xt_table Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:28 -08:00
Denis V. Lunev	3046d76746	[RAW]: Wrong content of the /proc/net/raw6. The address of IPv6 raw sockets was shown in the wrong format, from IPv4 ones. The problem has been introduced by the commit `42a73808ed` ("[RAW]: Consolidate proc interface.") Thanks to Adrian Bunk who originally noticed the problem. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:26 -08:00
Denis V. Lunev	8cd850efa4	[RAW]: Cleanup IPv4 raw_seq_show. There is no need to use 128 bytes on the stack at all. Clean the code in the IPv6 style. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:25 -08:00
Denis V. Lunev	377cf82d66	[RAW]: Family check in the /proc/net/raw[6] is extra. Different hashtables are used for IPv6 and IPv4 raw sockets, so no need to check the socket family in the iterator over hashtables. Clean this out. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:24 -08:00
Herbert Xu	b1641064a3	[IPCOMP]: Fix reception of incompressible packets I made a silly typo by entering IPPROTO_IP (== 0) instead of IPPROTO_IPIP (== 4). This broke the reception of incompressible packets. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:24 -08:00
Eric Dumazet	e242297055	[NET]: should explicitely initialize atomic_t field in struct dst_ops All but one struct dst_ops static initializations miss explicit initialization of entries field. As this field is atomic_t, we should use ATOMIC_INIT(0), and not rely on atomic_t implementation. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:23 -08:00
Ilpo Järvinen	ad1984e844	[TCP]: NewReno must count every skb while marking losses NewReno should add cnt per skb (as with FACK) instead of depending on SACKED_ACKED bits which won't be set with it at all. Effectively, NewReno should always exit after the first iteration anyway (or immediately if there's already head in lost_out). This was fixed earlier in net-2.6.25 but got reverted among other stuff and I didn't notice that this is still necessary (actually wasn't even considering this case while trying to figure out the reports because I lived with different kind of code than it in reality was). This should solve the WARN_ONs in TCP code that as a result of this triggered multiple times in every place we check for this invariant. Special thanks to Dave Young <hidave.darkstar@gmail.com> and Krishna Kumar2 <krkumar2@in.ibm.com> for trying with my debug patches. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Tested-by: Dave Young <hidave.darkstar@gmail.com> Tested-by: Krishna Kumar2 <krkumar2@in.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:22 -08:00
Eric Dumazet	533cb5b0a6	[XFRM]: constify 'struct xfrm_type' Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:20 -08:00
Laszlo Attila Toth	4a19ec5800	[NET]: Introducing socket mark socket option. A userspace program may wish to set the mark for each packet it sends without using the netfilter MARK target. Changing the mark can be used for mark based routing without netfilter or for packet filtering. It requires CAP_NET_ADMIN capability. Signed-off-by: Laszlo Attila Toth <panther@balabit.hu> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:19 -08:00
Herbert Xu	2614fa59fa	[IPCOMP]: Fetch nexthdr before ipch is destroyed When I moved the nexthdr setting out of IPComp I accidentally moved the reading of ipch->nexthdr after the decompression. Unfortunately this means that we'd be reading from a stale ipch pointer which doesn't work very well. This patch moves the reading up so that we get the correct nexthdr value. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:11 -08:00
Julian Anastasov	936f6f8e1b	[IPV4] fib_trie: apply fixes from fib_hash Update fib_trie with some fib_hash fixes: - check for duplicate alternative routes for prefix+tos+priority when replacing route - properly insert by matching tos together with priority - fix alias walking to use list_for_each_entry_continue for insertion and deletion when fa_head is not NULL - copy state from fa to new_fa on replace (not a problem for now) - additionally, avoid replacement without error if new route is same, as Joonwoo Park suggests. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:10 -08:00
Julian Anastasov	c18865f392	[IPV4] fib: fix route replacement, fib_info is shared fib_info can be shared by many route prefixes but we don't want duplicate alternative routes for a prefix+tos+priority. Last change was not correct to check fib_treeref because it accounts usage from other prefixes. Additionally, avoid replacement without error if new route is same, as Joonwoo Park suggests. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:10 -08:00
Arnaldo Carvalho de Melo	8cf8e5a67f	[INET_DIAG]: Fix inet_diag_lock_handler error path. Fixes: http://bugzilla.kernel.org/show_bug.cgi?id=9825 The inet_diag_lock_handler function uses ERR_PTR to encode errors but its callers were testing against NULL. This only happens when the only inet_diag modular user, DCCP, is not built into the kernel or available as a module. Also there was a problem with not dropping the mutex lock when a handler was not found, also fixed in this patch. This caused an OOPS and ss would then hang on subsequent calls, as &inet_diag_table_mutex was being left locked. Thanks to spike at ml.yaroslavl.ru for reporting it after trying 'ss -d' on a kernel that doesn't have DCCP available. This bug was introduced in cset `d523a328fb` ("Fix inet_diag dead-lock regression"), after 2.6.24-rc3, so just 2.6.24 seems to be affected. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:08 -08:00
Herbert Xu	29ffe1a5c5	[INET]: Prevent out-of-sync truesize on ip_fragment slow path When ip_fragment has to hit the slow path the value of skb->truesize may go out of sync because we would have updated it without changing the packet length. This violates the constraints on truesize. This patch postpones the update of skb->truesize to prevent this. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:07 -08:00
Herbert Xu	1a6509d991	[IPSEC]: Add support for combined mode algorithms This patch adds support for combined mode algorithms with GCM being the first algorithm supported. Combined mode algorithms can be added through the xfrm_user interface using the new algorithm payload type XFRMA_ALG_AEAD. Each algorithm is identified by its name and the ICV length. For the purposes of matching algorithms in xfrm_tmpl structures, combined mode algorithms occupy the same name space as encryption algorithms. This is in line with how they are negotiated using IKE. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:03 -08:00
Herbert Xu	38320c70d2	[IPSEC]: Use crypto_aead and authenc in ESP This patch converts ESP to use the crypto_aead interface and in particular the authenc algorithm. This lays the foundations for future support of combined mode algorithms. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:02 -08:00
Paul Moore	16efd45435	NetLabel: Add secid token support to the NetLabel secattr struct This patch adds support to the NetLabel LSM secattr struct for a secid token and a type field, paving the way for full LSM/SELinux context support and "static" or "fallback" labels. In addition, this patch adds a fair amount of documentation to the core NetLabel structures used as part of the NetLabel kernel API. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-01-30 08:17:19 +11:00
Stephen Hemminger	ac97f75faa	[IPV4] fib_trie: remove unneeded NULL check Since fib_route_seq_show now uses hlist_for_each_entry(), the leaf info can not be NULL. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:26 -08:00
Stephen Hemminger	f638a2f057	[IPV4] fib_trie: More whitespace cleanup. Remove extra blank lines. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:25 -08:00
Denis V. Lunev	dde1bc0e6f	[NETNS]: Add namespace for ICMP replying code. All needed API is done, the namespace is available when required from the device on the DST entry from the incoming packet. So, just replace init_net with proper namespace. Other protocols will follow. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:13 -08:00
Denis V. Lunev	b5921910a1	[NETNS]: Routing cache virtualization. Basically, this piece looks relatively easy. Namespace is already available on the dst entry via device and the device is safe to dereference. Compare it with one of a searcher and skip entry if appropriate. The only exception is ip_rt_frag_needed. So, add namespace parameter to it. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:13 -08:00
Denis V. Lunev	f206351a50	[NETNS]: Add namespace parameter to ip_route_output_key. Needed to propagate it down to the ip_route_output_flow. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:07 -08:00
Denis V. Lunev	f1b050bf7a	[NETNS]: Add namespace parameter to ip_route_output_flow. Needed to propagate it down to the __ip_route_output_key. Signed_off_by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:06 -08:00
Denis V. Lunev	611c183ebc	[NETNS]: Add namespace parameter to __ip_route_output_key. This is only required to propagate it down to the ip_route_output_slow. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:05 -08:00
Denis V. Lunev	b40afd0e5c	[NETNS]: Add namespace parameter to ip_route_output_slow. This function needs a net namespace to lookup devices, fib tables, etc. in, so pass it there. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:05 -08:00
Denis V. Lunev	1ab352768f	[NETNS]: Add namespace parameter to ip_dev_find. ip_dev_find() needs a namespace to pass it to fib_get_table(), so add an argument. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:04 -08:00
Denis V. Lunev	010278ec4c	[NETNS]: Add netns parameter to fib_select_default. Currently fib_select_default calls fib_get_table() with the init_net. Prepare it to provide a correct namespace to lookup default route. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:03 -08:00
Denis V. Lunev	64c2d53829	[IPV4]: Consolidate fib_select_default. The difference in the implementation of the fib_select_default when CONFIG_IP_MULTIPLE_TABLES is (not) defined looks negligible. Consolidate it and place into fib_frontend.c. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:02 -08:00
Stephen Hemminger	d5ce8a0e97	[IPV4] fib_trie: avoid rescan on dump This converts dumping (and flushing) of large route tables from O(N^2) to O(N). If the route dump took multiple pages then the dump routine gets called again. The old code kept track of location by counter, the new code instead uses the last key. This is a really big win (0.3 sec vs 12 sec) for big route tables. One side effect is that if the table changes during the dump, then the last key will not be found, and we will return -EBUSY. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:01 -08:00
Stephen Hemminger	9195bef7fb	[IPV4] fib_trie: avoid extra search on delete Get rid of extra search that made route deletion O(n). Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:00 -08:00
Stephen Hemminger	a88ee22925	[IPV4] fib_trie: dump table in sorted order It is easier with TRIE to dump the data by traversal rather than iterating over every possible prefix. This saves some time and makes the dump come out in sorted order. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:00 -08:00
Stephen Hemminger	82cfbb0085	[IPV4] fib_trie: iterator recode Remove the complex loop structure of nextleaf() and replace it with a simpler tree walker. This improves the performance and is much cleaner. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:59 -08:00
Stephen Hemminger	64347f786d	[IPV4] fib_trie: dump message multiple part flag Match fib_hash, and set NLM_F_MULTI to handle multiple part messages. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:58 -08:00
Stephen Hemminger	1328042e26	[IPV4] fib_trie: use hash list The code to dump can use the existing hash chain rather than doing repeated lookup. Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:58 -08:00
Stephen Hemminger	936722922f	[IPV4] fib_trie: compute size when needed Compute the number of prefixes when needed, rather than doing bookkeeping. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:57 -08:00
Stephen Hemminger	a07f5f508a	[IPV4] fib_trie: style cleanup Style cleanups: * make check_leaf return -1 or plen, rather than by reference * Get rid of #ifdef that is always set * split out embedded function calls in if statements. * checkpatch warnings Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:56 -08:00
Stephen Hemminger	bc3c8c1e02	[IPV4] fib_trie: put leaf nodes in a slab cache This improves locality for operations that touch all the leaves. Save space since these entries don't need to be hardware cache aligned. Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:56 -08:00
Eric Dumazet	69a73829db	[DST]: shrinks sizeof(struct rtable) by 64 bytes on x86_64 On x86_64, sizeof(struct rtable) is 0x148, which is rounded up to 0x180 bytes by SLAB allocator. We can reduce this to exactly 0x140 bytes, without alignment overhead, and store 12 struct rtable per PAGE instead of 10. rate_tokens is currently defined as an "unsigned long", while its content should not exceed 6*HZ. It can safely be converted to an unsigned int. Moving tclassid right after rate_tokens to fill the 4 bytes hole permits to save 8 bytes on 'struct dst_entry', which finally permits to save 8 bytes on 'struct rtable' Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:41 -08:00
Pavel Emelyanov	81566e8322	[NETNS][FRAGS]: Make the pernet subsystem for fragments. On namespace start we mainly prepare the ctl variables. When the namespace is stopped we have to kill all the fragments that point to this namespace. The inet_frags_exit_net() handles it. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:40 -08:00
Pavel Emelyanov	3140c25c82	[NETNS][FRAGS]: Make the LRU list per namespace. The inet_frags.lru_list is used for evicting only, so we have to make it per-namespace, to evict only those fragments whose namespace exceeded its high threshold, but not the whole hash. Besides, this helps to avoid long loops in evictor. The spinlock is not per-namespace because it protects the hash table as well, which is global. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:39 -08:00
Pavel Emelyanov	3b4bc4a2bf	[NETNS][FRAGS]: Isolate the secret interval from namespaces. Since we have one hashtable to lookup the fragment, having different secret_interval-s for hash rebuild doesn't make sense, so move this one to inet_frags. The inet_frags_ctl becomes empty after this, so remove it. The appropriate ctl table is kept read-only in namespaces. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:39 -08:00
Pavel Emelyanov	e31e0bdc7e	[NETNS][FRAGS]: Make thresholds work in namespaces. This is the same as with the timeout variable. Currently, after exceeding the high threshold _all_ the fragments are evicted, but it will be fixed in later patch. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:38 -08:00
Pavel Emelyanov	b2fd5321dd	[NETNS][FRAGS]: Make the net.ipv4.ipfrag_timeout work in namespaces. Move it to the netns_frags, adjust the usage and make the appropriate ctl table writable. Now fragments that live in different namespaces can live for different times. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:37 -08:00
Pavel Emelyanov	e4a2d5c2bc	[NETNS][FRAGS]: Duplicate sysctl tables for new namespaces. Each namespace has to have own tables to tune their different parameters, so duplicate the tables and register them. All the tables in sub-namespaces are temporarily made read-only. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:37 -08:00
Pavel Emelyanov	6ddc082223	[NETNS][FRAGS]: Make the mem counter per-namespace. This is also simple, but introduces more changes, since then mem counter is altered in more places. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:36 -08:00
Pavel Emelyanov	e5a2bb842c	[NETNS][FRAGS]: Make the nqueues counter per-namespace. This is simple - just move the variable from struct inet_frags to struct netns_frags and adjust the usage appropriately. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:35 -08:00
Pavel Emelyanov	ac18e7509e	[NETNS][FRAGS]: Make the inet_frag_queue lookup work in namespaces. Since fragment management code is consolidated, we cannot have the pointer from inet_frag_queue to struct net, since we must know what kind of fragment this is. So, I introduce the netns_frags structure. This one is currently empty, but will be eventually filled with per-namespace attributes. Each inet_frag_queue is tagged with this one. The conntrack_reasm is not "netns-izated", so it has one static netns_frags instance to keep working in init namespace. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:34 -08:00
Pavel Emelyanov	8d8354d2fb	[NETNS][FRAGS]: Move ctl tables around. This is a preparation for sysctl netns-ization. Move the ctl tables to the files, where the tuning variables reside. Plus make the helpers to register the tables. This will simplify the later patches and will keep similar things closer to each other. ipv4, ipv6 and conntrack_reasm are patched differently, but the result is all the tables are in appropriate files. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:34 -08:00
YOSHIFUJI Hideaki	fc80be87dc	[IPV4] UDP,UDPLITE: Sparse: {__udp4_lib,udp,udplite}_err() are of void. Fix following sparse warnings: \| net/ipv4/udp.c:421:2: warning: returning void-valued expression \| net/ipv4/udplite.c:38:2: warning: returning void-valued expression Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-01-28 15:10:24 -08:00
Denis V. Lunev	ecfdc8c542	[NETNS]: Pass correct namespace in ip_rt_get_source. ip_rt_get_source is the infamous place for which dst_ifdown kludges have been implemented. This means that rt->u.dst.dev can be safely dereferenced to obtain nd_net. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:23 -08:00
Denis V. Lunev	84a885f449	[NETNS]: Pass correct namespace in ip_route_input_slow. The packet on the input path always has a reference to the input network device it is passed from. Extract network namespace from it. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:22 -08:00
Denis V. Lunev	86167a377f	[NETNS]: Pass correct namespace in context fib_check_nh. Correct network namespace is already used in fib_check_nh. Re-work its usage for better readability and pass into fib_lookup & inetdev_by_index. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:21 -08:00
Denis V. Lunev	5b707aaae4	[NETNS]: Pass correct namespace in fib_validate_source. Correct network namespace is available inside fib_validate_source. It can be obtained from the device passed in. The device is not NULL as in_device is obtained from it just above. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:21 -08:00
Denis V. Lunev	7fee0ca237	[NETNS]: Add netns parameter to inetdev_by_index. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:20 -08:00
Denis V. Lunev	da0e28cb68	[NETNS]: Add netns parameter to fib_lookup. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:19 -08:00
Stephen Hemminger	ba93ef7465	[IPV4]: ipmr sparse warnings Get rid of some of the sparse warnings. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:18 -08:00
Stephen Hemminger	dd329bfa96	[IPV4]: igmp sparse warnings Partial sparse warning fix. The other conditional locking is too much for sparse to handle. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:10:18 -08:00
Jan Engelhardt	1e637c74b0	[IPV4]: Enable use of 240/4 address space. This short patch modifies the IPv4 networking to enable use of the 240.0.0.0/4 (aka "class-E") address space as proposed in the internet draft draft-fuller-240space-00.txt. Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:44 -08:00
Denis V. Lunev	51314a17ba	[NETNS]: Process FIB rule action in the context of the namespace. Save namespace context on the fib rule at the rule creation time and call routing lookup in the correct namespace. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:14 -08:00
Denis V. Lunev	9e3a548781	[NETNS]: FIB rules API cleanup. Remove struct net from fib_rules_register(unregister)/notify_change paths and diet code size a bit. add/remove: 0/0 grow/shrink: 10/12 up/down: 35/-100 (-65) function old new delta notify_rule_change 273 280 +7 trie_show_stats 471 475 +4 fn_trie_delete 473 477 +4 fib_rules_unregister 144 148 +4 fib4_rule_compare 119 123 +4 resize 2842 2845 +3 fn_trie_select_default 515 518 +3 inet_sk_rebuild_header 836 838 +2 fib_trie_seq_show 764 766 +2 __devinet_sysctl_register 276 278 +2 fn_trie_lookup 1124 1123 -1 ip_fib_check_default 133 131 -2 devinet_conf_sysctl 223 221 -2 snmp_fold_field 126 123 -3 fn_trie_insert 2091 2086 -5 inet_create 876 870 -6 fib4_rules_init 197 191 -6 fib_sync_down 452 444 -8 inet_gso_send_check 334 325 -9 fib_create_info 3003 2991 -12 fib_nl_delrule 568 553 -15 fib_nl_newrule 883 852 -31 Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:13 -08:00
Denis V. Lunev	0359238333	[FIB]: Add netns to fib_rules_ops. The backward link from FIB rules operations to the network namespace will allow to simplify the API a bit. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:13 -08:00
Denis V. Lunev	775516bfa2	[NETNS]: Namespace stop vs 'ip r l' race. During network namespace stop process kernel side netlink sockets belonging to a namespace should be closed. They should not prevent namespace to stop, so they do not increment namespace usage counter. Though this counter will be put during last sock_put. The replacement of the correct netns for init_ns solves the problem only partially as socket to be stopped until proper stop is a valid netlink kernel socket and can be looked up by the user processes. This is not a problem until it resides in initial namespace (no processes inside this net), but this is not true for init_net. So, hold the reference for a socket, remove it from lookup tables and only after that change namespace and perform a last put. Signed-off-by: Denis V. Lunev <den@openvz.org> Tested-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:08 -08:00
Denis V. Lunev	b7c6ba6eb1	[NETNS]: Consolidate kernel netlink socket destruction. Create a specific helper for netlink kernel socket disposal. This just let the code look better and provides a ground for proper disposal inside a namespace. Signed-off-by: Denis V. Lunev <den@openvz.org> Tested-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:07 -08:00
Denis V. Lunev	4f84d82f7a	[NETNS]: Memory leak on network namespace stop. Network namespace allocates 2 kernel netlink sockets, fibnl & rtnl. These sockets should be disposed properly, i.e. by sock_release. Plain sock_put is not enough. Signed-off-by: Denis V. Lunev <den@openvz.org> Tested-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:06 -08:00
Daniel Lezcano	569d36452e	[NETNS][DST] dst: pass the dst_ops as parameter to the gc functions The garbage collection functions receive the dst_ops structure as parameter. This is useful for the next incoming patchset because it will need the dst_ops (there will be several instances) and the network namespace pointer (contained in the dst_ops). The protocols which do not take care of the namespaces will not be impacted by this change (except for the function signature), they do just ignore the parameter. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:46 -08:00
Eric Dumazet	a6501e080c	[IPV4] FIB_HASH: Reduce memory needs and speedup lookups Currently, sizeof(struct fib_alias) is 24 or 48 bytes on 32/64 bits arches. Because of SLAB_HWCACHE_ALIGN requirement, these are rounded to 32 and 64 bytes respectively. This patch moves rcu to the end of fib_alias, and conditionally defines it only for CONFIG_IP_FIB_TRIE. We also remove SLAB_HWCACHE_ALIGN requirement for fib_alias and fib_node objects because it is not necessary. (BTW SLUB currently denies it for objects smaller than cache_line_size() / 2, but not SLAB) Finally, sizeof(fib_alias) goes back to 16 and 32 bytes. Then, we can embed one fib_alias on each fib_node, to favor locality. Most of the time access to the fib_alias will be free because one cache line contains both the list head (fn_alias) and (one of) the list elements. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:46 -08:00
Eric Dumazet	b59cfbf77d	[FIB]: Fix rcu_dereference() abuses in fib_trie.c node_parent() and tnode_get_child() currently use rcu_dereference(). These functions are called from both - readers only paths (where rcu_dereference() is needed), and - writer path (where rcu_dereference() is not needed) To make explicit where rcu_dereference() is really needed, I introduced new node_parent_rcu() and tnode_get_child_rcu() functions which use rcu_dereference(), while node_parent() and tnode_get_child() dont use it. Then I changed calling sites where rcu_dereference() was really needed to call the _rcu() variants. This should have no impact but for alpha architecture, and may help future sparse checks. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:45 -08:00
Patrick McHardy	c71e916708	[NETFILTER]: nf_conntrack: make print_conntrack function optional for l4protos Allows to remove five empty implementations. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:42 -08:00
Patrick McHardy	c56cc9c07b	[NETFILTER]: nf_conntrack: remove print_conntrack function from l3protos It's unused and unlikely to ever be used. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:41 -08:00
Patrick McHardy	4f536522da	[NETFILTER]: kill nf_sysctl.c Since there now is generic support for shared sysctl paths, the only remains are the net/netfilter and net/ipv4/netfilter paths. Move them to net/netfilter/core.c and net/ipv4/netfilter.c and kill nf_sysctl.c. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:40 -08:00
Denys Vlasenko	9ba99b0d3f	[NETFILTER]: ipt_REJECT: properly handle IP options The current TCP RST construction reuses the old packet and can't deal with IP options as a consequence of that. Construct the RST from scratch instead. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:30 -08:00
Denys Vlasenko	022748a935	[NETFILTER]: {ip,ip6}_tables: remove some inlines This patch removes inlines except those which are used by packet matching code and thus are performance-critical. Before: $ size */*/*/ip*tables.o text data bss dec hex filename 6402 500 16 6918 1b06 net/ipv4/netfilter/ip_tables.o 7130 500 16 7646 1dde net/ipv6/netfilter/ip6_tables.o After: $ size */*/*/ip*tables.o text data bss dec hex filename 6307 500 16 6823 1aa7 net/ipv4/netfilter/ip_tables.o 7010 500 16 7526 1d66 net/ipv6/netfilter/ip6_tables.o Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:29 -08:00
Jan Engelhardt	f72e25a897	[NETFILTER]: Rename ipt_iprange to xt_iprange This patch moves ipt_iprange to xt_iprange, in preparation for adding IPv6 support to xt_iprange. Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:27 -08:00
Jan Engelhardt	2ae15b64e6	[NETFILTER]: Update modules' descriptions Updates the MODULE_DESCRIPTION() tags for all Netfilter modules, actually describing what the module does and not just "netfilter XYZ target". Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:26 -08:00
Jan Engelhardt	11fa2aa362	[NETFILTER]: remove ipt_TOS.c Commit 88c85d81f74f92371745158aebc5cbf490412002 forgot to remove the old ipt_TOS file (whose code has been merged into xt_DSCP). Remove it now. Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:17 -08:00
Patrick McHardy	8ce22fcab4	[NETFILTER]: Remove some EXPERIMENTAL dependencies Most of the netfilter modules are not considered experimental anymore, the only ones I want to keep marked as EXPERIMENTAL are: - TCPOPTSTRIP target, which is brand new. - SANE helper, which is quite new. - CLUSTERIP target, which I believe hasn't had much testing despite being in the kernel for quite a long time. - SCTP match and conntrack protocol, which are a mess and need to be reviewed and cleaned up before I would trust them. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:16 -08:00
Stephen Hemminger	7f9b80529b	[IPV4]: fib hash\|trie initialization Initialization of the slab cache's should be done when IP is initialized to make sure of available memory, and that code can be marked __init. Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:15 -08:00
Stephen Hemminger	d717a9a620	[IPV4] fib_trie: size and statistics Show number of entries in trie, the size field was being set but never used, but it only counted leaves, not all entries. Refactor the two cases in fib_triestat_seq_show into a single routine. Note: the stat structure was being malloc'd but the stack usage isn't so high (288 bytes) that it is worth the additional complexity. Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:14 -08:00
Eric Dumazet	28d36e3702	[FIB]: Avoid using static variables without proper locking fib_trie_seq_show() uses two helper functions, rtn_scope() and rtn_type() that can write to static storage without locking. Just pass to them a temporary buffer to avoid potential corruption (probably not triggerable but still...) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:13 -08:00
Denis V. Lunev	39a6d06300	[NETNS]: Process inet_confirm_addr in the correct namespace. inet_confirm_addr can be called with NULL in_dev from arp_ignore iff scope is RT_SCOPE_LINK. Let's always pass the device and check for RT_SCOPE_LINK scope inside inet_confirm_addr. This lets us take the network namespace from in_device without a need for an additional argument. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:13 -08:00
Denis V. Lunev	9bd85e3264	[IPV4]: Remove extra argument from arp_ignore. arp_ignore has two arguments: dev & in_dev. dev is used for inet_confirm_addr calling only. inet_confirm_addr, in turn, either gets in_dev from the device passed or iterates over all network devices if the device passed is NULL. It seems logical to directly pass in_dev into inet_confirm_addr. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:12 -08:00
Denis V. Lunev	2db82b534b	[NETNS]: Make arp code network namespace consistent. Some calls in the arp.c have network namespace as an argument. Getting init_net inside these functions is simply inconsistent. Fix this. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:08 -08:00
Denis V. Lunev	a79878f00d	[ARP]: Move inet_addr_type call after simple error checks in arp_constructor. The neighbour entry will be destroyed in the case of error, so it is pointless to perform a costly routing table lookup in this case. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:08 -08:00
Pavel Emelyanov	a308da1627	[NETNS][RAW]: Create the /proc/net/raw(6) in each namespace. To do so, just register the proper subsystem and create files in ->init callbacks. No other special per-namespace handling for raw sockets is required. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:07 -08:00
Pavel Emelyanov	e5ba31f11f	[NETNS][RAW]: Eliminate explicit init_net references. Happily, in all the rest places (->bind callbacks only), that require the struct net, we have a socket, so get the net from it. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:06 -08:00
Pavel Emelyanov	f51d599fbe	[NETNS][RAW]: Make /proc/net/raw(6) show per-namespace socket list. Pull the struct net pointer up to the showing functions to filter the sockets depending on their namespaces. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:06 -08:00
Pavel Emelyanov	be185884b3	[NETNS][RAW]: Make ipv[46] raw sockets lookup namespaces aware. This requires just to pass the appropriate struct net pointer into __raw_v[46]_lookup and skip sockets that do not belong to a needed namespace. The proper net is get from skb->dev in all the cases. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:05 -08:00
Eric Dumazet	8d96544475	[FIB]: full_children & empty_children should be uint, not ushort If declared as unsigned short, these fields can overflow, and whole trie logic is broken. I could not make the machine crash, but some tnode can never be freed. Note for 64 bit arches : By reordering t_key and parent in [node, leaf, tnode] structures, we can use 32 bits hole after t_key so that sizeof(struct tnode) doesnt change after this patch. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:04 -08:00
Eric Dumazet	4dde4610c4	[IPV4] fib_trie: removes a memset() call in tnode_new() tnode_alloc() already clears allocated memory, using kcalloc() or alloc_pages(GFP_KERNEL\|__GFP_ZERO, ...) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:02 -08:00
David S. Miller	88ebc72f68	[IPV4] FIB: Include nexthop device indexes in fib_info hashfn. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:02:01 -08:00
Eric Dumazet	112d8cfcbf	[FIB]: Reduce text size of net/ipv4/fib_trie.o In struct tnode, we use two fields of 5 bits for 'pos' and 'bits'. Switching to plain 'unsigned char' (8 bits) take the same space because of compiler alignments, and reduce text size by 435 bytes on i386. On i386 : $ size net/ipv4/fib_trie.o.before_patch net/ipv4/fib_trie.o text data bss dec hex filename 13714 4 64 13782 35d6 net/ipv4/fib_trie.o.before 13279 4 64 13347 3423 net/ipv4/fib_trie.o Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:58 -08:00
Stephen Hemminger	c95aaf9af5	[IPV4] fib_trie: Fix sparse warnings. Make FIB TRIE go through sparse checker without warnings. Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:56 -08:00
Stephen Hemminger	66a2f7fd2f	[IPV4] fib_trie: Add statistics. The FIB TRIE code has a bunch of statistics, but the code is hidden behind an ifdef that was never implemented. Since it was dead code, it was broken as well. This patch fixes that by making it a config option. Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:56 -08:00
Stephen Hemminger	a6db901092	[IPV4] FIB: printk related cleanups printk related cleanups: * Get rid of unused printk wrappers. * Make bug checks into KERN_WARNING because KERN_DEBUG gets ignored * Turn one cryptic old message into something real * Make sure all messages have KERN_XXX Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:55 -08:00
Stephen Hemminger	fea86ad812	[IPV4] fib_trie: fib_insert_node cleanup The only error from fib_insert_node is if memory allocation fails, so instead of passing by reference, just use the convention of returning NULL. Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:54 -08:00
Stephen Hemminger	187b5188a7	[IPV4] fib_trie: Use %u for unsigned printfs. Use %u instead of %d when printing unsigned values. Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:53 -08:00
Stephen Hemminger	93e4308b3b	[IPV4] fib_trie: Get rid of unused revision element. The revision element must have been part of an earlier design, because currently it is set but never used. Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:53 -08:00
Stephen Hemminger	c28a1cf448	[IPV4] fib_trie: Get rid of trie_init(). trie_init is worthless it is just zeroing stuff that is already zero! Move the memset() down to make it obvious. Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:52 -08:00
Ilpo Järvinen	cea14e0ed6	[TCP]: Uninline tcp_is_cwnd_limited net/ipv4/tcp_cong.c: tcp_reno_cong_avoid \| -65 1 function changed, 65 bytes removed, diff: -65 net/ipv4/arp.c: arp_ignore \| -5 1 function changed, 5 bytes removed, diff: -5 net/ipv4/tcp_bic.c: bictcp_cong_avoid \| -57 1 function changed, 57 bytes removed, diff: -57 net/ipv4/tcp_cubic.c: bictcp_cong_avoid \| -61 1 function changed, 61 bytes removed, diff: -61 net/ipv4/tcp_highspeed.c: hstcp_cong_avoid \| -63 1 function changed, 63 bytes removed, diff: -63 net/ipv4/tcp_hybla.c: hybla_cong_avoid \| -85 1 function changed, 85 bytes removed, diff: -85 net/ipv4/tcp_htcp.c: htcp_cong_avoid \| -57 1 function changed, 57 bytes removed, diff: -57 net/ipv4/tcp_veno.c: tcp_veno_cong_avoid \| -52 1 function changed, 52 bytes removed, diff: -52 net/ipv4/tcp_scalable.c: tcp_scalable_cong_avoid \| -61 1 function changed, 61 bytes removed, diff: -61 net/ipv4/tcp_yeah.c: tcp_yeah_cong_avoid \| -75 1 function changed, 75 bytes removed, diff: -75 net/ipv4/tcp_illinois.c: tcp_illinois_cong_avoid \| -54 1 function changed, 54 bytes removed, diff: -54 net/dccp/ccids/ccid3.c: ccid3_update_send_interval \| -7 ccid3_hc_tx_packet_recv \| +7 2 functions changed, 7 bytes added, 7 bytes removed, diff: +0 net/ipv4/tcp_cong.c: tcp_is_cwnd_limited \| +88 1 function changed, 88 bytes added, diff: +88 built-in.o: 14 functions changed, 95 bytes added, 642 bytes removed, diff: -547 ...Again some gcc artifacts visible as well. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:48 -08:00
Ilpo Järvinen	490d504693	[TCP]: Uninline tcp_set_state net/ipv4/tcp.c: tcp_close_state \| -226 tcp_done \| -145 tcp_close \| -564 tcp_disconnect \| -141 4 functions changed, 1076 bytes removed, diff: -1076 net/ipv4/tcp_input.c: tcp_fin \| -86 tcp_rcv_state_process \| -164 2 functions changed, 250 bytes removed, diff: -250 net/ipv4/tcp_ipv4.c: tcp_v4_connect \| -209 1 function changed, 209 bytes removed, diff: -209 net/ipv4/arp.c: arp_ignore \| +5 1 function changed, 5 bytes added, diff: +5 net/ipv6/tcp_ipv6.c: tcp_v6_connect \| -158 1 function changed, 158 bytes removed, diff: -158 net/sunrpc/xprtsock.c: xs_sendpages \| -2 1 function changed, 2 bytes removed, diff: -2 net/dccp/ccids/ccid3.c: ccid3_update_send_interval \| +7 1 function changed, 7 bytes added, diff: +7 net/ipv4/tcp.c: tcp_set_state \| +238 1 function changed, 238 bytes added, diff: +238 built-in.o: 12 functions changed, 250 bytes added, 1695 bytes removed, diff: -1445 I've no explanation why some unrelated changes seem to occur consistently as well (arp_ignore, ccid3_update_send_interval; I checked the arp_ignore asm and it seems to be due to some reordering of operations causing some extra opcodes to be generated). Still, the benefits are pretty obvious from the codiff's results. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:47 -08:00
David S. Miller	9993e7d313	[TCP]: Do not purge sk_forward_alloc entirely in tcp_delack_timer(). Otherwise we beat heavily on the global tcp_memory atomics when all of the sockets in the system are slowly sending periodic packet clumps. Noticed and suggested by Eric Dumazet. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:42 -08:00
Denis V. Lunev	8cced9eff1	[NETNS]: Enable routing configuration in non-initial namespace. I.e. remove the net != &init_net checks from the places, that now can handle other-than-init net namespace. Acked-by: Benjamin Thery <benjamin.thery@bull.net> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:35 -08:00
Denis V. Lunev	226b0b4a51	[NETNS]: Replace init_net with the correct context in fib_frontend.c Acked-by: Benjamin Thery <benjamin.thery@bull.net> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:34 -08:00
Denis V. Lunev	1bad118a33	[NETNS]: Pass namespace through ip_rt_ioctl. ... up to rtentry_to_fib_config Acked-by: Benjamin Thery <benjamin.thery@bull.net> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:34 -08:00
Denis V. Lunev	4b5d47d4d3	[NETNS]: Correctly fill fib_config data. Acked-by: Benjamin Thery <benjamin.thery@bull.net> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:33 -08:00
Denis V. Lunev	6bd48fcf73	[NETNS]: Provide correct namespace for fibnl netlink socket. This patch makes the netlink socket to be per namespace. That allows to have each namespace its own socket for routing queries. Acked-by: Benjamin Thery <benjamin.thery@bull.net> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:32 -08:00
Denis V. Lunev	e4aef8aea3	[NETNS]: Place fib tables into netns. The preparatory work has been done. All we need is to substitute fib_table_hash with net->ipv4.fib_table_hash. Netns context is available when required. Acked-by: Benjamin Thery <benjamin.thery@bull.net> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:31 -08:00
Denis V. Lunev	e4e4971c5f	[NETNS]: Namespacing IPv4 fib rules. The final trick for rules: place fib4_rules_ops into struct net and modify initialization path for this. Acked-by: Benjamin Thery <benjamin.thery@bull.net> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:31 -08:00
Denis V. Lunev	1c340b2fd7	[NETNS]: Show routing information from correct namespace (fib_trie.c) This is the second part (for the CONFIG_IP_FIB_TRIE case) of the patch #4, where we have created proc files in namespaces. Now we can dump correct info in them. Acked-by: Benjamin Thery <benjamin.thery@bull.net> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:30 -08:00
Denis V. Lunev	6e04d01dfa	[NETNS]: Show routing information from correct namespace (fib_hash.c) This is the second part (for the CONFIG_IP_FIB_HASH case) of the patch #4, where we have created proc files in namespaces. Now we can dump correct info in them. Acked-by: Benjamin Thery <benjamin.thery@bull.net> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:29 -08:00
Denis V. Lunev	4d1169c1e7	[NETNS]: Add netns to nl_info structure. nl_info is used to track the end-user destination of routing change notification. This is a natural object to hold a namespace on. Place it there and utilize the context in the appropriate places. Acked-by: Benjamin Thery <benjamin.thery@bull.net> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:29 -08:00
Eric W. Biederman	6b175b26c1	[NETNS]: Add netns parameter to inet_(dev_)add_type. The patch extends the inet_addr_type and inet_dev_addr_type with the network namespace pointer. That allows to access the different tables relatively to the network namespace. The modification of the signature function is reported in all the callers of the inet_addr_type using the pointer to the well known init_net. Acked-by: Benjamin Thery <benjamin.thery@bull.net> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:27 -08:00
Denis V. Lunev	8ad4942cd5	[NETNS]: Add netns parameter to fib_get_table/fib_new_table. This patch extends the fib_get_table and the fib_new_table functions with the network namespace pointer. That will allow to access the table relatively from the network namespace. Acked-by: Benjamin Thery <benjamin.thery@bull.net> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:27 -08:00
Denis V. Lunev	93456b6d77	[IPV4]: Unify access to the routing tables. Replace the direct pointers to local and main tables with calls to fib_get_table() with appropriate argument. This doesn't introduce additional dereferences, but makes the access to fib tables uniform in any (CONFIG_IP_MULTIPLE_TABLES) case. Acked-by: Benjamin Thery <benjamin.thery@bull.net> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:26 -08:00
Denis V. Lunev	7b1a74fdbb	[NETNS]: Refactor fib initialization so it can handle multiple namespaces. This patch makes the fib to be initialized as a subsystem for the network namespaces. The code does not handle several namespaces yet, so in case of a creation of a network namespace, the creation/initialization will not occur. Acked-by: Benjamin Thery <benjamin.thery@bull.net> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:25 -08:00
Denis V. Lunev	dbb50165b5	[IPV4]: Check fib4_rules_init failure. This adds error paths into both versions of fib4_rules_init (with/without CONFIG_IP_MULTIPLE_TABLES) and returns error code to the caller. Acked-by: Benjamin Thery <benjamin.thery@bull.net> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:25 -08:00
Denis V. Lunev	61a0265344	[NETNS]: Add namespace to API for routing /proc entries creation. This adds netns parameter to fib_proc_init/exit and replaces __init specifier with __net_init. After this, we will not yet have these proc files show info from the specific namespace - this will be done when these tables become namespaced. Acked-by: Benjamin Thery <benjamin.thery@bull.net> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:24 -08:00
Denis V. Lunev	868d13ac81	[NETNS]: Pass fib_rules_ops into default_pref method. fib_rules_ops contains operations and the list of configured rules. ops will become per/namespace soon, so we need them to be known in the default_pref callback. Acked-by: Benjamin Thery <benjamin.thery@bull.net> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:22 -08:00
Denis V. Lunev	f8c26b8d58	[NETNS]: Add netns parameter to fib_rules_(un)register. The patch extends the different fib rules API in order to pass the network namespace pointer. That will allow to access the different tables from a namespace relative object. As usual, the pointer to the init_net variable is passed as parameter so we don't break the network. Acked-by: Benjamin Thery <benjamin.thery@bull.net> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:21 -08:00
Pavel Emelyanov	3d7cc2ba62	[NETFILTER]: Switch to using ctl_paths in nf_queue and conntrack modules This includes the most simple cases for netfilter. The first part is the queue modules for ipv4 and ipv6, on which the net/ipv4/ and net/ipv6/ paths are reused from the appropriate ipv4 and ipv6 code. The conntrack module is also patched, but this hunk is very small and simple. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:10 -08:00
Pavel Emelyanov	90754f8ec0	[IPVS]: Switch to using ctl_paths. The feature of ipvs ctls is that the net/ipv4/vs path is common for core ipvs ctls and for two schedulers, so I make it exported and re-use it in modules. Two other .c files required linux/sysctl.h to make the extern declaration of this path compile well. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:08 -08:00
Rami Rosen	cb7928a528	[IPV4]: Remove unsupported DNAT (RTCF_NAT and RTCF_NAT) in IPV4 - The DNAT (Destination NAT) is not implemented in IPV4. - This patch removes the code which checks these flags in net/ipv4/arp.c and net/ipv4/route.c. The RTCF_NAT and RTCF_NAT should stay in the header (linux/in_route.h) because they are used in DECnet. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:07 -08:00
Ilpo Järvinen	a067d9ac39	[NET]: Remove obsolete comment It seems that ip_build_xmit is no longer used in here and ip_append_data is used. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:45 -08:00
Ilpo Järvinen	ad1b30b1c2	[IPVS]: Kill some bloat net/ipv4/ipvs/ip_vs_xmit.c: ip_vs_icmp_xmit \| -638 ip_vs_tunnel_xmit \| -674 ip_vs_nat_xmit \| -716 ip_vs_dr_xmit \| -682 4 functions changed, 2710 bytes removed, diff: -2710 net/ipv4/ipvs/ip_vs_xmit.c: __ip_vs_get_out_rt \| +595 1 function changed, 595 bytes added, diff: +595 net/ipv4/ipvs/ip_vs_xmit.o: 5 functions changed, 595 bytes added, 2710 bytes removed, diff: -2115 Without some CONFIG.*DEBUGs: net/ipv4/ipvs/ip_vs_xmit.o: 5 functions changed, 383 bytes added, 1513 bytes removed, diff: -1130 Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:43 -08:00
Eric Dumazet	2a75de0c1d	[NETNS]: Should build with CONFIG_SYSCTL=n Previous NETNS patches broke CONFIG_SYSCTL=n case Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:40 -08:00
Eric Dumazet	74feb6e84e	[ICMP]: Avoid sparse warnings in net/ipv4/icmp.c CHECK net/ipv4/icmp.c net/ipv4/icmp.c:249:13: warning: context imbalance in 'icmp_xmit_unlock' - unexpected unlock net/ipv4/icmp.c:376:13: warning: context imbalance in 'icmp_reply' - different lock contexts for basic block net/ipv4/icmp.c:430:6: warning: context imbalance in 'icmp_send' - different lock contexts for basic block Solution is to declare both icmp_xmit_lock() and icmp_xmit_unlock() as inline Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:37 -08:00
Eric Dumazet	65f7651788	[NET]: prot_inuse cleanups and optimizations 1) Cleanups (all functions are prefixed by sock_prot_inuse) sock_prot_inc_use(prot) -> sock_prot_inuse_add(prot,1) sock_prot_dec_use(prot) -> sock_prot_inuse_add(prot,-1) sock_prot_inuse() -> sock_prot_inuse_get() New functions : sock_prot_inuse_init() and sock_prot_inuse_free() to abstract pcounter use. 2) if CONFIG_PROC_FS=n, we can zap 'inuse' member from "struct proto", since nobody wants to read the inuse value. This saves 1372 bytes on i386/SMP and some cpu cycles. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:36 -08:00
Ilpo Järvinen	e870a8efcd	[TCP]: Perform setting of common control fields in one place In case of segments which are purely for control without any data (SYN/ACK/FIN/RST), many fields are set to common values in multiple places. i386 results: $ gcc --version gcc (GCC) 4.1.2 20070626 (Red Hat 4.1.2-13) $ codiff tcp_output.o.old tcp_output.o.new net/ipv4/tcp_output.c: tcp_xmit_probe_skb \| -48 tcp_send_ack \| -56 tcp_retransmit_skb \| -79 tcp_connect \| -43 tcp_send_active_reset \| -35 tcp_make_synack \| -42 tcp_send_fin \| -48 7 functions changed, 351 bytes removed net/ipv4/tcp_output.c: tcp_init_nondata_skb \| +90 1 function changed, 90 bytes added tcp_output.o.mid: 8 functions changed, 90 bytes added, 351 bytes removed, diff: -261 Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:34 -08:00
Ilpo Järvinen	19773b4923	[TCP]: Urgent parameter effect can be simplified. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:33 -08:00
Ilpo Järvinen	f038ac8f9b	[TCP]: cleanup tcp_parse_options deep indented switch Removed case indentation level & combined some nested ifs, mostly within 80 lines now. This is a leftover from indent patch, it just had to be done manually to avoid messing it up completely. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:33 -08:00
Eric Dumazet	9a429c4983	[NET]: Add some acquires/releases sparse annotations. Add __acquires() and __releases() annotations to suppress some sparse warnings. example of warnings : net/ipv4/udp.c:1555:14: warning: context imbalance in 'udp_seq_start' - wrong count at exit net/ipv4/udp.c:1571:13: warning: context imbalance in 'udp_seq_stop' - unexpected unlock Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:31 -08:00
Ilpo Järvinen	d436d68630	[TCP]: Remove unnecessary local variable Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:26 -08:00
Ilpo Järvinen	409d22b470	[TCP]: Code duplication removal, added tcp_bound_to_half_wnd() Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:26 -08:00
Ilpo Järvinen	056834d9f6	[TCP]: cleanup tcp_{in,out}put.c style These were manually selected from indent's results which as is are too noisy to be of any use without human reason. In addition, some extra newlines between function and its comment were removed too. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:25 -08:00
Ilpo Järvinen	058dc3342b	[TCP]: reduce tcp_output's indentation levels a bit Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:24 -08:00
Ilpo Järvinen	4828e7f49a	[TCP]: Remove TCPCB_URG & TCPCB_AT_TAIL as unnecessary The snd_up check should be enough. I suspect this has been there to provide a minor optimization in clean_rtx_queue which used to have a small if (!->sacked) block which could skip snd_up check among the other work. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:23 -08:00
Ilpo Järvinen	cadbd0313b	[TCP]: Dropped unnecessary skb/sacked accessing in reneging SACK reneging can be precalculated to a FLAG in clean_rtx_queue which has the right skb looked up. This will help a bit in future because skb->sacked access will be changed eventually, changing it already won't hurt any. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:23 -08:00
Ilpo Järvinen	90840defab	[TCP]: Introduce tcp_wnd_end() to reduce line lengths Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:22 -08:00
Ilpo Järvinen	66f5fe624f	[TCP]: Rename update_send_head & include related increment to it There's very little need to have the packets_out incrementing in a separate function. Also name the combined function appropriately. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:21 -08:00
Ilpo Järvinen	3ccd3130b3	[TCP]: Make invariant check complain about invalid sacked_out Earlier resolution for NewReno's sacked_out should now keep it small enough for this to become invariant-like check. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:20 -08:00
Hideo Aoki	95766fff6b	[UDP]: Add memory accounting. Signed-off-by: Takahiro Yasui <tyasui@redhat.com> Signed-off-by: Hideo Aoki <haoki@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:19 -08:00
Hideo Aoki	3ab224be6d	[NET] CORE: Introducing new memory accounting interface. This patch introduces new memory accounting functions for each network protocol. Most of them are renamed from memory accounting functions for stream protocols. At the same time, some stream memory accounting functions are removed since other functions do same thing. Renaming: sk_stream_free_skb() -> sk_wmem_free_skb() __sk_stream_mem_reclaim() -> __sk_mem_reclaim() sk_stream_mem_reclaim() -> sk_mem_reclaim() sk_stream_mem_schedule -> __sk_mem_schedule() sk_stream_pages() -> sk_mem_pages() sk_stream_rmem_schedule() -> sk_rmem_schedule() sk_stream_wmem_schedule() -> sk_wmem_schedule() sk_charge_skb() -> sk_mem_charge() Removing sk_stream_rfree(): consolidates into sock_rfree() sk_stream_set_owner_r(): consolidates into skb_set_owner_r() sk_stream_mem_schedule() The following functions are added. sk_has_account(): check if the protocol supports accounting sk_mem_uncharge(): do the opposite of sk_mem_charge() In addition, to achieve consolidation, updating sk_wmem_queued is removed from sk_mem_charge(). Next, to consolidate memory accounting functions, this patch adds memory accounting calls to network core functions. Moreover, present memory accounting call is renamed to new accounting call. Finally we replace present memory accounting calls with new interface in TCP and SCTP. Signed-off-by: Takahiro Yasui <tyasui@redhat.com> Signed-off-by: Hideo Aoki <haoki@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:18 -08:00
Herbert Xu	9dd3245a2a	[IPSEC]: Move all calls to xfrm_audit_state_icvfail to xfrm_input Let's nip the code duplication in the bud :) Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:10 -08:00
Herbert Xu	0883ae0e55	[IPSEC]: Fix transport-mode async resume on input without netfilter When netfilter is off the transport-mode async resumption doesn't work because we don't push back the IP header. This patch fixes that by moving most of the code outside of ifdef NETFILTER since the only part that's not common is the short-circuit in the protocol handler. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:10 -08:00
Ilpo Järvinen	c776ee01bd	[TCP]: Remove seq_rtt ptr from clean_rtx_queue args While checking Gavin's patch I noticed that the returned seq_rtt is not used by the caller. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:07 -08:00
Ilpo Järvinen	0e3a4803aa	[TCP]: Force TSO splits to MSS boundaries If snd_wnd - snd_nxt wasn't multiple of MSS, skb was split on odd boundary by the callers of tcp_window_allows. We try really hard to avoid unnecessary modulos. Therefore the old caller side check "if (skb->len < limit)" was too wide as well because limit is not bound in any way to skb->len and can cause spurious testing for trimming in the middle of the queue while we only wanted that to happen at the tail of the queue. A simple additional caller side check for tcp_write_queue_tail would likely have resulted 2 x modulos because the limit would have to be first calculated from window, however, doing that unnecessary modulo is not mandatory. After a minor change to the algorithm, simply determine first if the modulo is needed at all and at that point immediately decide also from which value it should be calculated from. This approach also kills some duplicated code. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:06 -08:00
Eric W. Biederman	426b5303eb	[NETNS]: Modify the neighbour table code so it handles multiple network namespaces I'm actually surprised at how much was involved. At first glance it appears that the neighbour table data structures are already split by network device so all that should be needed is to modify the user interface commands to filter the set of neighbours by the network namespace of their devices. However a couple things turned up while I was reading through the code. The proxy neighbour table allows entries with no network device, and the neighbour parms are per network device (except for the defaults) so they now need a per network namespace default. So I updated the two structures (which surprised me) with their very own network namespace parameter. Updated the relevant lookup and destroy routines with a network namespace parameter and modified the code that interacts with users to filter out neighbour table entries for devices of other namespaces. I'm a little concerned that we can modify and display the global table configuration and from all network namespaces. But this appears good enough for now. I keep thinking modifying the neighbour table to have per network namespace instances of each table type would should be cleaner. The hash table is already dynamically sized so there are it is not a limiter. The default parameter would be straight forward to take care of. However when I look at the how the network table is built and used I still find some assumptions that there is only a single neighbour table for each type of table in the kernel. The netlink operations, neigh_seq_start, the non-core network users that call neigh_lookup. So while it might be doable it would require more refactoring than my current approach of just doing a little extra filtering in the code. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:03 -08:00
Paul Moore	afeb14b490	[XFRM]: RFC4303 compliant auditing This patch adds a number of new IPsec audit events to meet the auditing requirements of RFC4303. This includes audit hooks for the following events: * Could not find a valid SA [sections 2.1, 3.4.2] . xfrm_audit_state_notfound() . xfrm_audit_state_notfound_simple() * Sequence number overflow [section 3.3.3] . xfrm_audit_state_replay_overflow() * Replayed packet [section 3.4.3] . xfrm_audit_state_replay() * Integrity check failure [sections 3.4.4.1, 3.4.4.2] . xfrm_audit_state_icvfail() While RFC4303 deals only with ESP most of the changes in this patch apply to IPsec in general, i.e. both AH and ESP. The one case, integrity check failure, where ESP specific code had to be modified the same was done to the AH code for the sake of consistency. Signed-off-by: Paul Moore <paul.moore@hp.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:01 -08:00
Eric Dumazet	dfd4f0ae2e	[TCP]: Avoid two divides in __tcp_grow_window() tcp_win_from_space() being signed, compiler might emit an integer divide to compute tcp_win_from_space()/2 . Using right shifts is OK here and less expensive. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:01 -08:00
Eric Dumazet	8beb5c5f12	[TCP]: Avoid a divide in tcp_mtu_probing() tcp_mtu_to_mss() being signed, compiler might emit an integer divide to compute tcp_mtu_to_mss()/2 . Using a right shift is OK here and less expensive. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:00 -08:00
David S. Miller	829942c187	[TCP]: Move mss variable in tcp_mtu_probing() Down into the only scope where it is used. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:59:59 -08:00
Eric Dumazet	ce55dd3610	[TCP]: tcp_write_timeout.c cleanup Before submitting a patch to change a divide to a right shift, I felt it necessary to create a helper function tcp_mtu_probing() to reduce length of lines exceeding 100 chars in tcp_write_timeout(). Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:59:58 -08:00
Eric Dumazet	b790cedd24	[INET]: Avoid an integer divide in rt_garbage_collect() Since 'goal' is a signed int, compiler may emit an integer divide to compute goal/2. Using a right shift is OK here and less expensive. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:59:57 -08:00
YOSHIFUJI Hideaki	9cb5734e5b	[TCP]: Convert several length variables to unsigned. Several length variables cannot be negative, so convert int to unsigned int. This also allows us to do sane shift operations on those variables. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:59:56 -08:00
Eric Dumazet	b92edbe0b8	[TCP] Avoid two divides in tcp_output.c Because 'free_space' variable in __tcp_select_window() is signed, expression (free_space / 2) forces compiler to emit an integer divide. This can be changed to a plain right shift, less expensive. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:59:41 -08:00
Masahide NAKAMURA	a1b051405b	[XFRM] IPv6: Fix dst/routing check at transformation. IPv6 specific thing is wrongly removed from transformation at net-2.6.25. This patch recovers it with current design. o Update "path" of xfrm_dst since IPv6 transformation should care about routing changes. It is required by MIPv6 and off-link destined IPsec. o Rename nfheader_len which is for non-fragment transformation used by MIPv6 to rt6i_nfheader_len as IPv6 name space. Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:59:36 -08:00
Ilpo Järvinen	bd515c3e48	[TCP]: Fix TSO deferring I'd say that most of what tcp_tso_should_defer had in between there was dead code because of this. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:59:36 -08:00
Pavel Emelyanov	7054fb9376	[INET]: Uninline the inet_twsk_put function. This one is not that big, but is widely used: saves 1200 bytes from net/ipv4/built-in.o add/remove: 1/0 grow/shrink: 1/12 up/down: 97/-1300 (-1203) function old new delta inet_twsk_put - 87 +87 __inet_lookup_listener 274 284 +10 tcp_sacktag_write_queue 2255 2254 -1 tcp_time_wait 482 411 -71 __inet_check_established 796 722 -74 tcp_v4_err 973 898 -75 __inet_twsk_kill 230 154 -76 inet_twsk_deschedule 180 103 -77 tcp_v4_do_rcv 462 384 -78 inet_hash_connect 686 607 -79 inet_twdr_do_twkill_work 236 150 -86 inet_twdr_twcal_tick 395 307 -88 tcp_v4_rcv 1744 1480 -264 tcp_timewait_state_process 975 644 -331 Export it for ipv6 module. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:59:28 -08:00
Pavel Emelyanov	77a5ba55da	[INET]: Uninline the __inet_lookup_established function. This is -700 bytes from the net/ipv4/built-in.o add/remove: 1/0 grow/shrink: 1/3 up/down: 340/-1040 (-700) function old new delta __inet_lookup_established - 339 +339 tcp_sacktag_write_queue 2254 2255 +1 tcp_v4_err 1304 973 -331 tcp_v4_rcv 2089 1744 -345 tcp_v4_do_rcv 826 462 -364 Exporting is for dccp module (used via e.g. inet_lookup). Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:59:27 -08:00
Pavel Emelyanov	152da81deb	[INET]: Uninline the __inet_hash function. This one is used in quite many places in the networking code and seems too big to be inline. After the patch net/ipv4/built-in.o loses ~650 bytes: add/remove: 2/0 grow/shrink: 0/5 up/down: 461/-1114 (-653) function old new delta __inet_hash_nolisten - 282 +282 __inet_hash - 179 +179 tcp_sacktag_write_queue 2255 2254 -1 __inet_lookup_listener 284 274 -10 tcp_v4_syn_recv_sock 755 493 -262 tcp_v4_hash 389 35 -354 inet_hash_connect 1086 599 -487 This version addresses the issue pointed by Eric, that while being inline this function was optimized by gcc in respect to the 'listen_possible' argument. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:59:26 -08:00
Herbert Xu	195ad6a3ac	[IPSEC]: Rename tunnel-mode functions to avoid collisions with tunnels It appears that I've managed to create two different functions both called xfrm6_tunnel_output. This is because we have the plain tunnel encapsulation named xfrmX_tunnel as well as the tunnel-mode encapsulation which lives in the files xfrmX_mode_tunnel.c. This patch renames functions from the latter to use the xfrmX_mode_tunnel prefix to avoid name-space conflicts. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:59:18 -08:00
Patrick McHardy	33b8e77605	[NETFILTER]: Add CONFIG_NETFILTER_ADVANCED option The NETFILTER_ADVANCED option hides lots of the rather obscure netfilter options when disabled and provides defaults (M) that should allow to run a distribution firewall without further thinking. Defaults to 'y' to avoid breaking current configurations. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:59:12 -08:00
Patrick McHardy	34498825cb	[NETFILTER]: non-power-of-two jhash optimizations Apply Eric Dumazet's jhash optimizations where applicable. Quoting Eric: Thanks to jhash, hash value uses full 32 bits. Instead of returning hash % size (implying a divide) we return the high 32 bits of the (hash * size) that will give results between [0 and size-1] and same hash distribution. On most cpus, a multiply is less expensive than a divide, by an order of magnitude. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:59:11 -08:00
Jan Engelhardt	e79ec50b95	[NETFILTER]: Parenthesize macro parameters Parenthesize macro parameters. Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:59:08 -08:00
Jan Engelhardt	643a2c15a4	[NETFILTER]: Introduce nf_inet_address A few netfilter modules provide their own union of IPv4 and IPv6 address storage. Will unify that in this patch series. (1/4): Rename union nf_conntrack_address to union nf_inet_addr and move it to x_tables.h. Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:59:07 -08:00
Jan Engelhardt	df54aae022	[NETFILTER]: x_tables: use %u format specifiers Use %u format specifiers as ->family is unsigned. Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:59:07 -08:00
Patrick McHardy	051578ccbc	[NETFILTER]: nf_nat: properly use RCU for ip_nat_decode_session We need to use rcu_assign_pointer/rcu_dereference to avoid races. Also remove an obsolete CONFIG_IP_NAT_NEEDED ifdef. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:59:06 -08:00
Patrick McHardy	1e796fda00	[NETFILTER]: constify nf_afinfo Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:59:05 -08:00
Patrick McHardy	7b2f9631e7	[NETFILTER]: nf_log: constify struct nf_logger and nf_log_packet loginfo arg Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:58:59 -08:00
Patrick McHardy	f01ffbd6e7	[NETFILTER]: nf_log: move logging stuff to separate header Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:58:58 -08:00
Patrick McHardy	cc01dcbd26	[NETFILTER]: nf_nat: pass manip type instead of hook to nf_nat_setup_info nf_nat_setup_info gets the hook number and translates that to the manip type to perform. This is a relic from the time when one manip per hook could exist, the exact hook number doesn't matter anymore, it's converted to the manip type. Most callers already know what kind of NAT they want to perform, so pass the maniptype in directly. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:58:57 -08:00
Patrick McHardy	ce4b1cebdc	[NETFILTER]: nf_nat: sprinkle a few __read_mostlys Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:58:57 -08:00
Patrick McHardy	2b628a0866	[NETFILTER]: nf_nat: mark NAT protocols const Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:58:56 -08:00
Patrick McHardy	3ee9e76038	[NETFILTER]: nf_nat_proto_gre: add missing module reference Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:58:55 -08:00
Patrick McHardy	77236b6e33	[NETFILTER]: ctnetlink: use netlink attribute helpers Use NLA_PUT_BE32, nla_get_be32() etc. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:58:54 -08:00
Pablo Neira Ayuso	13eae15a24	[NETFILTER]: ctnetlink: add support for NAT sequence adjustments The combination of NAT and helpers may produce TCP sequence adjustments. In failover setups, this information needs to be replicated in order to achieve a successful recovery of mangled, related connections. This patch is particularly useful for conntrackd, see: http://people.netfilter.org/pablo/conntrack-tools/ Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:58:50 -08:00
Patrick McHardy	d6a2ba07c3	[NETFILTER]: arp_tables: add compat support Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:58:49 -08:00
Patrick McHardy	11f6dff8af	[NETFILTER]: arp_tables: resync get_entries() with ip_tables Resync get_entries() with ip_tables.c by moving the checks from the setsockopt handler to the function itself. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:58:48 -08:00
Patrick McHardy	41acd975b9	[NETFILTER]: arp_tables: move ARPT_SO_GET_INFO handling to separate function Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:58:47 -08:00
Patrick McHardy	27e2c26b85	[NETFILTER]: arp_tables: move counter allocation to separate function More resyncing with ip_tables.c as preparation for compat support. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:58:47 -08:00
Patrick McHardy	fb5b6095f3	[NETFILTER]: arp_tables: move entry and target checks to separate functions Resync with ip_tables.c as preparation for compat support. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:58:46 -08:00
Patrick McHardy	70f0bfcf6a	[NETFILTER]: arp_tables: remove ipchains compat hack Remove compatibility hack copied from ip_tables.c - ipchains didn't even support arp_tables :) Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:58:45 -08:00
Patrick McHardy	197631201e	[NETFILTER]: arp_tables: use vmalloc_node() Use vmalloc_node() as in ip_tables.c. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:58:45 -08:00
Patrick McHardy	03dafbbdf8	[NETFILTER]: arp_tables: remove obsolete standard_check function The size check is already performed by xt_check_target, no need to do it again. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:58:43 -08:00
Patrick McHardy	6d6a55f42d	[NETFILTER]: ip_tables: remove ipchains compatibility hack ipchains support has been removed years ago. kill last remains. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:58:42 -08:00
Patrick McHardy	c9d8fe1317	[NETFILTER]: {ip,ip6}_tables: fix format strings Use %zu for sizeof() and remove casts. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:58:39 -08:00
Patrick McHardy	9c54795950	[NETFILTER]: {ip,ip6}_tables: reformat to eliminate differences Reformat ip_tables.c and ip6_tables.c in order to eliminate non-functional differences and minimize diff output. This allows to get a view of the real differences using: sed -e 's/IP6T/IPT/g' \ -e 's/IP6/IP/g' \ -e 's/INET6/INET/g' \ -e 's/ip6t/ipt/g' \ -e 's/ip6/ip/g' \ -e 's/ipv6/ip/g' \ -e 's/icmp6/icmp/g' \ net/ipv6/netfilter/ip6_tables.c \| \ diff -wup /dev/stdin net/ipv4/netfilter/ip_tables.c Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:58:39 -08:00
Patrick McHardy	b386d9f596	[NETFILTER]: ip_tables: move compat offset calculation to x_tables It's needed by ip6_tables and arp_tables as well. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:58:31 -08:00
Patrick McHardy	73cd598df4	[NETFILTER]: ip_tables: fix compat types Use compat types and compat iterators when dealing with compat entries for clarity. This doesn't actually make a difference for ip_tables, but is needed for ip6_tables and arp_tables. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:58:30 -08:00
Patrick McHardy	30c08c41be	[NETFILTER]: ip_tables: account for struct ipt_entry/struct compat_ipt_entry size diff Account for size differences when dumping entries or calculating the entry positions. This doesn't actually make any difference for IPv4 since the structures have the same size, but it's logically correct and needed for IPv6. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:58:30 -08:00
Patrick McHardy	8956695131	[NETFILTER]: x_tables: make xt_compat_match_from_user usable in iterator macros Make xt_compat_match_from_user return an int to make it usable in the *tables iterator macros and kill a now unnecessary wrapper function. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:58:28 -08:00
Patrick McHardy	4b4782486d	[NETFILTER]: ip_tables: reformat compat code The compat code has some very odd formatting, clean it up before porting it to ip6_tables. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:58:27 -08:00
Patrick McHardy	ac8e27fd89	[NETFILTER]: ip_tables: kill useless wrapper Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:58:27 -08:00
Joe Perches	f97c1e0c6e	[IPV4] net/ipv4: Use ipv4_is_<type> Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:58:15 -08:00
Pavel Emelyanov	586f121152	[IPV4]: Switch users of ipv4_devconf(_all) to use the pernet one These are scattered over the code, but almost all the "critical" places already have the proper struct net at hand except for snmp proc showing function and routing rtnl handler. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:58:12 -08:00
Pavel Emelyanov	9355bbd685	[IPV4]: Switch users of ipv4_devconf_dflt to use the pernet one They are all collected in the net/ipv4/devinet.c file and mostly use the IPV4_DEVCONF_DFLT macro. So I add the net parameter to it and patch users accordingly. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:58:11 -08:00
Pavel Emelyanov	752d14dc6a	[IPV4]: Move the devinet pointers on the struct net This is the core. Add all and default pointers on the netns_ipv4 and register a new pernet subsys to initialize them. Also add the ctl_table_header to register the net.ipv4.ip_forward ctl. I don't allocate additional memory for init_net, but use global devinets. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:58:11 -08:00
Pavel Emelyanov	c0ce9fb304	[IPV4]: Store the net pointer on devinet's ctl tables Some handlers and strategies of devinet sysctl tables need to know the net to propagate the ctl change to all the net devices. I use the (currently unused) extra2 pointer on the tables to get it. Holding the reference on the struct net is not possible, because otherwise we'll get a net->ctl_table->net circular dependency. But since the ctl tables are unregistered during the net destruction, this is safe to get it w/o additional protection. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:58:10 -08:00
Pavel Emelyanov	32e569b727	[IPV4]: Pass the net pointer to the arp_req_set_proxy() This one will need to set the IPV4_DEVCONF_ALL(PROXY_ARP), but there's no ways to get the net right in place, so we have to pull one from the inet_ioctl's struct sock. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:58:09 -08:00
Pavel Emelyanov	ea40b324d7	[IPV4]: Make __devinet_sysctl_register return an error Currently, this function is void, so failures in creating sysctls for new/renamed devices are not reported to anywhere. Fixing this is another complex (needed?) task, but this return value is needed during the namespaces creation to handle the case, when we failed to create "all" and "default" entries. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:58:09 -08:00
Herbert Xu	9055e051b8	[UDP]: Move udp_stats_in6 into net/ipv4/udp.c Now that external users may increment the counters directly, we need to ensure that udp_stats_in6 is always available. Otherwise we'd either have to require the external users to be built as modules or ipv6 to be built-in. This isn't too bad because udp_stats_in6 is just a pair of pointers plus an EXPORT, e.g., just 40 (16 + 24) bytes on x86-64. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:58:06 -08:00
YOSHIFUJI Hideaki	5661df7b6c	[IPVS]: Use htons() where appropriate. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:58:02 -08:00
Denis V. Lunev	f5026fabda	[IPV4]: Thresholds in fib_trie.c are used as consts, so make them const. There are several thresholds for trie fib hash management. They are used in the code as a constants. Make them constants from the compiler point of view. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:57 -08:00
Michal Schmidt	ee34c1eb35	[IP_GRE]: Rebinding of GRE tunnels to other interfaces This is similar to the change already done for IPIP tunnels. Once created, a GRE tunnel can't be bound to another device. To reproduce: # create a tunnel: ip tunnel add tunneltest0 mode gre remote 10.0.0.1 dev eth0 # try to change the bounding device from eth0 to eth1: ip tunnel change tunneltest0 dev eth1 # show the result: ip tunnel show tunneltest0 tunneltest0: gre/ip remote 10.0.0.1 local any dev eth0 ttl inherit Notice the bound device has not changed from eth0 to eth1. This patch fixes it. When changing the binding, it also recalculates the MTU according to the new bound device's MTU. Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:56 -08:00
Herbert Xu	aebcf82c1f	[IPSEC]: Do not let packets pass when ICMP flag is off This fixes a logical error in ICMP policy checks which lets packets through if the state ICMP flag is off. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:43 -08:00
Herbert Xu	bb72845e69	[IPSEC]: Make callers of xfrm_lookup to use XFRM_LOOKUP_WAIT This patch converts all callers of xfrm_lookup that used an explicit value of 1 to indicate blocking to use the new flag XFRM_LOOKUP_WAIT. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:42 -08:00
Herbert Xu	7233b9f33e	[IPSEC]: Fix reversed ICMP6 policy check The policy check I added for ICMP on IPv6 is reversed. This patch fixes that. It also adds an skb->sp check so that unprotected packets that fail the policy check do not crash the machine. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:41 -08:00
Michal Schmidt	5533995b62	[IPIP]: Allow rebinding the tunnel to another interface Once created, an IP tunnel can't be bound to another device. (reported as https://bugzilla.redhat.com/show_bug.cgi?id=419671) To reproduce: # create a tunnel: ip tunnel add tunneltest0 mode ipip remote 10.0.0.1 dev eth0 # try to change the bounding device from eth0 to eth1: ip tunnel change tunneltest0 dev eth1 # show the result: ip tunnel show tunneltest0 tunneltest0: ip/ip remote 10.0.0.1 local any dev eth0 ttl inherit Notice the bound device has not changed from eth0 to eth1. This patch fixes it. When changing the binding, it also recalculates the MTU according to the new bound device's MTU. If the change is acceptable, I'll do the same for GRE and SIT tunnels. Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:25 -08:00
Herbert Xu	8b7817f3a9	[IPSEC]: Add ICMP host relookup support RFC 4301 requires us to relookup ICMP traffic that does not match any policies using the reverse of its payload. This patch implements this for ICMP traffic that originates from or terminates on localhost. This is activated on outbound with the new policy flag XFRM_POLICY_ICMP, and on inbound by the new state flag XFRM_STATE_ICMP. On inbound the policy check is now performed by the ICMP protocol so that it can repeat the policy check where necessary. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:23 -08:00
Herbert Xu	d5422efe68	[IPSEC]: Added xfrm_decode_session_reverse and xfrmX_policy_check_reverse RFC 4301 requires us to relookup ICMP traffic that does not match any policies using the reverse of its payload. This patch adds the functions xfrm_decode_session_reverse and xfrmX_policy_check_reverse so we can get the reverse flow to perform such a lookup. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:22 -08:00
Pavel Emelyanov	51602b2a5e	[IPV4]: Cleanup sysctl manipulations in devinet.c This includes: * moving neigh_sysctl_(un)register calls inside devinet_sysctl_(un)register ones, as they are always called in pairs; * making __devinet_sysctl_unregister() to unregister the ipv4_devconf struct, while original devinet_sysctl_unregister() works with the in_device to handle both - devconf and neigh sysctls; * make stubs for CONFIG_SYSCTL=n case to get rid of in-code ifdefs. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:09 -08:00
Pavel Emelyanov	95c9382a34	[INET]: Use BUILD_BUG_ON in inet_timewait_sock.c checks Make the INET_TWDR_TWKILL_SLOTS vs sizeof(twdr->thread_slots) check nicer. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:08 -08:00
Pavel Emelyanov	1f9e636ea2	[TCP]: Use BUILD_BUG_ON for tcp_skb_cb size checking The sizeof(struct tcp_skb_cb) should not be less than the sizeof(skb->cb). This is checked in net/ipv4/tcp.c, but this check can be made more gracefully. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:07 -08:00
YOSHIFUJI Hideaki	c69bce20dd	[NET]: Remove unused "mibalign" argument for snmp_mib_init(). With fixes from Arnaldo Carvalho de Melo. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:02 -08:00
Denis V. Lunev	971b893e79	[IPV4]: last default route is a fib table property Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:01 -08:00
Denis V. Lunev	a2bbe6822f	[IPV4]: Unify assignment of fi to fib_result Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:01 -08:00
Denis V. Lunev	c17860a039	[IPV4]: no need pass pointer to a default into fib_detect_death ipv4: no need pass pointer to a default into fib_detect_death Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:57:00 -08:00
Denis Cheng	1596c97aa8	[IPV4] net/ipv4/cipso_ipv4.c: use LIST_HEAD instead of LIST_HEAD_INIT single list_head variable initialized with LIST_HEAD_INIT could almost always be replaced with LIST_HEAD declaration, this shrinks the code and looks better. Signed-off-by: Denis Cheng <crquan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:52 -08:00
Eric W. Biederman	877a9bff38	[IPV4]: Move trie_local and trie_main into the proc iterator. We only use these variables when displaying the trie in proc so place them into the iterator to make this explicit. We should probably do something smarter to handle the CONFIG_IP_MULTIPLE_TABLES case but at least this makes it clear that the silliness is limited to the display in /proc. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:49 -08:00
Eric W. Biederman	bb80317586	[IPV4]: Remove ip_fib_local_table and ip_fib_main_table defines. There are only 2 users and it doesn't hurt to call fib_get_table instead, and it makes it easier to make the fib network namespace aware. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:49 -08:00
Denis V. Lunev	5a3e55d68e	[NET]: Multiple namespaces in the all dst_ifdown routines. Move dst entries to a namespace loopback to catch refcounting leaks. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:44 -08:00
Pavel Emelyanov	f8b33fdfaf	[ARP]: Consolidate some code in arp_req_set/delete_publc The PROXY_ARP is set on devconfigs in a similar way in both calls. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:38 -08:00
Pavel Emelyanov	46479b4329	[ARP]: Minus one level of ndentation in arp_req_delete The same cleanup for deletion requests. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:38 -08:00
Pavel Emelyanov	43dc170117	[ARP]: Minus one level of indentation in arp_req_set The ATF_PUBL requests are handled completely separate from the others. Emphasize it with a separate function. This also reduces the indentation level. The same issue exists with the arp_delete_request, but when I tried to make it in one patch diff produced completely unreadable patch. So I split it into two, but they may be done with one commit. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:37 -08:00
Pavel Emelyanov	1ff1cc202e	[IPV4] ROUTE: Convert rt_hash_lock_init() macro into function There's no need in having this function exist in a form of macro. Properly formatted function looks much better. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:36 -08:00
Pavel Emelyanov	107f163428	[IPV4] ROUTE: Clean up proc files creation. The rt_cache, stats/rt_cache and rt_acct(optional) files creation looks a bit messy. Clean this out and join them to other proc-related functions under the proper ifdef. The struct net * argument in a new function will help net namespaces patches look nicer. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:36 -08:00
Pavel Emelyanov	78c686e9fa	[IPV4] ROUTE: Collect proc-related functions together The net/ipv4/route.c file declares some entries for proc to dump some routing info. The reading functions are scattered over this file - collect them together. Besides, remove a useless IP_RT_ACCT_CPU macro. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:35 -08:00
Herbert Xu	a59322be07	[UDP]: Only increment counter on first peek/recv The previous move of the UDP inDatagrams counter caused each peek of the same packet to be counted separately. This may be undesirable. This patch fixes this by adding a bit to sk_buff to record whether this packet has already been seen through skb_recv_datagram. We then only increment the counter when the packet is seen for the first time. The only dodgy part is the fact that skb_recv_datagram doesn't have a good way of returning this new bit of information. So I've added a new function __skb_recv_datagram that does return this and made skb_recv_datagram a wrapper around it. The plan is to eventually replace all uses of skb_recv_datagram with this new function at which time it can be renamed to its proper name. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:34 -08:00
Herbert Xu	1781f7f580	[UDP]: Restore missing inDatagrams increments The previous move of the UDP inDatagrams counter caused the counting of encapsulated packets, SUNRPC data (as opposed to call) packets and RXRPC packets to go missing. This patch restores all of these. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:33 -08:00
Herbert Xu	27ab256864	[UDP]: Avoid repeated counting of checksum errors due to peeking Currently it is possible for two processes to peek on the same socket and end up incrementing the error counter twice for the same packet. This patch fixes it by making skb_kill_datagram return whether it succeeded in unlinking the packet and only incrementing the counter if it did. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:32 -08:00
Pavel Emelyanov	68dd299bc8	[INET]: Merge sys.net.ipv4.ip_forward and sys.net.ipv4.conf.all.forwarding AFAIS these two entries should do the same thing - change the forwarding state on ipv4_devconf and on all the devices. I propose to merge the handlers together using ctl paths. The inet_forward_change() is static after this and I move it higher to be closer to other "propagation" helpers and to avoid diff making patches based on { and } matching :) i.e. - make them easier to read. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:31 -08:00
Pavel Emelyanov	3e37c3f997	[IPV4]: Use ctl paths to register net/ipv4/ table This is the same as I did for the net/core/ table in the second patch in this series: use the paths and isolate the whole table in the .c file. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:27 -08:00
Pavel Emelyanov	9ba6397976	[IPV4]: Cleanup the sysctl_net_ipv4.c file This includes several cleanups: * tune Makefile to compile out this file when SYSCTL=n. Now it looks like net/core/sysctl_net_core.c one; * move the ipv4_config to af_inet.c to exist all the time; * remove additional sysctl_ip_nonlocal_bind declaration (it is already declared in net/ip.h); * remove no longer needed ifdefs from this file. This is a preparation for using ctl paths for net/ipv4/ sysctl table. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:27 -08:00
Patrick McHardy	4b3d15ef4a	[NETFILTER]: {nfnetlink,ip,ip6}_queue: kill issue_verdict Now that issue_verdict doesn't need to free the queue entries anymore, all it does is disable local BHs and call nf_reinject. Move the BH disabling to the okfn invocation in nf_reinject and kill the issue_verdict functions. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:15 -08:00
Patrick McHardy	02f014d888	[NETFILTER]: nf_queue: move list_head/skb/id to struct nf_info Move common fields for queue management to struct nf_info and rename it to struct nf_queue_entry. The avoids one allocation/free per packet and simplifies the code a bit. Alternatively we could add some private room at the tail, but since all current users use identical structs this seems easier. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:14 -08:00
Patrick McHardy	9521409265	[NETFILTER]: ip_queue: deobfuscate entry lookups A queue entry lookup currently looks like this: ipq_find_dequeue_entry -> __ipq_find_dequeue_entry -> __ipq_find_entry -> cmpfn -> id_cmp Use simple open-coded list walking and kill the cmpfn for ipq_find_dequeue_entry. Instead add it to ipq_flush (after similar cleanups) and use ipq_flush for both complete flushes and flushing entries related to a device. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:12 -08:00
Patrick McHardy	0ac41e8146	[NETFILTER]: {nf_netlink,ip,ip6}_queue: use list_for_each_entry Use list_add_tail/list_for_each_entry instead of list_add and list_for_each_prev as a preparation for switching to RCU. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:11 -08:00
Patrick McHardy	c01cd429fc	[NETFILTER]: nf_queue: move queueing related functions/struct to seperate header Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:10 -08:00
Patrick McHardy	f9d8928f83	[NETFILTER]: nf_queue: remove unused data pointer Remove the data pointer from struct nf_queue_handler. It has never been used and is useless for the only handler that really matters, nfnetlink_queue, since the handler is shared between all instances. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:10 -08:00
Patrick McHardy	e3ac529815	[NETFILTER]: nf_queue: make queue_handler const Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:09 -08:00
Patrick McHardy	1999414a4e	[NETFILTER]: Mark hooks __read_mostly Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:07 -08:00
Patrick McHardy	41c5b31703	[NETFILTER]: Use nf_register_hooks for multiple registrations Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:06 -08:00
Patrick McHardy	279c2c74b6	[NETFILTER]: nf_conntrack_proto_icmp: kill extern declaration in .c file Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:05 -08:00
Patrick McHardy	1841a4c7ae	[NETFILTER]: nf_ct_h323: remove ipv6 module dependency nf_conntrack_h323 needs ip6_route_output for the call forwarding filter. Add a ->route function to nf_afinfo and use that to avoid pulling in the ipv6 module. Fix the #ifdef for the IPv6 code while I'm at it - the IPv6 support is only needed when IPv6 conntrack is enabled. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:05 -08:00
Maciej Soltysiak	17dfc93f6d	[NETFILTER]: {ip,ip6}t_LOG: log GID Log GID in addition to UID Signed-off-by: Maciej Soltysiak <maciej.soltysiak@ae.poznan.pl> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:03 -08:00
Patrick McHardy	cb76c6a597	[NETFILTER]: ip_tables: remove obsolete SAME target Remove the ipt_SAME target as scheduled in feature-removal-schedule. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:01 -08:00
Jan Engelhardt	c9fd496809	[NETFILTER]: Merge ipt_TOS into xt_DSCP Merge ipt_TOS into xt_DSCP. Merge ipt_TOS (tos v0 target) into xt_DSCP. They both modify the same field in the IPv4 header, so it seems reasonable to keep them in one piece. This is part two of the implicit 4-patch series to move tos to xtables and extend it by IPv6. Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:59 -08:00
Jan Engelhardt	c3b33e6a2c	[NETFILTER]: Merge ipt_tos into xt_dscp Merge ipt_tos into xt_dscp. Merge ipt_tos (tos v0 match) into xt_dscp. They both match on the same field in the IPv4 header, so it seems reasonable to keep them in one piece. This is part one of the implicit 4-patch series to move tos to xtables and extend it by IPv6. Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:58 -08:00
Jan Engelhardt	4c37799ccf	[NETFILTER]: Use lowercase names for matches in Kconfig Unify netfilter match kconfig descriptions Consistently use lowercase for matches in kconfig one-line descriptions and name the match module. Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:57 -08:00
Laszlo Attila Toth	e2cf5ecbea	[NETFILTER]: ipt_addrtype: limit address type checking to an interface Addrtype match has a new revision (1), which lets address type checking limited to the interface the current packet belongs to. Either incoming or outgoing interface can be used depending on the current hook. In the FORWARD hook two matches should be used if both interfaces have to be checked. The new structure is ipt_addrtype_info_v1. Revision 0 lets older userspace programs use the match as earlier. ipt_addrtype_info is used. Signed-off-by: Laszlo Attila Toth <panther@balabit.hu> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:56 -08:00
Laszlo Attila Toth	0553811612	[IPV4]: Add inet_dev_addr_type() Address type search can be limited to an interface by inet_dev_addr_type function. Signed-off-by: Laszlo Attila Toth <panther@balabit.hu> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:56 -08:00
Jan Engelhardt	0265ab44ba	[NETFILTER]: merge ipt_owner/ip6t_owner in xt_owner xt_owner merges ipt_owner and ip6t_owner, and adds a flag to match on socket (non-)existence. Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:55 -08:00
Patrick McHardy	9e67d5a739	[NETFILTER]: x_tables: remove obsolete overflow check We're not multiplying the size with the number of CPUs anymore, so the check is obsolete. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:54 -08:00
Eric Dumazet	259d4e41f3	[NETFILTER]: x_tables: struct xt_table_info diet Instead of using a big array of NR_CPUS entries, we can compute the size needed at runtime, using nr_cpu_ids. This should save some ram (especially on David's machines where NR_CPUS=4096: 32 KB can be saved per table, and 64KB for dynamically allocated ones (because of slab/slub alignments)). In particular, the 'bootstrap' tables are not any more static (in data section) but on stack as their size is now very small. This also should reduce the size used on stack in compat functions (get_info() declares an automatic variable, that could be bigger than kernel stack size for big NR_CPUS) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:54 -08:00
Jan Engelhardt	d3c5ee6d54	[NETFILTER]: x_tables: consistent and unique symbol names Give all Netfilter modules consistent and unique symbol names. Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:53 -08:00
Li Zefan	4c61097957	[NETFILTER]: replace list_for_each with list_for_each_entry Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:52 -08:00
Herbert Xu	2fcb45b6b8	[IPSEC]: Use the correct family for input state lookup When merging the input paths of IPsec I accidentally left a hard-coded AF_INET for the state lookup call. This broke IPv6 obviously. This patch fixes by getting the input callers to specify the family through skb->cb. Credit goes to Kazunori Miyazawa for diagnosing this and providing an initial patch. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:49 -08:00
Wang Chen	bbca17680f	[UDP]: Counter increment should be in USER mode for recvmsg System calls should be USER. So change the BH to USER for UDP*_INC_STATS_BH(). Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:49 -08:00
Wang Chen	b2bf1e2659	[UDP]: Clean up for IS_UDPLITE macro Since we have macro IS_UDPLITE, we can use it. Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:48 -08:00
Wang Chen	cb75994ec3	[UDP]: Defer InDataGrams increment until recvmsg() does checksum Thanks dave, herbert, gerrit, andi and other people for your discussion about this problem. UdpInDatagrams can be confusing because it counts packets that might be dropped later. Move UdpInDatagrams into recvmsg() as allowed by the RFC. Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:47 -08:00
Ilpo Järvinen	6859d49475	[TCP]: Abstract tp->highest_sack accessing & point to next skb Pointing to the next skb is necessary to avoid referencing already SACKed skbs which will soon be on a separate list. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:46 -08:00
Ilpo Järvinen	7201883599	[TCP]: Cleanup local variables of clean_rtx_queue Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:46 -08:00
Ilpo Järvinen	ea60658cde	[TCP]: Add unlikely() to urgent handling in clean_rtx_queue Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:45 -08:00
Ilpo Järvinen	89d478f7f2	[TCP]: Remove duplicated code block from clean_rtx_queue Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:44 -08:00
Ilpo Järvinen	234b686070	[TCP]: Add tcp_for_write_queue_from_safe and use it in mtu_probe Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:43 -08:00
Ilpo Järvinen	d67c58e9ae	[TCP]: Remove local variable and use packets_in_flight directly Lines won't be that long and it's compiler's job to optimize them. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:43 -08:00
Ilpo Järvinen	50c4817e99	[TCP]: MTUprobe: prepare skb fields earlier They better be valid when call to write_queue functions is made once things that follow are going in. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:42 -08:00
Ilpo Järvinen	c3a05c6050	[TCP]: Cong.ctrl modules: remove unused good_ack from cong_avoid Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:41 -08:00
Ilpo Järvinen	ede9f3b186	[TCP]: Unite identical code from two seqno split blocks Bogus seqno compares just mislead, the code is identical for both sides of the seqno compare (and was even executed just once because of return in between). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:41 -08:00
Ilpo Järvinen	407ef1de03	[TCP]: Remove superflucious FLAG_DATA_SACKED To get there, highest_sack must have advanced. When it advances, a new skb is SACKed, which already sets that FLAG. Besides, the original purpose of it has puzzled me, never understood why LOST bit setting of retransmitted skb is marked with FLAG_DATA_SACKED. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:40 -08:00
Ilpo Järvinen	bce392f3b0	[TCP]: Move LOSTRETRANS MIB outside !(L\|S) check Usually those skbs will have L set, not counting them as lost retransmissions is misleading. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:39 -08:00
Pavel Emelyanov	bfada697bd	[IPV4]: Use ctl paths to register devinet sysctls This looks very much like the patch for neighbors. The path is also located on the stack and is prepared inside the function. This time, the call to the registering function is guarded with the RTNL lock, but I decided to keep it on the stack not to litter the devinet.c file with unneeded names and to make it look similar to the neighbors code. This is also intended to help us with the net namespaces and saves the vmlinux size as well - this time by more than 670 bytes. The difference from the first version is just the patch offsets, that changed due to changes in the patch #2. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:37 -08:00
Pavel Emelyanov	66f27a5203	[IPV4]: Unify and cleanup calls to devinet_sysctl_register Currently this call is used to register sysctls for devices and for the "default" confs. The "all" sysctls are registered separately. Besides, the inet_device is passed to this function, but it is not needed there at all - just the device name and ifindex are required. Thanks to Herbert, who noticed, that this call doesn't even require the devconf pointer (the last argument) - all we need we can take from the in_device itself. The fix is to make a __devinet_sysctl_register(), which registers sysctls for all "devices" we need, including "default" and "all" :) The original devinet_sysctl_register() works with struct net_device, not the inet_device, and calls the introduced function, passing the device name and ifindex (to be used as procname and ctl_name) into it. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:36 -08:00
Pavel Emelyanov	9fa8964299	[IPV4]: Cleanup the devinet_sysctl_register I moved the call to kmalloc() from the *t declaration into the code (this is confusing when a variable is initialized with the result of some call) and removed unneeded comment near the error path. Just like I did with the neigh ctl-s. Besides, I fixed the goto's and the labels - they were indented with spaces :( Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:25 -08:00
Patrick McHardy	be0ea7d5da	[NETFILTER]: Convert old checksum helper names Kill the defines again, convert to the new checksum helper names and remove the dependency of NET_ACT_NAT on NETFILTER. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:15 -08:00
Ilpo Järvinen	ea4f76ae13	[TCP]: Two fixes to new sacktag code 1) Skip condition used to be wrong way around which made SACK processing very broken, missed many blocks because of that. 2) Use highest_sack advancement only if some skbs are already sacked because otherwise tcp_write_queue_next may move things too far (occurs mainly with GSO). The other similar advancement is not problem because highest_sack was previously put to point a sacked skb. These problems were located because of problem report from Matt Mathis <mathis@psc.edu>. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:10 -08:00
Pavel Emelyanov	df97c708d5	[NET]: Eliminate unused argument from sk_stream_alloc_pskb The 3rd argument is always zero (according to grep :) Eliminate it and merge the function with sk_stream_alloc_skb. This saves 44 more bytes, and together with the previous patch we have: add/remove: 1/0 grow/shrink: 0/8 up/down: 183/-751 (-568) function old new delta sk_stream_alloc_skb - 183 +183 ip_rt_init 529 525 -4 arp_ignore 112 107 -5 __inet_lookup_listener 284 274 -10 tcp_sendmsg 2583 2481 -102 tcp_sendpage 1449 1300 -149 tso_fragment 417 258 -159 tcp_fragment 1149 988 -161 __tcp_push_pending_frames 1998 1837 -161 Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:08 -08:00
Pavel Emelyanov	f561d0f27d	[NET]: Uninline the sk_stream_alloc_pskb This function seems too big for inlining. Indeed, it saves half-a-kilo when uninlined: add/remove: 1/0 grow/shrink: 0/7 up/down: 195/-719 (-524) function old new delta sk_stream_alloc_pskb - 195 +195 ip_rt_init 529 525 -4 __inet_lookup_listener 284 274 -10 tcp_sendmsg 2583 2486 -97 tcp_sendpage 1449 1305 -144 tso_fragment 417 267 -150 tcp_fragment 1149 992 -157 __tcp_push_pending_frames 1998 1841 -157 Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:07 -08:00
Joonwoo Park	3015a347dc	[IPV4] fib_hash: kmalloc + memset conversion to kzalloc fib_hash: kmalloc + memset conversion to kzalloc fix to avoid memset entirely. Signed-off-by: Joonwoo Park <joonwpark81@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:07 -08:00
Joonwoo Park	88f8349164	[IPV4] fib_semantics: kmalloc + memset conversion to kzalloc fib_semantics: kmalloc + memset conversion to kzalloc fix to avoid memset entirely. Signed-off-by: Joonwoo Park <joonwpark81@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:06 -08:00
Ilpo Järvinen	8512430e55	[TCP]: Move FRTO checks out from write queue abstraction funcs Better place exists in update_send_head (other non-queue related adjustments are done there as well) which is the only caller of tcp_advance_send_head (now that the bogus call from mtu_probe is gone). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:05 -08:00
Pavel Emelyanov	8d8ad9d7c4	[NET]: Name magic constants in sock_wake_async() The sock_wake_async() performs a bit different actions depending on "how" argument. Unfortunately this argument only has numerical magic values. I propose to give names to their constants to help people reading this function callers understand what's going on without looking into this function all the time. I suppose this is 2.6.25 material, but if it's not (or the naming seems poor/bad/awful), I can rework it against the current net-2.6 tree. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:03 -08:00
Pavel Emelyanov	85b606800b	[IPVS]: Relax the module get/put in ip_vs_app.c Both try_module_get/module_put already handle the module == NULL case, so no need in manual checking. This patch fits both net-2.6 and net-2.6.25. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:35 -08:00
Eric Dumazet	beb659bd8c	[PATCH] IPV4 : Move ip route cache flush (secret_rebuild) from softirq to workqueue Every 600 seconds (ip_rt_secret_interval), a softirq flush of the whole ip route cache is triggered. On loaded machines, this can starve softirq for many seconds and can eventually crash. This patch moves this flush to a workqueue context, using the worker we introduced in commit `39c90ece75` (IPV4: Convert rt_check_expire() from softirq processing to workqueue.) Also, immediate flushes (echo 0 >/proc/sys/net/ipv4/route/flush) are using rt_do_flush() helper function, which pays attention to rescheduling. Next step will be to handle delayed flushes ("echo -1 >/proc/sys/net/ipv4/route/flush" or "ip route flush cache") Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:33 -08:00
Pavel Emelyanov	42a73808ed	[RAW]: Consolidate proc interface. Both ipv6/raw.c and ipv4/raw.c use the seq files to walk through the raw sockets hash and show them. The "walking" code is rather huge, but is identical in both cases. The difference is the hash table to walk over and the protocol family to check (this was not in the first version of the patch, which was noticed by YOSHIFUJI) Make the ->open store the needed hash table and the family on the allocated raw_iter_state and make the start/next/stop callbacks work with it. This removes most of the code. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:32 -08:00
Pavel Emelyanov	ab70768ec7	[RAW]: Consolidate proto->unhash callback Same as the ->hash one, this is easily consolidated. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:31 -08:00
Pavel Emelyanov	65b4c50b47	[RAW]: Consolidate proto->hash callback Having the raw_hashinfo it's easy to consolidate the raw[46]_hash functions. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:31 -08:00
Pavel Emelyanov	b673e4dfc8	[RAW]: Introduce raw_hashinfo structure The ipv4/raw.c and ipv6/raw.c contain many common code (most of which is proc interface) which can be consolidated. Most of the places to consolidate deal with the raw sockets hashtable, so introduce a struct raw_hashinfo which describes the raw sockets hash. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:30 -08:00
Pavel Emelyanov	7bc54c9030	[IPv4] RAW: Compact the API for the kernel The raw sockets functions are explicitly used from inside the kernel in two places: 1. in ip_local_deliver_finish to intercept skb-s 2. in icmp_error For this purposes many functions and even data structures, that are naturally internal for raw protocol, are exported. Compact the API to two functions and hide all the other (including hash table and rwlock) inside the net/ipv4/raw.c Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:28 -08:00
Denis V. Lunev	97c53cacf0	[NET]: Make rtnetlink infrastructure network namespace aware (v3) After this patch none of the netlink callback support anything except the initial network namespace but the rtnetlink infrastructure now handles multiple network namespaces. Changes from v2: - IPv6 addrlabel processing Changes from v1: - no need for special rtnl_unlock handling - fixed IPv6 ndisc Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:25 -08:00
Denis V. Lunev	b854272b3c	[NET]: Modify all rtnetlink methods to only work in the initial namespace (v2) Before I can enable rtnetlink to work in all network namespaces I need to be certain that something won't break. So this patch deliberately disables all of the rtnetlink methods in everything except the initial network namespace. After the methods have been audited this extra check can be disabled. Changes from v1: - added IPv6 addrlabel protection Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2008-01-28 14:54:24 -08:00
David S. Miller	1b0b04f9fb	[IPCONFIG]: Mark vendor_class_identifier as __initdata. Based upon a suggestion by Francois Romieu. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:22 -08:00
Rumen G. Bogdanovski	b209639e8a	[IPVS]: Create synced connections with their real state With this patch the synced connections are created with their real state, which can be changed on the next synchronizations if necessary. This way on fail-over all the connections will be treated according to their actual state, causing no scheduling problems (the active and the nonactive connections have different weights in the schedulers). The backwards compatibility is preserved and the existing tools will show the true connection states even on the backup director. Signed-off-by: Rumen G. Bogdanovski <rumen@voicecho.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:21 -08:00
Rumen G. Bogdanovski	7a4fbb1fa4	[IPVS]: Flag synced connections and expose them in proc This patch labels the sync-created connections with IP_VS_CONN_F_SYNC flag and creates /proc/net/ip_vs_conn_sync to enable monitoring of the origin of the connections, if they are local or created by the synchronization. Signed-off-by: Rumen G. Bogdanovski <rumen@voicecho.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:21 -08:00
Ilpo Järvinen	20de20beba	[TCP]: Correct DSACK check placing Previously one of the in-block skip branches was missing it. Also, drop it from tail-fully-processed case because the next iteration will do exactly the same thing, i.e., process the SACK block that contains the DSACK information. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:15 -08:00
Eric Dumazet	8dbde28d97	[NET]: NET_CLS_ROUTE : convert ip_rt_acct to per_cpu variables ip_rt_acct needs 4096 bytes per cpu to perform some accounting. It is actually allocated as a single huge array [4096*NR_CPUS] (rounded up to a power of two) Converting it to a per cpu variable is wanted to : - Save space on machines where num_possible_cpus() < NR_CPUS - Better NUMA placement (each cpu gets memory on its node) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:08 -08:00
Ilpo Järvinen	68f8353b48	[TCP]: Rewrite SACK block processing & sack_recv_cache use Key points of this patch are: - In case new SACK information is advance only type, no skb processing below previously discovered highest point is done - Optimize cases below highest point too since there's no need to always go up to highest point (which is very likely still present in that SACK), this is not entirely true though because I'm dropping the fastpath_skb_hint which could previously optimize those cases even better. Whether that's significant, I'm not too sure. Currently it will provide skipping by walking. Combined with RB-tree, all skipping would become fast too regardless of window size (can be done incrementally later). Previously a number of cases in TCP SACK processing fails to take advantage of costly stored information in sack_recv_cache, most importantly, expected events such as cumulative ACK and new hole ACKs. Processing on such ACKs result in rather long walks building up latencies (which easily gets nasty when window is huge). Those latencies are often completely unnecessary compared with the amount of _new_ information received, usually for cumulative ACK there's no new information at all, yet TCP walks whole queue unnecessary potentially taking a number of costly cache misses on the way, etc.! Since the inclusion of highest_sack, there's a lot information that is very likely redundant (SACK fastpath hint stuff, fackets_out, highest_sack), though there's no ultimate guarantee that they'll remain the same whole the time (in all unearthly scenarios). Take advantage of this knowledge here and drop fastpath hint and use direct access to highest SACKed skb as a replacement. Effectively "special cased" fastpath is dropped. This change adds some complexity to introduce better coveraged "fastpath", though the added complexity should make TCP behave more cache friendly. 
The current ACK's SACK blocks are compared against each cached block individually and only ranges that are new are then scanned by the high constant walk. For other parts of write queue, even when in previously known part of the SACK blocks, a faster skip function is used (if necessary at all). In addition, whenever possible, TCP fast-forwards to highest_sack skb that was made available by an earlier patch. In typical case, no other things but this fast-forward and mandatory markings after that occur making the access pattern quite similar to the former fastpath "special case". DSACKs are special case that must always be walked. The local to recv_sack_cache copying could be more intelligent w.r.t DSACKs which are likely to be there only once but that is left to a separate patch. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:07 -08:00
Ilpo Järvinen	fd6dad616d	[TCP]: Earlier SACK block verification & simplify access to them Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:07 -08:00
Ilpo Järvinen	9e10c47cb9	[TCP]: Create tcp_sacktag_one(). Worker function that implements the main logic of the inner-most loop of tcp_sacktag_write_queue(). Idea was originally presented by David S. Miller. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:06 -08:00
Ilpo Järvinen	b7d4815f35	[TCP]: Prior_fackets can be replaced by highest_sack seq Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:05 -08:00
Ilpo Järvinen	9f58f3b721	[TCP]: Make lost retrans detection more self-contained Highest_sack_end_seq is no longer calculated in the loop, thus it can be pushed to the worker function altogether making that function independent of the sacktag. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:04 -08:00
Ilpo Järvinen	a47e5a988a	[TCP]: Convert highest_sack to sk_buff to allow direct access It is going to replace the sack fastpath hint quite soon... :-) Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:03 -08:00
Ilpo Järvinen	85cc391c0e	[TCP]: non-FACK SACK follows conservative SACK loss recovery Many assumptions that are true when no reordering or other strange events happen are not a part of the RFC3517. FACK implementation is based on such assumptions. Previously (before the rewrite) the non-FACK SACK was basically doing fast rexmit and then it times out all skbs when first cumulative ACK arrives, which cannot really be called SACK based recovery :-). RFC3517 SACK disables these things: - Per SKB timeouts & head timeout entry to recovery - Marking at least one skb while in recovery (RFC3517 does this only for the fast retransmission but not for the other skbs when cumulative ACKs arrive in the recovery) - Sacktag's loss detection flavors B and C (see comment before tcp_sacktag_write_queue) This does not implement the "last resort" rule 3 of NextSeg, which allows retransmissions also when not enough SACK blocks have yet arrived above a segment for IsLost to return true [RFC3517]. The implementation differs from RFC3517 in these points: - Rate-halving is used instead of FlightSize / 2 - Instead of using dupACKs to trigger the recovery, the number of SACK blocks is used as FACK does with SACK blocks+holes (which provides more accurate number). It seems that the difference can affect negatively only if the receiver does not generate SACK blocks at all even though it claimed to be SACK-capable. - Dupthresh is not a constant one. Dynamical adjustments include both holes and sacked segments (equal to what FACK has) due to complexity involved in determining the number of sacked blocks between highest_sack and the reordered segment. Thus it will be an over-estimate. Implementation note: tcp_clean_rtx_queue doesn't need a lost_cnt tweak because head skb at that point cannot be SACKED_ACKED (nor would such situation last for long enough to cause problems). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:03 -08:00
Ilpo Järvinen	f577111302	[TCP]: Extend reordering detection to cover CA_Loss partially This implements more accurately what is stated in sacktag's overall comment: "Both of these heuristics are not used in Loss state, when we cannot account for retransmits accurately." When CA_Loss state is entered, the state changer ensures that undo_marker is only set if no TCPCB_RETRANS skbs were found, thus having non-zero undo_marker in CA_Loss basically tells that the R-bits still accurately reflect the current state of TCP. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:02 -08:00
Ilpo Järvinen	b9d86585dc	[TCP]: Move !in_sack test earlier in sacktag & reorganize if()s All intermediate conditions include it already, make them simpler as well. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:01 -08:00
Rainer Jochem	62013dbb84	[IPV4] ipconfig: Implement DHCP Class-identifier From : Rainer Jochem <rainer.jochem@mpi-sb.mpg.de> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:59 -08:00
David S. Miller	294b4baf29	[IPSEC]: Kill afinfo->nf_post_routing After changeset: [NETFILTER]: Introduce NF_INET_ hook values It always evaluates to NF_INET_POST_ROUTING. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:55 -08:00
Patrick McHardy	6e23ae2a48	[NETFILTER]: Introduce NF_INET_ hook values The IPv4 and IPv6 hook values are identical, yet some code tries to figure out the "correct" value by looking at the address family. Introduce NF_INET_* values for both IPv4 and IPv6. The old values are kept in a #ifndef __KERNEL__ section for userspace compatibility. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:55 -08:00

2691 Commits