linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-01 01:31:44 +00:00

Author	SHA1	Message	Date
Denis V. Lunev	68fcadd16c	[TCP]: Place file operations directly into tcp_seq_afinfo. No need to have separate never-used variable. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-13 22:13:30 -07:00
Denis V. Lunev	9427c4b36b	[TCP]: Move seq_ops from tcp_iter_state to tcp_seq_afinfo. No need to create seq_operations for each instance of 'netstat'. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-13 22:12:13 -07:00
Denis V. Lunev	a4146b1b2c	[TCP]: Replace struct net on tcp_iter_state with seq_net_private. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-13 22:11:14 -07:00
Florian Westphal	4dfc281702	[Syncookies]: Add support for TCP options via timestamps. Allow the use of SACK and window scaling when syncookies are used and the client supports tcp timestamps. Options are encoded into the timestamp sent in the syn-ack and restored from the timestamp echo when the ack is received. Based on earlier work by Glenn Griffin. This patch avoids increasing the size of structs by encoding TCP options into the least significant bits of the timestamp and by not using any 'timestamp offset'. The downside is that the timestamp sent in the packet after the synack will increase by several seconds. changes since v1: don't duplicate timestamp echo decoding function, put it into ipv4/syncookie.c and have ipv6/syncookies.c use it. Feedback from Glenn Griffin: fix line indented with spaces, kill redundant if () Reviewed-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-10 03:12:40 -07:00
Florian Westphal	2051f11fb8	[TCP]: Shrink syncookie_secret by 8 byte. the first u32 copied from syncookie_secret is overwritten by the minute-counter four lines below. After adjusting the destination address, the size of syncookie_secret can be reduced accordingly. AFAICS, the only other user of syncookie_secret[] is the ipv6 syncookie support. Because ipv6 syncookies only grab 44 bytes from syncookie_secret[], this shouldn't affect them in any way. With fixes from Glenn Griffin. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Glenn Griffin <ggriffin.kernel@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-23 22:21:28 -07:00
Patrick McManus	ec3c0982a2	[TCP]: TCP_DEFER_ACCEPT updates - process as established Change TCP_DEFER_ACCEPT implementation so that it transitions a connection to ESTABLISHED after handshake is complete instead of leaving it in SYN-RECV until some data arrvies. Place connection in accept queue when first data packet arrives from slow path. Benefits: - established connection is now reset if it never makes it to the accept queue - diagnostic state of established matches with the packet traces showing completed handshake - TCP_DEFER_ACCEPT timeouts are expressed in seconds and can now be enforced with reasonable accuracy instead of rounding up to next exponential back-off of syn-ack retry. Signed-off-by: Patrick McManus <mcmanus@ducksong.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-21 16:33:01 -07:00
Daniel Lezcano	6f8b13bcb3	[NETNS][IPV6] tcp6 - make proc per namespace Make the proc for tcp6 to be per namespace. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-21 04:14:45 -07:00
Daniel Lezcano	f40c8174d3	[NETNS][IPV4] tcp - make proc handle the network namespaces This patch, like udp proc, makes the proc functions to take care of which namespace the socket belongs. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-21 04:13:54 -07:00
Glenn Griffin	c6aefafb7e	[TCP]: Add IPv6 support to TCP SYN cookies Updated to incorporate Eric's suggestion of using a per cpu buffer rather than allocating on the stack. Just a two line change, but will resend in it's entirety. Signed-off-by: Glenn Griffin <ggriffin.kernel@gmail.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-03-04 15:18:21 +09:00
Denis V. Lunev	9b0f976f27	[INET]: Remove struct net_proto_family* from _init calls. struct net_proto_family* is not used in icmp[v6]_init, ndisc_init, igmp_init and tcp_v4_init. Remove it. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-29 11:13:15 -08:00
Ilpo Järvinen	cea14e0ed6	[TCP]: Uninline tcp_is_cwnd_limited net/ipv4/tcp_cong.c: tcp_reno_cong_avoid \| -65 1 function changed, 65 bytes removed, diff: -65 net/ipv4/arp.c: arp_ignore \| -5 1 function changed, 5 bytes removed, diff: -5 net/ipv4/tcp_bic.c: bictcp_cong_avoid \| -57 1 function changed, 57 bytes removed, diff: -57 net/ipv4/tcp_cubic.c: bictcp_cong_avoid \| -61 1 function changed, 61 bytes removed, diff: -61 net/ipv4/tcp_highspeed.c: hstcp_cong_avoid \| -63 1 function changed, 63 bytes removed, diff: -63 net/ipv4/tcp_hybla.c: hybla_cong_avoid \| -85 1 function changed, 85 bytes removed, diff: -85 net/ipv4/tcp_htcp.c: htcp_cong_avoid \| -57 1 function changed, 57 bytes removed, diff: -57 net/ipv4/tcp_veno.c: tcp_veno_cong_avoid \| -52 1 function changed, 52 bytes removed, diff: -52 net/ipv4/tcp_scalable.c: tcp_scalable_cong_avoid \| -61 1 function changed, 61 bytes removed, diff: -61 net/ipv4/tcp_yeah.c: tcp_yeah_cong_avoid \| -75 1 function changed, 75 bytes removed, diff: -75 net/ipv4/tcp_illinois.c: tcp_illinois_cong_avoid \| -54 1 function changed, 54 bytes removed, diff: -54 net/dccp/ccids/ccid3.c: ccid3_update_send_interval \| -7 ccid3_hc_tx_packet_recv \| +7 2 functions changed, 7 bytes added, 7 bytes removed, diff: +0 net/ipv4/tcp_cong.c: tcp_is_cwnd_limited \| +88 1 function changed, 88 bytes added, diff: +88 built-in.o: 14 functions changed, 95 bytes added, 642 bytes removed, diff: -547 ...Again some gcc artifacts visible as well. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:48 -08:00
Ilpo Järvinen	490d504693	[TCP]: Uninline tcp_set_state net/ipv4/tcp.c: tcp_close_state \| -226 tcp_done \| -145 tcp_close \| -564 tcp_disconnect \| -141 4 functions changed, 1076 bytes removed, diff: -1076 net/ipv4/tcp_input.c: tcp_fin \| -86 tcp_rcv_state_process \| -164 2 functions changed, 250 bytes removed, diff: -250 net/ipv4/tcp_ipv4.c: tcp_v4_connect \| -209 1 function changed, 209 bytes removed, diff: -209 net/ipv4/arp.c: arp_ignore \| +5 1 function changed, 5 bytes added, diff: +5 net/ipv6/tcp_ipv6.c: tcp_v6_connect \| -158 1 function changed, 158 bytes removed, diff: -158 net/sunrpc/xprtsock.c: xs_sendpages \| -2 1 function changed, 2 bytes removed, diff: -2 net/dccp/ccids/ccid3.c: ccid3_update_send_interval \| +7 1 function changed, 7 bytes added, diff: +7 net/ipv4/tcp.c: tcp_set_state \| +238 1 function changed, 238 bytes added, diff: +238 built-in.o: 12 functions changed, 250 bytes added, 1695 bytes removed, diff: -1445 I've no explanation why some unrelated changes seem to occur consistently as well (arp_ignore, ccid3_update_send_interval; I checked the arp_ignore asm and it seems to be due to some reordered of operation order causing some extra opcodes to be generated). Still, the benefits are pretty obvious from the codiff's results. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:47 -08:00
Ilpo Järvinen	4828e7f49a	[TCP]: Remove TCPCB_URG & TCPCB_AT_TAIL as unnecessary The snd_up check should be enough. I suspect this has been there to provide a minor optimization in clean_rtx_queue which used to have a small if (!->sacked) block which could skip snd_up check among the other work. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:23 -08:00
Ilpo Järvinen	90840defab	[TCP]: Introduce tcp_wnd_end() to reduce line lengths Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:22 -08:00
Hideo Aoki	3ab224be6d	[NET] CORE: Introducing new memory accounting interface. This patch introduces new memory accounting functions for each network protocol. Most of them are renamed from memory accounting functions for stream protocols. At the same time, some stream memory accounting functions are removed since other functions do same thing. Renaming: sk_stream_free_skb() -> sk_wmem_free_skb() __sk_stream_mem_reclaim() -> __sk_mem_reclaim() sk_stream_mem_reclaim() -> sk_mem_reclaim() sk_stream_mem_schedule -> __sk_mem_schedule() sk_stream_pages() -> sk_mem_pages() sk_stream_rmem_schedule() -> sk_rmem_schedule() sk_stream_wmem_schedule() -> sk_wmem_schedule() sk_charge_skb() -> sk_mem_charge() Removeing sk_stream_rfree(): consolidates into sock_rfree() sk_stream_set_owner_r(): consolidates into skb_set_owner_r() sk_stream_mem_schedule() The following functions are added. sk_has_account(): check if the protocol supports accounting sk_mem_uncharge(): do the opposite of sk_mem_charge() In addition, to achieve consolidation, updating sk_wmem_queued is removed from sk_mem_charge(). Next, to consolidate memory accounting functions, this patch adds memory accounting calls to network core functions. Moreover, present memory accounting call is renamed to new accounting call. Finally we replace present memory accounting calls with new interface in TCP and SCTP. Signed-off-by: Takahiro Yasui <tyasui@redhat.com> Signed-off-by: Hideo Aoki <haoki@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:18 -08:00
YOSHIFUJI Hideaki	9cb5734e5b	[TCP]: Convert several length variable to unsigned. Several length variables cannot be negative, so convert int to unsigned int. This also allows us to do sane shift operations on those variables. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:59:56 -08:00
Ilpo Järvinen	6859d49475	[TCP]: Abstract tp->highest_sack accessing & point to next skb Pointing to the next skb is necessary to avoid referencing already SACKed skbs which will soon be on a separate list. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:46 -08:00
Ilpo Järvinen	234b686070	[TCP]: Add tcp_for_write_queue_from_safe and use it in mtu_probe Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:43 -08:00
Ilpo Järvinen	c3a05c6050	[TCP]: Cong.ctrl modules: remove unused good_ack from cong_avoid Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:41 -08:00
Ilpo Järvinen	8512430e55	[TCP]: Move FRTO checks out from write queue abstraction funcs Better place exists in update_send_head (other non-queue related adjustments are done there as well) which is the only caller of tcp_advance_send_head (now that the bogus call from mtu_probe is gone). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:05 -08:00
Ilpo Järvinen	68f8353b48	[TCP]: Rewrite SACK block processing & sack_recv_cache use Key points of this patch are: - In case new SACK information is advance only type, no skb processing below previously discovered highest point is done - Optimize cases below highest point too since there's no need to always go up to highest point (which is very likely still present in that SACK), this is not entirely true though because I'm dropping the fastpath_skb_hint which could previously optimize those cases even better. Whether that's significant, I'm not too sure. Currently it will provide skipping by walking. Combined with RB-tree, all skipping would become fast too regardless of window size (can be done incrementally later). Previously a number of cases in TCP SACK processing fails to take advantage of costly stored information in sack_recv_cache, most importantly, expected events such as cumulative ACK and new hole ACKs. Processing on such ACKs result in rather long walks building up latencies (which easily gets nasty when window is huge). Those latencies are often completely unnecessary compared with the amount of _new_ information received, usually for cumulative ACK there's no new information at all, yet TCP walks whole queue unnecessary potentially taking a number of costly cache misses on the way, etc.! Since the inclusion of highest_sack, there's a lot information that is very likely redundant (SACK fastpath hint stuff, fackets_out, highest_sack), though there's no ultimate guarantee that they'll remain the same whole the time (in all unearthly scenarios). Take advantage of this knowledge here and drop fastpath hint and use direct access to highest SACKed skb as a replacement. Effectively "special cased" fastpath is dropped. This change adds some complexity to introduce better coveraged "fastpath", though the added complexity should make TCP behave more cache friendly. The current ACK's SACK blocks are compared against each cached block individially and only ranges that are new are then scanned by the high constant walk. For other parts of write queue, even when in previously known part of the SACK blocks, a faster skip function is used (if necessary at all). In addition, whenever possible, TCP fast-forwards to highest_sack skb that was made available by an earlier patch. In typical case, no other things but this fast-forward and mandatory markings after that occur making the access pattern quite similar to the former fastpath "special case". DSACKs are special case that must always be walked. The local to recv_sack_cache copying could be more intelligent w.r.t DSACKs which are likely to be there only once but that is left to a separate patch. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:07 -08:00
Ilpo Järvinen	a47e5a988a	[TCP]: Convert highest_sack to sk_buff to allow direct access It is going to replace the sack fastpath hint quite soon... :-) Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:03 -08:00
Jens Axboe	9c55e01c0c	[TCP]: Splice receive support. Support for network splice receive. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:31 -08:00
Ilpo J�rvinen	6e42141009	[TCP] MTUprobe: fix potential sk_send_head corruption When the abstraction functions got added, conversion here was made incorrectly. As a result, the skb may end up pointing to skb which got included to the probe skb and then was freed. For it to trigger, however, skb_transmit must fail sending as well. Signed-off-by: Ilpo J�rvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-19 23:24:09 -08:00
Chuck Lever	c1bd24b768	[TCP]: Remove unneeded implicit type cast when calling tcp_minshall_update() The tcp_minshall_update() function is called in exactly one place, and is passed an unsigned integer for the mss_len argument. Make the sign of the argument match the sign of the passed variable in order to eliminate an unneeded implicit type cast and a mixed sign comparison in tcp_minshall_update(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-23 21:27:55 -07:00
David S. Miller	0800f17026	[TCP]: Minor coding style fixup. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:13 -07:00
Ilpo Järvinen	b76892051c	[TCP]: Avoid clearing sacktag hint in trivial situations There's no reason to clear the sacktag skb hint when small part of the rexmit queue changes. Account changes (if any) instead when fragmenting/collapsing. RTO/FRTO do not touch SACKED_ACKED bits so no need to discard SACK tag hint at all. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:12 -07:00
Ilpo Järvinen	5af4ec236f	[TCP]: clear_all_retrans_hints prefixed by tcp_ In addition, fix its function comment spacing. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>	2007-10-10 16:52:09 -07:00
Ilpo Järvinen	6ff03ac355	[TCP]: tcp_packets_out_inc to tcp_output.c (no callers elsewhere) Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:28 -07:00
Ilpo Järvinen	e9144bd8da	[TCP]: Remove unnecessary wrapper tcp_packets_out_dec Makes caller side more obvious, there's no need to have a wrapper for this oneliner! Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:27 -07:00
Ilpo Järvinen	e60402d0a9	[TCP]: Move sack_ok access to obviously named funcs & cleanup Previously code had IsReno/IsFack defined as macros that were local to tcp_input.c though sack_ok field has user elsewhere too for the same purpose. This changes them to static inlines as preferred according the current coding style and unifies the access to sack_ok across multiple files. Magic bitops of sack_ok for FACK and DSACK are also abstracted to functions with appropriate names. Note: - One sack_ok = 1 remains but that's self explanary, i.e., it enables sack - Couple of !IsReno cases are changed to tcp_is_sack - There were no users for IsDSack => I dropped it Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:48:00 -07:00
Ilpo Järvinen	b9c4595bc4	[TCP]: Don't panic if S+L skb is detected BUG_ON is an overkill. In fact, I was mislead by BUG_TRAP severity (equals to WARN_ON) which is much lower than BUG_ON's (that panics). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:59 -07:00
Ilpo Järvinen	1b6d427bb7	[TCP]: Reduce sacked_out with reno when purging write_queue Previously TCP had a transitional state during which reno counted segments that are already below the current window into sacked_out, which is now prevented. In addition, re-try now the unconditional S+L skb catching. This approach conservatively calls just remove_sack and leaves reset_sack() calls alone. The best solution to the whole problem would be to first calculate the new sacked_out fully (this patch does not move reno_sack_reset calls from original sites and thus does not implement this). However, that would require very invasive change to fastretrans_alert (perhaps even slicing it to two halves). Alternatively, all callers of tcp_packets_in_flight (i.e., users that depend on sacked_out) should be postponed until the new sacked_out has been calculated but it isn't any simpler alternative. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:58 -07:00
Ilpo Järvinen	005903bc3a	[TCP]: Left out sync->verify (the new meaning of it) & definify Left_out was dropped a while ago, thus leaving verifying consistency of the "left out" as only task for the function in question. Thus make it's name more appropriate. In addition, it is intentionally converted to #define instead of static inline because the location of the invariant failure is the most important thing to have if this ever triggers. I think it would have been helpful e.g. in this case where the location of the failure point had to be based on some quesswork: http://lkml.org/lkml/2007/5/2/464 ...Luckily the guesswork seems to have proved to be correct. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:57 -07:00
Ilpo Järvinen	83ae40885f	[TCP]: Add tcp_left_out(tp) "back" to get cleaner looking lines tp->left_out got removed but nothing came to replace it back then (users just did addition by themselves), so add function for users now. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:56 -07:00
Ilpo Järvinen	b5860bbac7	[TCP]: Tighten tcp_sock's belt, drop left_out It is easily calculable when needed and user are not that many after all. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:55 -07:00
Ilpo Järvinen	af610b4ca1	[TCP]: Add tcp_dec_pcount_approx int variant Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:54 -07:00
Ilpo Järvinen	bdf1ee5d3b	[TCP]: Move code from tcp_ecn.h to tcp*.c and tcp.h & remove it No other users exist for tcp_ecn.h. Very few things remain in tcp.h, for most TCP ECN functions callers reside within a single .c file and can be placed there. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:54 -07:00
David S. Miller	f8ab18d2d9	[TCP]: Fix MD5 signature handling on big-endian. Based upon a report and initial patch by Peter Lieven. tcp4_md5sig_key and tcp6_md5sig_key need to start with the exact same members as tcp_md5sig_key. Because they are both cast to that type by tcp_v{4,6}_md5_do_lookup(). Unfortunately tcp{4,6}_md5sig_key use a u16 for the key length instead of a u8, which is what tcp_md5sig_key uses. This just so happens to work by accident on little-endian, but on big-endian it doesn't. Instead of casting, just place tcp_md5sig_key as the first member of the address-family specific structures, adjust the access sites, and kill off the ugly casts. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-09-28 15:18:35 -07:00
David S. Miller	3516ffb0fe	[TCP]: Invoke tcp_sendmsg() directly, do not use inet_sendmsg(). As discovered by Evegniy Polyakov, if we try to sendmsg after a connection reset, we can do incredibly stupid things. The core issue is that inet_sendmsg() tries to autobind the socket, but we should never do that for TCP. Instead we should just go straight into TCP's sendmsg() code which will do all of the necessary state and pending socket error checks. TCP's sendpage already directly vectors to tcp_sendpage(), so this merely brings sendmsg() in line with that. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-08-02 19:42:28 -07:00
Stephen Hemminger	30cfd0baf0	[TCP]: congestion control API pass RTT in microseconds This patch changes the API for the callback that is done after an ACK is received. It solves a couple of issues: * Some congestion controls want higher resolution value of RTT (controlled by TCP_CONG_RTT_SAMPLE flag). These don't really want a ktime, but all compute a RTT in microseconds. * Other congestion control could use RTT at jiffies resolution. To keep API consistent the units should be the same for both cases, just the resolution should change. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-31 02:27:57 -07:00
Stephen Hemminger	16751347a0	[TCP]: remove unused argument to cong_avoid op None of the existing TCP congestion controls use the rtt value pased in the ca_ops->cong_avoid interface. Which is lucky because seq_rtt could have been -1 when handling a duplicate ack. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-18 01:46:58 -07:00
Pavel Emelianov	e4fd5da39f	[TCP]: Consolidate checking for tcp orphan count being too big. tcp_out_of_resources() and tcp_close() perform the same checking of number of orphan sockets. Move this code into common place. Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-31 01:23:34 -07:00
Ilpo Järvinen	0ec96822d5	[TCP]: Use S+L catcher only with SACK for now TCP has a transitional state when SACK is not in use during which this invariant is temporarily broken. Without SACK, tcp_clean_rtx_queue does not decrement sacked_out. Therefore calls to tcp_sync_left_out before sacked_out is again corrected by tcp_fastretrans_alert can trigger this trap as sacked_out still has couple of segments that are already out of window. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-03 03:30:34 -07:00
Ilpo Järvinen	d551e4541d	[TCP] FRTO: RFC4138 allows Nagle override when new data must be sent This is a corner case where less than MSS sized new data thingie is awaiting in the send queue. For F-RTO to work correctly, a new data segment must be sent at certain point or F-RTO cannot be used at all. RFC4138 allows overriding of Nagle at that point. Implementation uses frto_counter states 2 and 3 to distinguish when Nagle override is needed. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-30 00:58:16 -07:00
Ilpo Järvinen	34588b4c04	[TCP]: Catch skb with S+L bugs earlier SACKED_ACKED and LOST are mutually exclusive with SACK, thus having their sum larger than packets_out is bug with SACK. Eventually these bugs trigger traps in the tcp_clean_rtx_queue with SACK but it's much more informative to do this here. Non-SACK TCP, however, could get more than packets_out duplicate ACKs which each increment sacked_out, so it makes sense to do this kind of limitting for non-SACK TCP but not for SACK enabled one. Perhaps the author had the opposite in mind but did the logic accidently wrong way around? Anyway, the sacked_out incrementer code for non-SACK already deals this issue before calling sync_left_out so this trapping can be done unconditionally. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-30 00:57:33 -07:00
Stephen Hemminger	164891aadf	[TCP]: Congestion control API update. Do some simple changes to make congestion control API faster/cleaner. * use ktime_t rather than timeval * merge rtt sampling into existing ack callback this means one indirect call versus two per ack. * use flags bits to store options/settings Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:45 -07:00
Ilpo Järvinen	9e412ba763	[TCP]: Sed magic converts func(sk, tp, ...) -> func(sk, ...) This is (mostly) automated change using magic: sed -e '/struct sock \sk/ N' -e '/struct sock \sk/ N' -e '/struct sock \sk/ N' -e '/struct sock \sk/ N' -e 's\|struct sock \sk,[\n\t ]struct tcp_sock \tp$[^{]\n{\n$\| struct sock \sk\1\tstruct tcp_sock tp = tcp_sk(sk);\n\|g' -e 's\|struct sock \sk, struct tcp_sock \tp\| struct sock \*sk\|g' -e 's\|sk, tp$[^-]$\|sk\1\|g' Fixed four unused variable (tp) warnings that were introduced. In addition, manually added newlines after local variables and tweaked function arguments positioning. $ gcc --version gcc (GCC) 4.1.1 20060525 (Red Hat 4.1.1-1) ... $ codiff -fV built-in.o.old built-in.o.new net/ipv4/route.c: rt_cache_flush \| +14 1 function changed, 14 bytes added net/ipv4/tcp.c: tcp_setsockopt \| -5 tcp_sendpage \| -25 tcp_sendmsg \| -16 3 functions changed, 46 bytes removed net/ipv4/tcp_input.c: tcp_try_undo_recovery \| +3 tcp_try_undo_dsack \| +2 tcp_mark_head_lost \| -12 tcp_ack \| -15 tcp_event_data_recv \| -32 tcp_rcv_state_process \| -10 tcp_rcv_established \| +1 7 functions changed, 6 bytes added, 69 bytes removed, diff: -63 net/ipv4/tcp_output.c: update_send_head \| -9 tcp_transmit_skb \| +19 tcp_cwnd_validate \| +1 tcp_write_wakeup \| -17 __tcp_push_pending_frames \| -25 tcp_push_one \| -8 tcp_send_fin \| -4 7 functions changed, 20 bytes added, 63 bytes removed, diff: -43 built-in.o.new: 18 functions changed, 40 bytes added, 178 bytes removed, diff: -138 Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:34 -07:00
Andi Kleen	4ac02bab77	[TCP]: Uninline tcp_done(). The function is quite big and has several call sites and nothing to collapse by compiler optimization on inlining. Besides it's nicer to read in a in .c file. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:25 -07:00
Herbert Xu	604763722c	[NET]: Treat CHECKSUM_PARTIAL as CHECKSUM_UNNECESSARY When a transmitted packet is looped back directly, CHECKSUM_PARTIAL maps to the semantics of CHECKSUM_UNNECESSARY. Therefore we should treat it as such in the stack. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:43 -07:00

1 2 3

139 Commits