linux

Author	SHA1	Message	Date
Allan Stephens	05646c9110	[TIPC]: Optimize stream send routine to avoid fragmentation This patch enhances TIPC's stream socket send routine so that it avoids transmitting data in chunks that require fragmentation and reassembly, thereby improving performance at both the sending and receiving ends of the connection. The "maximum packet size" hint that records MTU info allows the socket to decide how big a chunk it should send; in the event that the hint has become stale, fragmentation may still occur, but the data will be passed correctly and the hint will be updated in time for the following send. Note: The 66060 byte pseudo-MTU used for intra-node connections requires the send routine to perform an additional check to ensure it does not exceed TIPC"s limit of 66000 bytes of user data per chunk. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Jon Paul Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:06:12 -07:00
David S. Miller	e06e7c6158	[IPV4]: The scheduled removal of multipath cached routing support. With help from Chris Wedgwood. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:05:57 -07:00
Marcel Holtmann	ef222013fc	[Bluetooth] Add hci_recv_fragment() helper function Most drivers must handle fragmented HCI data packets and events. This patch adds a generic function for their reassembly to the Bluetooth core layer and thus allows to shrink the complexity of the drivers. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2007-07-11 06:42:04 +02:00
Ben Dooks	825a2ff189	AX88796 network driver Support for the Asix AX88796 network controller, an NE2000 compatible 10/100 ethernet device with internal PHY. The driver supports PHY settings via either ioctl() or the ethtool driver ops. Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2007-07-10 12:41:08 -04:00
Vlad Yasevich	8a4794914f	[SCTP] Flag a pmtu change request Currently, if the socket is owned by the user, we drop the ICMP message. As a result SCTP forgets that path MTU changed and never adjusting it's estimate. This causes all subsequent packets to be fragmented. With this patch, we'll flag the association that it needs to udpate it's estimate based on the already updated routing information. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Acked-by: Sridhar Samudrala <sri@us.ibm.com>	2007-06-13 20:44:42 +00:00
Vlad Yasevich	c910b47e18	[SCTP] Update pmtu handling to be similar to tcp Introduce new function sctp_transport_update_pmtu that updates the transports and destination caches view of the path mtu. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Acked-by: Sridhar Samudrala <sri@us.ibm.com>	2007-06-13 20:44:42 +00:00
G. Liakhovetski	c0cfe7faa1	[IrDA]: Fix Rx/Tx path race. From: G. Liakhovetski <gl@dsa-ac.de> We need to switch to NRM _before_ sending the final packet otherwise we might hit a race condition where we get the first packet from the peer while we're still in LAP_XMIT_P. Signed-off-by: Samuel Ortiz <samuel@sortiz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-08 19:15:17 -07:00
Paul Moore	ba6ff9f2b5	[NetLabel]: consolidate the struct socket/sock handling to just struct sock The current NetLabel code has some redundant APIs which allow both "struct socket" and "struct sock" types to be used; this may have made sense at some point but it is wasteful now. Remove the functions that operate on sockets and convert the callers. Not only does this make the code smaller and more consistent but it pushes the locking burden up to the caller which can be more intelligent about the locks. Also, perform the same conversion (socket to sock) on the SELinux/NetLabel glue code where it make sense. Signed-off-by: Paul Moore <paul.moore@hp.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-08 13:33:09 -07:00
Joy Latten	4aa2e62c45	xfrm: Add security check before flushing SAD/SPD Currently we check for permission before deleting entries from SAD and SPD, (see security_xfrm_policy_delete() security_xfrm_state_delete()) However we are not checking for authorization when flushing the SPD and the SAD completely. It was perhaps missed in the original security hooks patch. This patch adds a security check when flushing entries from the SAD and SPD. It runs the entire database and checks each entry for a denial. If the process attempting the flush is unable to remove all of the entries a denial is logged the the flush function returns an error without removing anything. This is particularly useful when a process may need to create or delete its own xfrm entries used for things like labeled networking but that same process should not be able to delete other entries or flush the entire database. Signed-off-by: Joy Latten<latten@austin.ibm.com> Signed-off-by: Eric Paris <eparis@parisplace.org> Signed-off-by: James Morris <jmorris@namei.org>	2007-06-07 13:42:46 -07:00
David S. Miller	df2bc459a3	[UDP]: Revert 2-pass hashing changes. This reverts changesets: `6aaf47fa48` `b7b5f487ab` `de34ed91c4` `fc038410b4` There are still some correctness issues recently discovered which do not have a known fix that doesn't involve doing a full hash table scan on port bind. So revert for now. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-07 13:40:50 -07:00
Patrick McHardy	ef7c79ed64	[NETLINK]: Mark netlink policies const Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-07 13:40:10 -07:00
Patrick McHardy	f0e48dbfc5	[TCP]: Honour sk_bound_dev_if in tcp_v4_send_ack A time_wait socket inherits sk_bound_dev_if from the original socket, but it is not used when sending ACK packets using ip_send_reply. Fix by passing the oif to ip_send_reply in struct ip_reply_arg and use it for output routing. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-07 13:38:51 -07:00
David S. Miller	1c92b4e50e	[AF_UNIX]: Make socket locking much less confusing. The unix_state_*() locking macros imply that there is some rwlock kind of thing going on, but the implementation is actually a spinlock which makes the code more confusing than it needs to be. So use plain unix_state_lock and unix_state_unlock. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-03 18:08:40 -07:00
Pavel Emelianov	e4fd5da39f	[TCP]: Consolidate checking for tcp orphan count being too big. tcp_out_of_resources() and tcp_close() perform the same checking of number of orphan sockets. Move this code into common place. Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-31 01:23:34 -07:00
Arnaldo Carvalho de Melo	4e07a91c37	[SOCK]: Shrink struct sock by 8 bytes on 64-bit. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-31 01:23:32 -07:00
David S. Miller	01e67d08fa	[XFRM]: Allow XFRM_ACQ_EXPIRES to be tunable via sysctl. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-31 01:23:23 -07:00
David S. Miller	14e50e57ae	[XFRM]: Allow packet drops during larval state resolution. The current IPSEC rule resolution behavior we have does not work for a lot of people, even though technically it's an improvement from the -EAGAIN buisness we had before. Right now we'll block until the key manager resolves the route. That works for simple cases, but many folks would rather packets get silently dropped until the key manager resolves the IPSEC rules. We can't tell these folks to "set the socket non-blocking" because they don't have control over the non-block setting of things like the sockets used to resolve DNS deep inside of the resolver libraries in libc. With that in mind I coded up the patch below with some help from Herbert Xu which provides packet-drop behavior during larval state resolution, controllable via sysctl and off by default. This lays the framework to either: 1) Make this default at some point or... 2) Move this logic into xfrm{4,6}_policy.c and implement the ARP-like resolution queue we've all been dreaming of. The idea would be to queue packets to the policy, then once the larval state is resolved by the key manager we re-resolve the route and push the packets out. The packets would timeout if the rule didn't get resolved in a certain amount of time. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-24 18:17:54 -07:00
Marcel Holtmann	5dee9e7c4c	[Bluetooth] Fix L2CAP configuration parameter handling The L2CAP configuration parameter handling was missing the support for rejecting unknown options. The capability to reject unknown options is mandatory since the Bluetooth 1.2 specification. This patch implements its and also simplifies the parameter parsing. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2007-05-24 14:27:19 +02:00
Yasuyuki Kozakai	fda6143683	[NETFILTER]: nf_conntrack: Removes unused destroy operation of l3proto Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-10 23:47:46 -07:00
Yasuyuki Kozakai	c874d5f726	[NETFILTER]: nf_conntrack: Removes duplicated declarations These are also in include/net/netfilter/nf_conntrack_helper.h Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-10 23:47:45 -07:00
Yasuyuki Kozakai	ba4c7cbadd	[NETFILTER]: nf_nat: remove unused argument of function allocating binding nf_nat_rule_find, alloc_null_binding and alloc_null_binding_confirmed do not use the argument 'info', which is actually ct->nat.info. If they are necessary to access it again, we can use the argument 'ct' instead. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-10 23:47:44 -07:00
David S. Miller	fc038410b4	[UDP]: Fix AF-specific references in AF-agnostic code. __udp_lib_port_inuse() cannot make direct references to inet_sk(sk)->rcv_saddr as that is ipv4 specific state and this code is used by ipv6 too. Use an operations vector to solve this, and this also paves the way for ipv6 support for non-wild saddr hashing in UDP. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-10 23:47:22 -07:00
Jeff Garzik	2c4f365ad2	Merge branch 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 into upstream	2007-05-09 18:54:49 -04:00
John Anthony Kazos Jr	121e70b69a	include files: convert "include" subdirectory to UTF-8 Convert the "include" subdirectory to UTF-8. Signed-off-by: John Anthony Kazos Jr. <jakj@j-a-k-j.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2007-05-09 08:58:21 +02:00
Christoph Hellwig	6272e26679	cleanup compat ioctl handling Merge all compat ioctl handling into compat_ioctl.c instead of splitting it over compat.c and compat_ioctl.c. This also allows to get rid of ioctl32.h Signed-off-by: Christoph Hellwig <hch@lst.de> Looks-good-to: Andi Kleen <ak@suse.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-08 11:15:09 -07:00
Larry Finger	f5cdf30618	[PATCH] ieee80211: add ieee80211_channel_to_freq The routines that interrogate the ieee80211_geo struct are missing a channel to frequency entry. This patch adds it. Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2007-05-08 11:51:59 -04:00
Jiri Benc	f0706e828e	[MAC80211]: Add mac80211 wireless stack. Add mac80211, the IEEE 802.11 software MAC layer. Signed-off-by: Jiri Benc <jbenc@suse.cz> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2007-05-05 11:45:53 -07:00
Vlad Yasevich	07d9396771	[SCTP]: Set assoc_id correctly during INIT collision. During the INIT/COOKIE-ACK collision cases, it's possible to get into a situation where the association id is not yet set at the time of the user event generation. As a result, user events have an association id set to 0 which will confuse applications. This happens if we hit case B of duplicate cookie processing. In the particular example found and provided by Oscar Isaula <Oscar.Isaula@motorola.com>, flow looks like this: A B ---- INIT-------> (lost) <---------INIT------ ---- INIT-ACK---> <------ Cookie ECHO When the Cookie Echo is received, we end up trying to update the association that was created on A as a result of the (lost) INIT, but that association doesn't have the ID set yet. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-04 13:55:27 -07:00
Sridhar Samudrala	827bf12236	[SCTP]: Re-order SCTP initializations to avoid race with sctp_rcv() Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-04 13:36:30 -07:00
Jamal Hadi Salim	5a6d34162f	[XFRM] SPD info TLV aggregation Aggregate the SPD info TLVs. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-04 12:55:39 -07:00
Jamal Hadi Salim	af11e31609	[XFRM] SAD info TLV aggregationx Aggregate the SAD info TLVs. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-04 12:55:13 -07:00
Jennifer Hunt	561e036006	[AF_IUCV]: Implementation of a skb backlog queue With the inital implementation we missed to implement a skb backlog queue . The result is that socket receive processing tossed packets. Since AF_IUCV connections are working synchronously it leads to connection hangs. Problems with read, close and select also occured. Using a skb backlog queue is fixing all of these problems . Signed-off-by: Jennifer Hunt <jenhunt@us.ibm.com> Signed-off-by: Frank Pavlic <fpavlic@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-04 12:22:07 -07:00
Eric Dumazet	db3459d1a7	[IPV6]: Some cleanups in include/net/ipv6.h 1) struct ip6_flowlabel : moves 'users' field to avoid two 32bits holes for 64bit arches. Shrinks by 8 bytes sizeof(struct ip6_flowlabel) 2) ipv6_addr_cmp() and ipv6_addr_copy() dont need (void ) casts : Compiler might take into account natural alignement of in6_addr structs to emit better code for memcpy()/memcmp() Casts to (void ) force byte accesses. 3) ipv6_addr_prefix() optimization : Better to clear whole struct, as compiler can emit better code for memset(addr, 0, 16) (2 stores on x86_64), and avoid some conditional branches. # size vmlinux.after vmlinux.before text data bss dec hex filename 5262262 647612 557432 6467306 62aeea vmlinux.after 5262550 647612 557432 6467594 62b00a vmlinux.before thats 288 bytes saved. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-03 17:39:04 -07:00
Ilpo Järvinen	0ec96822d5	[TCP]: Use S+L catcher only with SACK for now TCP has a transitional state when SACK is not in use during which this invariant is temporarily broken. Without SACK, tcp_clean_rtx_queue does not decrement sacked_out. Therefore calls to tcp_sync_left_out before sacked_out is again corrected by tcp_fastretrans_alert can trigger this trap as sacked_out still has couple of segments that are already out of window. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-03 03:30:34 -07:00
Eric Dumazet	709525fad8	[IPV6]: Get rid of __HAVE_ARCH_ADDR_SET. __HAVE_ARCH_ADDR_SET seems unused these days, just get rid of it. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-03 03:08:43 -07:00
Linus Torvalds	152a6a9da1	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (21 commits) [IPV4] SNMP: Support OutMcastPkts and OutBcastPkts [IPV4] SNMP: Support InMcastPkts and InBcastPkts [IPV4] SNMP: Support InTruncatedPkts [IPV4] SNMP: Support InNoRoutes [SNMP]: Add definitions for {In,Out}BcastPkts [TCP] FRTO: RFC4138 allows Nagle override when new data must be sent [TCP] FRTO: Delay skb available check until it's mandatory [XFRM]: Restrict upper layer information by bundle. [TCP]: Catch skb with S+L bugs earlier [PATCH] INET : IPV4 UDP lookups converted to a 2 pass algo [L2TP]: Add the ability to autoload a pppox protocol module. [SKB]: Introduce skb_queue_walk_safe() [AF_IUCV/IUCV]: smp_call_function deadlock [IPV6]: Fix slab corruption running ip6sic [TCP]: Update references in two old comments [XFRM]: Export SPD info [IPV6]: Track device renames in snmp6. [SCTP]: Fix sctp_getsockopt_local_addrs_old() to use local storage. [NET]: Remove NETIF_F_INTERNAL_STATS, default to internal stats. [NETPOLL]: Remove CONFIG_NETPOLL_RX ...	2007-04-30 08:14:42 -07:00
Ilpo Järvinen	d551e4541d	[TCP] FRTO: RFC4138 allows Nagle override when new data must be sent This is a corner case where less than MSS sized new data thingie is awaiting in the send queue. For F-RTO to work correctly, a new data segment must be sent at certain point or F-RTO cannot be used at all. RFC4138 allows overriding of Nagle at that point. Implementation uses frto_counter states 2 and 3 to distinguish when Nagle override is needed. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-30 00:58:16 -07:00
Masahide NAKAMURA	157bfc2502	[XFRM]: Restrict upper layer information by bundle. On MIPv6 usage, XFRM sub policy is enabled. When main (IPsec) and sub (MIPv6) policy selectors have the same address set but different upper layer information (i.e. protocol number and its ports or type/code), multiple bundle should be created. However, currently we have issue to use the same bundle created for the first time with all flows covered by the case. It is useful for the bundle to have the upper layer information to be restructured correctly if it does not match with the flow. 1. Bundle was created by two policies Selector from another policy is added to xfrm_dst. If the flow does not match the selector, it goes to slow path to restructure new bundle by single policy. 2. Bundle was created by one policy Flow cache is added to xfrm_dst as originated one. If the flow does not match the cache, it goes to slow path to try searching another policy. Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-30 00:58:09 -07:00
Ilpo Järvinen	34588b4c04	[TCP]: Catch skb with S+L bugs earlier SACKED_ACKED and LOST are mutually exclusive with SACK, thus having their sum larger than packets_out is bug with SACK. Eventually these bugs trigger traps in the tcp_clean_rtx_queue with SACK but it's much more informative to do this here. Non-SACK TCP, however, could get more than packets_out duplicate ACKs which each increment sacked_out, so it makes sense to do this kind of limitting for non-SACK TCP but not for SACK enabled one. Perhaps the author had the opposite in mind but did the logic accidently wrong way around? Anyway, the sacked_out incrementer code for non-SACK already deals this issue before calling sync_left_out so this trapping can be done unconditionally. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-30 00:57:33 -07:00
Martin Schwidefsky	04b090d50c	[AF_IUCV/IUCV]: smp_call_function deadlock Calling smp_call_function can lead to a deadlock if it is called from tasklet context. Fixing this deadlock requires to move the smp_call_function from the tasklet context to a work queue. To do that queue the path pending interrupts to a separate list and move the path cleanup out of iucv_path_sever to iucv_path_connect and iucv_path_pending. This creates a new requirement for iucv_path_connect: it may not be called from tasklet context anymore. Also fixed compile problem for CONFIG_HOTPLUG_CPU=n and another one when walking the cpu_online mask. When doing this, we must disable cpu hotplug. Signed-off-by: Frank Pavlic <fpavlic@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-28 23:03:59 -07:00
Jamal Hadi Salim	ecfd6b1837	[XFRM]: Export SPD info With this patch you can use iproute2 in user space to efficiently see how many policies exist in different directions. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-28 21:20:32 -07:00
Pavel Roskin	6693228da9	[PATCH] Remove comment about IEEE80211_RADIOTAP_FCS IEEE80211_RADIOTAP_FCS is obsolete and should not be used. It's no longer defined. Remove it from the comment too. Signed-off-by: Pavel Roskin <proski@gnu.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2007-04-28 11:01:03 -04:00
Jouni Malinen	85d32e7b0e	[PATCH] Update my email address from jkmaline@cc.hut.fi to j@w1.fi After 13 years of use, it looks like my email address is finally going to disappear. While this is likely to drop the amount of incoming spam greatly ;-), it may also affect more appropriate messages, so let's update my email address in various places. In addition, Host AP mailing list is subscribers-only and linux-wireless can also be used for discussing issues related to this driver which is now shown in MAINTAINERS. Signed-off-by: Jouni Malinen <j@w1.fi> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2007-04-28 11:01:01 -04:00
Pavel Roskin	a0d69f229f	[PATCH] sparse-annotate radiotap header Document that all fields must be little endian. Use annotated types even in the comments. Consistently use shorter type names (u8, s8). Realign the comments. Signed-off-by: Pavel Roskin <proski@gnu.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2007-04-28 11:01:00 -04:00
Marcelo Tosatti	876c9d3aeb	[PATCH] Marvell Libertas 8388 802.11b/g USB driver Add the Marvell Libertas 8388 802.11 USB driver. Signed-off-by: Marcelo Tosatti <marcelo@kvack.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2007-04-28 11:00:54 -04:00
David Howells	b8b8fd2dc2	[NET]: Fix networking compilation errors Fix miscellaneous networking compilation errors. () Export ktime_add_ns() for modules. () wext_proc_init() should have an ANSI declaration. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-27 15:31:24 -07:00
Johannes Berg	295f4a1fa3	[WEXT]: Clean up how wext is called. This patch cleans up the call paths from the core code into wext. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-26 20:43:56 -07:00
David Howells	651350d10f	[AF_RXRPC]: Add an interface to the AF_RXRPC module for the AFS filesystem to use Add an interface to the AF_RXRPC module so that the AFS filesystem module can more easily make use of the services available. AFS still opens a socket but then uses the action functions in lieu of sendmsg() and registers an intercept functions to grab messages before they're queued on the socket Rx queue. This permits AFS (or whatever) to: (1) Avoid the overhead of using the recvmsg() call. (2) Use different keys directly on individual client calls on one socket rather than having to open a whole slew of sockets, one for each key it might want to use. (3) Avoid calling request_key() at the point of issue of a call or opening of a socket. This is done instead by AFS at the point of open(), unlink() or other VFS operation and the key handed through. (4) Request the use of something other than GFP_KERNEL to allocate memory. Furthermore: () The socket buffer markings used by RxRPC are made available for AFS so that it can interpret the cooked RxRPC messages itself. () rxgen (un)marshalling abort codes are made available. The following documentation for the kernel interface is added to Documentation/networking/rxrpc.txt: ========================= AF_RXRPC KERNEL INTERFACE ========================= The AF_RXRPC module also provides an interface for use by in-kernel utilities such as the AFS filesystem. This permits such a utility to: (1) Use different keys directly on individual client calls on one socket rather than having to open a whole slew of sockets, one for each key it might want to use. (2) Avoid having RxRPC call request_key() at the point of issue of a call or opening of a socket. Instead the utility is responsible for requesting a key at the appropriate point. AFS, for instance, would do this during VFS operations such as open() or unlink(). The key is then handed through when the call is initiated. (3) Request the use of something other than GFP_KERNEL to allocate memory. (4) Avoid the overhead of using the recvmsg() call. RxRPC messages can be intercepted before they get put into the socket Rx queue and the socket buffers manipulated directly. To use the RxRPC facility, a kernel utility must still open an AF_RXRPC socket, bind an addess as appropriate and listen if it's to be a server socket, but then it passes this to the kernel interface functions. The kernel interface functions are as follows: () Begin a new client call. struct rxrpc_call rxrpc_kernel_begin_call(struct socket sock, struct sockaddr_rxrpc srx, struct key key, unsigned long user_call_ID, gfp_t gfp); This allocates the infrastructure to make a new RxRPC call and assigns call and connection numbers. The call will be made on the UDP port that the socket is bound to. The call will go to the destination address of a connected client socket unless an alternative is supplied (srx is non-NULL). If a key is supplied then this will be used to secure the call instead of the key bound to the socket with the RXRPC_SECURITY_KEY sockopt. Calls secured in this way will still share connections if at all possible. The user_call_ID is equivalent to that supplied to sendmsg() in the control data buffer. It is entirely feasible to use this to point to a kernel data structure. If this function is successful, an opaque reference to the RxRPC call is returned. The caller now holds a reference on this and it must be properly ended. () End a client call. void rxrpc_kernel_end_call(struct rxrpc_call call); This is used to end a previously begun call. The user_call_ID is expunged from AF_RXRPC's knowledge and will not be seen again in association with the specified call. () Send data through a call. int rxrpc_kernel_send_data(struct rxrpc_call call, struct msghdr msg, size_t len); This is used to supply either the request part of a client call or the reply part of a server call. msg.msg_iovlen and msg.msg_iov specify the data buffers to be used. msg_iov may not be NULL and must point exclusively to in-kernel virtual addresses. msg.msg_flags may be given MSG_MORE if there will be subsequent data sends for this call. The msg must not specify a destination address, control data or any flags other than MSG_MORE. len is the total amount of data to transmit. () Abort a call. void rxrpc_kernel_abort_call(struct rxrpc_call call, u32 abort_code); This is used to abort a call if it's still in an abortable state. The abort code specified will be placed in the ABORT message sent. () Intercept received RxRPC messages. typedef void (rxrpc_interceptor_t)(struct sock sk, unsigned long user_call_ID, struct sk_buff skb); void rxrpc_kernel_intercept_rx_messages(struct socket sock, rxrpc_interceptor_t interceptor); This installs an interceptor function on the specified AF_RXRPC socket. All messages that would otherwise wind up in the socket's Rx queue are then diverted to this function. Note that care must be taken to process the messages in the right order to maintain DATA message sequentiality. The interceptor function itself is provided with the address of the socket and handling the incoming message, the ID assigned by the kernel utility to the call and the socket buffer containing the message. The skb->mark field indicates the type of message: MARK MEANING =============================== ======================================= RXRPC_SKB_MARK_DATA Data message RXRPC_SKB_MARK_FINAL_ACK Final ACK received for an incoming call RXRPC_SKB_MARK_BUSY Client call rejected as server busy RXRPC_SKB_MARK_REMOTE_ABORT Call aborted by peer RXRPC_SKB_MARK_NET_ERROR Network error detected RXRPC_SKB_MARK_LOCAL_ERROR Local error encountered RXRPC_SKB_MARK_NEW_CALL New incoming call awaiting acceptance The remote abort message can be probed with rxrpc_kernel_get_abort_code(). The two error messages can be probed with rxrpc_kernel_get_error_number(). A new call can be accepted with rxrpc_kernel_accept_call(). Data messages can have their contents extracted with the usual bunch of socket buffer manipulation functions. A data message can be determined to be the last one in a sequence with rxrpc_kernel_is_data_last(). When a data message has been used up, rxrpc_kernel_data_delivered() should be called on it.. Non-data messages should be handled to rxrpc_kernel_free_skb() to dispose of. It is possible to get extra refs on all types of message for later freeing, but this may pin the state of a call until the message is finally freed. () Accept an incoming call. struct rxrpc_call * rxrpc_kernel_accept_call(struct socket sock, unsigned long user_call_ID); This is used to accept an incoming call and to assign it a call ID. This function is similar to rxrpc_kernel_begin_call() and calls accepted must be ended in the same way. If this function is successful, an opaque reference to the RxRPC call is returned. The caller now holds a reference on this and it must be properly ended. () Reject an incoming call. int rxrpc_kernel_reject_call(struct socket sock); This is used to reject the first incoming call on the socket's queue with a BUSY message. -ENODATA is returned if there were no incoming calls. Other errors may be returned if the call had been aborted (-ECONNABORTED) or had timed out (-ETIME). () Record the delivery of a data message and free it. void rxrpc_kernel_data_delivered(struct sk_buff skb); This is used to record a data message as having been delivered and to update the ACK state for the call. The socket buffer will be freed. () Free a message. void rxrpc_kernel_free_skb(struct sk_buff skb); This is used to free a non-DATA socket buffer intercepted from an AF_RXRPC socket. () Determine if a data message is the last one on a call. bool rxrpc_kernel_is_data_last(struct sk_buff skb); This is used to determine if a socket buffer holds the last data message to be received for a call (true will be returned if it does, false if not). The data message will be part of the reply on a client call and the request on an incoming call. In the latter case there will be more messages, but in the former case there will not. () Get the abort code from an abort message. u32 rxrpc_kernel_get_abort_code(struct sk_buff skb); This is used to extract the abort code from a remote abort message. () Get the error number from a local or network error message. int rxrpc_kernel_get_error_number(struct sk_buff *skb); This is used to extract the error number from a message indicating either a local error occurred or a network error occurred. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-26 15:50:17 -07:00
David Howells	17926a7932	[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both Provide AF_RXRPC sockets that can be used to talk to AFS servers, or serve answers to AFS clients. KerberosIV security is fully supported. The patches and some example test programs can be found in: http://people.redhat.com/~dhowells/rxrpc/ This will eventually replace the old implementation of kernel-only RxRPC currently resident in net/rxrpc/. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-26 15:48:28 -07:00
Adrian Bunk	42bad1da50	[NETLINK]: Possible cleanups. - make the following needlessly global variables static: - core/rtnetlink.c: struct rtnl_msg_handlers[] - netfilter/nf_conntrack_proto.c: struct nf_ct_protos[] - make the following needlessly global functions static: - core/rtnetlink.c: rtnl_dump_all() - netlink/af_netlink.c: netlink_queue_skip() Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-26 00:57:41 -07:00
Jamal Hadi Salim	28d8909bc7	[XFRM]: Export SAD info. On a system with a lot of SAs, counting SAD entries chews useful CPU time since you need to dump the whole SAD to user space; i.e something like ip xfrm state ls \| grep -i src \| wc -l I have seen taking literally minutes on a 40K SAs when the system is swapping. With this patch, some of the SAD info (that was already being tracked) is exposed to user space. i.e you do: ip xfrm state count And you get the count; you can also pass -s to the command line and get the hash info. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-26 00:10:29 -07:00
Herbert Xu	7f7d9a6b96	[IPV6]: Consolidate common SNMP code This patch moves the non-proc SNMP code into addrconf.c and reuses IPv4 SNMP code where applicable. As a result we can skip proc.o if /proc is disabled. Note that I've made a number of functions static since they're only used by addrconf.c for now. If they ever get used elsewhere we can always remove the static. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:52 -07:00
Herbert Xu	5e0f04351d	[IPV4]: Consolidate common SNMP code This patch moves the SNMP code shared between IPv4/IPv6 from proc.c into net/ipv4/af_inet.c. This makes sense because these functions aren't specific to /proc. As a result we can again skip proc.o if /proc is disabled. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:51 -07:00
Johannes Berg	43fb45cb79	[WIRELESS] cfg80211: Update comment for locking. This patch adds a comment that was part of my rtnl locking patch for cfg80211 but which I forgot for the merge. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:48 -07:00
Stephen Hemminger	164891aadf	[TCP]: Congestion control API update. Do some simple changes to make congestion control API faster/cleaner. * use ktime_t rather than timeval * merge rtt sampling into existing ack callback this means one indirect call versus two per ack. * use flags bits to store options/settings Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:45 -07:00
Johannes Berg	9e101eab15	[WIRELESS]: Remove wext over netlink. As scheduled, this patch removes the pointless wext over netlink code. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:42 -07:00
Johannes Berg	704232c271	[WIRELESS] cfg80211: New wireless config infrastructure. This patch creates the core cfg80211 code along with some sysfs bits. This is a stripped down version to allow mac80211 to function, but doesn't include any configuration yet except for creating and removing virtual interfaces. This patch includes the nl80211 header file but it only contains the interface types which the cfg80211 interface for creating virtual interfaces relies on. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:41 -07:00
YOSHIFUJI Hideaki	97fc8d0bc5	[IPV6] SNMP: Use put_unaligned() instead of memcpy(). Hint from David Miller <davem@davemloft.net>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:37 -07:00
YOSHIFUJI Hideaki	2334e97355	[IPV6] SNMP: Avoid unaligned accesses. Because stats pointer may not be aligned for u64, use memcpy to fill u64 values. Issue reported by David Miller <davem@davemloft.net>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2007-04-25 22:29:35 -07:00
Ilpo Järvinen	9e412ba763	[TCP]: Sed magic converts func(sk, tp, ...) -> func(sk, ...) This is (mostly) automated change using magic: sed -e '/struct sock \sk/ N' -e '/struct sock \sk/ N' -e '/struct sock \sk/ N' -e '/struct sock \sk/ N' -e 's\|struct sock \sk,[\n\t ]struct tcp_sock \tp$[^{]\n{\n$\| struct sock \sk\1\tstruct tcp_sock tp = tcp_sk(sk);\n\|g' -e 's\|struct sock \sk, struct tcp_sock \tp\| struct sock \*sk\|g' -e 's\|sk, tp$[^-]$\|sk\1\|g' Fixed four unused variable (tp) warnings that were introduced. In addition, manually added newlines after local variables and tweaked function arguments positioning. $ gcc --version gcc (GCC) 4.1.1 20060525 (Red Hat 4.1.1-1) ... $ codiff -fV built-in.o.old built-in.o.new net/ipv4/route.c: rt_cache_flush \| +14 1 function changed, 14 bytes added net/ipv4/tcp.c: tcp_setsockopt \| -5 tcp_sendpage \| -25 tcp_sendmsg \| -16 3 functions changed, 46 bytes removed net/ipv4/tcp_input.c: tcp_try_undo_recovery \| +3 tcp_try_undo_dsack \| +2 tcp_mark_head_lost \| -12 tcp_ack \| -15 tcp_event_data_recv \| -32 tcp_rcv_state_process \| -10 tcp_rcv_established \| +1 7 functions changed, 6 bytes added, 69 bytes removed, diff: -63 net/ipv4/tcp_output.c: update_send_head \| -9 tcp_transmit_skb \| +19 tcp_cwnd_validate \| +1 tcp_write_wakeup \| -17 __tcp_push_pending_frames \| -25 tcp_push_one \| -8 tcp_send_fin \| -4 7 functions changed, 20 bytes added, 63 bytes removed, diff: -43 built-in.o.new: 18 functions changed, 40 bytes added, 178 bytes removed, diff: -138 Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:34 -07:00
Andi Kleen	9958089a43	[NET]: Move sk_setup_caps() out of line. It is far too large to be an inline and not in any hot paths. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:26 -07:00
Andi Kleen	4ac02bab77	[TCP]: Uninline tcp_done(). The function is quite big and has several call sites and nothing to collapse by compiler optimization on inlining. Besides it's nicer to read in a in .c file. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:25 -07:00
YOSHIFUJI Hideaki	334901700f	[IPV4] SNMP: Move some statistic bits to net/ipv4/proc.c. This also fixes memory leak in error path. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:12 -07:00
YOSHIFUJI Hideaki	bf99f1bde3	[IPV6] SNMP: Netlink interface. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:10 -07:00
Patrick McHardy	0463d4ae25	[NET_SCHED]: Eliminate qdisc_tree_lock Since we're now holding the rtnl during the entire dump operation, we can remove qdisc_tree_lock, whose only purpose is to protect dump callbacks from concurrent changes to the qdisc tree. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:07 -07:00
Herbert Xu	604763722c	[NET]: Treat CHECKSUM_PARTIAL as CHECKSUM_UNNECESSARY When a transmitted packet is looped back directly, CHECKSUM_PARTIAL maps to the semantics of CHECKSUM_UNNECESSARY. Therefore we should treat it as such in the stack. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:43 -07:00
Patrick McHardy	c5c2523893	[XFRM]: Optimize MTU calculation Replace the probing based MTU estimation, which usually takes 2-3 iterations to find a fitting value and may underestimate the MTU, by an exact calculation. Also fix underestimation of the XFRM trailer_len, which causes unnecessary reallocations. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:38 -07:00
David Howells	716ea3a7aa	[NET]: Move generic skbuff stuff from XFRM code to generic code Move generic skbuff stuff from XFRM code to generic code so that AF_RXRPC can use it too. The kdoc comments I've attached to the functions needs to be checked by whoever wrote them as I had to make some guesses about the workings of these functions. Signed-off-By: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:33 -07:00
Arnaldo Carvalho de Melo	2a123b86e2	[BLUETOOTH]: Introduce skb->data accessor methods for hci_{acl,event,sco}_hdr For consistency with other skb data accessors, reducing the number of direct accesses to skb->data. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2007-04-25 22:28:21 -07:00
Thomas Graf	73417f617a	[NET] fib_rules: Flush route cache after rule modifications The results of FIB rules lookups are cached in the routing cache except for IPv6 as no such cache exists. So far, it was the responsibility of the user to flush the cache after modifying any rules. This lead to many false bug reports due to misunderstanding of this concept. This patch automatically flushes the route cache after inserting or deleting a rule. Thanks to Muli Ben-Yehuda <muli@il.ibm.com> for catching a bug in the previous patch. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:18 -07:00
Thomas Graf	0947c9fe56	[NET] fib_rules: goto rule action This patch adds a new rule action FR_ACT_GOTO which allows to skip a set of rules by jumping to another rule. The rule to jump to is specified via the FRA_GOTO attribute which carries a rule preference. Referring to a rule which doesn't exists is explicitely allowed. Such goto rules are marked with the flag FIB_RULE_UNRESOLVED and will act like a rule with a non-matching selector. The rule will become functional as soon as its target is present. The goto action enables performance optimizations by reducing the average number of rules that have to be passed per lookup. Example: 0: from all lookup local 40: not from all to 192.168.23.128 goto 32766 41: from all fwmark 0xa blackhole 42: from all fwmark 0xff blackhole 32766: from all lookup main Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:12 -07:00
David S. Miller	b3da2cf37c	[INET]: Use jhash + random secret for ehash. The days are gone when this was not an issue, there are folks out there with huge bot networks that can be used to attack the established hash tables on remote systems. So just like the routing cache and connection tracking hash, use Jenkins hash with random secret input. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:06 -07:00
Johannes Berg	d30045a0bc	[NETLINK]: introduce NLA_BINARY type This patch introduces a new NLA_BINARY attribute policy type with the verification of simply checking the maximum length of the payload. It also fixes a small typo in the example. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:05 -07:00
Vlad Yasevich	703315712c	[SCTP]: Implement SCTP_MAX_BURST socket option. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:04 -07:00
Vlad Yasevich	a5a35e7675	[SCTP]: Implement sac_info field in SCTP_ASSOC_CHANGE notification. As stated in the sctp socket api draft: sac_info: variable If the sac_state is SCTP_COMM_LOST and an ABORT chunk was received for this association, sac_info[] contains the complete ABORT chunk as defined in the SCTP specification RFC2960 [RFC2960] section 3.3.7. We now save received ABORT chunks into the sac_info field and pass that to the user. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:03 -07:00
Vlad Yasevich	bdf3092af6	[SCTP]: Honor flags when setting peer address parameters Parameters only take effect when a corresponding flag bit is set and a value is specified. This means we need to check the flags in addition to checking for non-zero value. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:02 -07:00
Vlad Yasevich	1ae4114dce	[SCTP]: Implement SCTP_ADDR_CONFIRMED state for ADDR_CHNAGE event Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:01 -07:00
Vlad Yasevich	d49d91d79a	[SCTP]: Implement SCTP_PARTIAL_DELIVERY_POINT option. This option induces partial delivery to run as soon as the specified amount of data has been accumulated on the association. However, we give preference to fully reassembled messages over PD messages. In any case, window and buffer is freed up. Signed-off-by: Vlad Yasevich <vladislav.yasevich@.hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:00 -07:00
Vlad Yasevich	b6e1331f3c	[SCTP]: Implement SCTP_FRAGMENT_INTERLEAVE socket option This option was introduced in draft-ietf-tsvwg-sctpsocket-13. It prevents head-of-line blocking in the case of one-to-many endpoint. Applications enabling this option really must enable SCTP_SNDRCV event so that they would know where the data belongs. Based on an earlier patch by Ivan Skytte Jørgensen. Additionally, this functionality now permits multiple associations on the same endpoint to enter Partial Delivery. Applications should be extra careful, when using this functionality, to track EOR indicators. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:59 -07:00
Patrick McHardy	a48b5a6144	[NET_SCHED]: Unline tcf_destroy Uninline tcf_destroy and add a helper function to destroy an entire filter chain. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:56 -07:00
Patrick McHardy	3bebcda280	[NET_SCHED]: turn PSCHED_GET_TIME into inline function Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:55 -07:00
Patrick McHardy	03cc45c0a5	[NET_SCHED]: turn PSCHED_TDIFF_SAFE into inline function Also rename to psched_tdiff_bounded. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:54 -07:00
Patrick McHardy	8edc0c31d6	[NET_SCHED]: kill PSCHED_TDIFF Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:53 -07:00
Patrick McHardy	a084980dcb	[NET_SCHED]: kill PSCHED_SET_PASTPERFECT/PSCHED_IS_PASTPERFECT Use direct assignment and comparison instead. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:51 -07:00
Patrick McHardy	104e087898	[NET_SCHED]: kill PSCHED_TLESS Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:50 -07:00
Patrick McHardy	7c59e25f31	[NET_SCHED]: kill PSCHED_TADD/PSCHED_TADD2 Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:49 -07:00
Patrick McHardy	26e252df1e	[NET_SCHED]: kill PSCHED_AUDIT_TDIFF Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:48 -07:00
Thomas Graf	1d00a4eb42	[NETLINK]: Remove error pointer from netlink message handler The error pointer argument in netlink message handlers is used to signal the special case where processing has to be interrupted because a dump was started but no error happened. Instead it is simpler and more clear to return -EINTR and have netlink_run_queue() deal with getting the queue right. nfnetlink passed on this error pointer to its subsystem handlers but only uses it to signal the start of a netlink dump. Therefore it can be removed there as well. This patch also cleans up the error handling in the affected message handlers to be consistent since it had to be touched anyway. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:30 -07:00
Thomas Graf	c454673da7	[NET] rules: Unified rules dumping Implements a unified, protocol independant rules dumping function which is capable of both, dumping a specific protocol family or all of them. This speeds up dumping as less lookups are required. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:17 -07:00
Thomas Graf	c127ea2c45	[IPv6]: Use rtnl registration interface Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:13 -07:00
Thomas Graf	fa34ddd739	[DECNet]: Use rtnl registration interface Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:12 -07:00
Thomas Graf	be577ddc2b	[PKT_SCHED] qdisc: Use rtnl registration interface Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:09 -07:00
Thomas Graf	63f3444fb9	[IPv4]: Use rtnl registration interface Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:08 -07:00
Thomas Graf	9d9e6a5819	[NET] rules: Use rtnl registration interface Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:07 -07:00
Thomas Graf	c8822a4e00	[NEIGH]: Use rtnl registration interface Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:06 -07:00
Thomas Graf	e284986385	[RTNL]: Message handler registration interface This patch adds a new interface to register rtnetlink message handlers replacing the exported rtnl_links[] array which required many message handlers to be exported unnecessarly. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:04 -07:00
Arnaldo Carvalho de Melo	dc5fc579b9	[NETLINK]: Use nlmsg_trim() where appropriate Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:37 -07:00
Arnaldo Carvalho de Melo	27a884dc3c	[SK_BUFF]: Convert skb->tail to sk_buff_data_t So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes on 64bit architectures, allowing us to combine the 4 bytes hole left by the layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4 64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN... :-) Many calculations that previously required that skb->{transport,network, mac}_header be first converted to a pointer now can be done directly, being meaningful as offsets or pointers. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:28 -07:00
Patrick McHardy	00c04af9df	[NET_SCHED]: kill jiffie conversion macros Now that all packet schedulers have been converted to hrtimers most users of PSCHED_JIFFIE2US and PSCHED_US2JIFFIE are gone. The remaining users use it to convert external time units to packet scheduler clock ticks, so use PSCHED_TICKS_PER_SEC instead. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:14 -07:00
Patrick McHardy	4179477f63	[NET_SCHED]: Add hrtimer based qdisc watchdog Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:05 -07:00
Patrick McHardy	641b9e0e8b	[NET_SCHED]: Use ktime as clocksource Get rid of the manual clock source selection mess and use ktime. Also use a scalar representation, which allows to clean up pkt_sched.h a bit more and results in less ktime_to_ns() calls in most cases. The PSCHED_US2JIFFIE/PSCHED_JIFFIE2US macros are implemented quite inefficient by this patch, following patches will convert all qdiscs to hrtimers and get rid of them entirely. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:04 -07:00
Patrick McHardy	010c7d6f86	[NETFILTER]: nf_conntrack: uninline notifier registration functions Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:46 -07:00
Patrick McHardy	a3c5029cf7	[NETFILTER]: nfnetlink: use mutex instead of semaphore Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:43 -07:00
Patrick McHardy	ac5357ebac	[NETFILTER]: nf_conntrack: remove ugly hack in l4proto registration Remove ugly special-casing of nf_conntrack_l4proto_generic, all it wants is its sysctl tables registered, so do that explicitly in an init function and move the remaining protocol initialization and cleanup code to nf_conntrack_proto.c as well. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:40 -07:00
Patrick McHardy	587aa64163	[NETFILTER]: Remove IPv4 only connection tracking/NAT Remove the obsolete IPv4 only connection tracking/NAT as scheduled in feature-removal-schedule. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:34 -07:00
Arnaldo Carvalho de Melo	9c70220b73	[SK_BUFF]: Introduce skb_transport_header(skb) For the places where we need a pointer to the transport header, it is still legal to touch skb->h.raw directly if just adding to, subtracting from or setting it to another layer header. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:31 -07:00
Arnaldo Carvalho de Melo	aa8223c7bb	[SK_BUFF]: Introduce tcp_hdr(), remove skb->h.th Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:26 -07:00
Arnaldo Carvalho de Melo	4bedb45203	[SK_BUFF]: Introduce udp_hdr(), remove skb->h.uh Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:22 -07:00
Arnaldo Carvalho de Melo	ea2ae17d64	[SK_BUFF]: Introduce skb_transport_offset() For the quite common 'skb->h.raw - skb->data' sequence. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:16 -07:00
Arnaldo Carvalho de Melo	0660e03f6b	[SK_BUFF]: Introduce ipv6_hdr(), remove skb->nh.ipv6h Now the skb->nh union has just one member, .raw, i.e. it is just like the skb->mac union, strange, no? I'm just leaving it like that till the transport layer is done with, when we'll rename skb->mac.raw to skb->mac_header (or ->mac_header_offset?), ditto for ->{h,nh}. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:14 -07:00
Arnaldo Carvalho de Melo	eddc9ec53b	[SK_BUFF]: Introduce ip_hdr(), remove skb->nh.iph Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:10 -07:00
Arnaldo Carvalho de Melo	c9bdd4b525	[IP]: Introduce ip_hdrlen() For the common sequence "skb->nh.iph->ihl * 4", removing a good number of open coded skb->nh.iph uses, now to go after the rest... Just out of curiosity, here are the idioms found to get the same result: skb->nh.iph->ihl << 2 skb->nh.iph->ihl<<2 skb->nh.iph->ihl * 4 skb->nh.iph->ihl4 (skb->nh.iph)->ihl sizeof(u32) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:07 -07:00
Arnaldo Carvalho de Melo	d56f90a7c9	[SK_BUFF]: Introduce skb_network_header() For the places where we need a pointer to the network header, it is still legal to touch skb->nh.raw directly if just adding to, subtracting from or setting it to another layer header. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:59 -07:00
Arnaldo Carvalho de Melo	c1d2bbe1cd	[SK_BUFF]: Introduce skb_reset_network_header(skb) For the common, open coded 'skb->nh.raw = skb->data' operation, so that we can later turn skb->nh.raw into a offset, reducing the size of struct sk_buff in 64bit land while possibly keeping it as a pointer on 32bit. This one touches just the most simple case, next will handle the slightly more "complex" cases. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:46 -07:00
Arnaldo Carvalho de Melo	37e6636669	[LLC]: Kill llc_set_pdu_hdr We'll have skb_reset_network_header soon. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:42 -07:00
Arnaldo Carvalho de Melo	459a98ed88	[SK_BUFF]: Introduce skb_reset_mac_header(skb) For the common, open coded 'skb->mac.raw = skb->data' operation, so that we can later turn skb->mac.raw into a offset, reducing the size of struct sk_buff in 64bit land while possibly keeping it as a pointer on 32bit. This one touches just the most simple case, next will handle the slightly more "complex" cases. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:32 -07:00
Eric Dumazet	92f37fd2ee	[NET]: Adding SO_TIMESTAMPNS / SCM_TIMESTAMPNS support Now that network timestamps use ktime_t infrastructure, we can add a new SOL_SOCKET sockopt SO_TIMESTAMPNS. This command is similar to SO_TIMESTAMP, but permits transmission of a 'timespec struct' instead of a 'timeval struct' control message. (nanosecond resolution instead of microsecond) Control message is labelled SCM_TIMESTAMPNS instead of SCM_TIMESTAMP A socket cannot mix SO_TIMESTAMP and SO_TIMESTAMPNS : the two modes are mutually exclusive. sock_recv_timestamp() became too big to be fully inlined so I added a __sock_recv_timestamp() helper function. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> CC: linux-arch@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:21 -07:00
Stephen Hemminger	a2a316fd06	[NET]: Replace CONFIG_NET_DEBUG with sysctl. Covert network warning messages from a compile time to runtime choice. Removes kernel config option and replaces it with new /proc/sys/net/core/warnings. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:05 -07:00
Eric Dumazet	ae40eb1ef3	[NET]: Introduce SIOCGSTAMPNS ioctl to get timestamps with nanosec resolution Now network timestamps use ktime_t infrastructure, we can add a new ioctl() SIOCGSTAMPNS command to get timestamps in 'struct timespec'. User programs can thus access to nanosecond resolution. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> CC: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:04 -07:00
David S. Miller	fe067e8ab5	[TCP]: Abstract out all write queue operations. This allows the write queue implementation to be changed, for example, to one which allows fast interval searching. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:02 -07:00
Herbert Xu	759e5d0064	[UDP]: Clean up UDP-Lite receive checksum This patch eliminates some duplicate code for the verification of receive checksums between UDP-Lite and UDP. It does this by introducing __skb_checksum_complete_head which is identical to __skb_checksum_complete_head apart from the fact that it takes a length parameter rather than computing the first skb->len bytes. As a result UDP-Lite will be able to use hardware checksum offload for packets which do not use partial coverage checksums. It also means that UDP-Lite loopback no longer does unnecessary checksum verification. If any NICs start support UDP-Lite this would also start working automatically. This patch removes the assumption that msg_flags has MSG_TRUNC clear upon entry in recvmsg. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:51 -07:00
Neil Horman	95c385b4d5	[IPV6] ADDRCONF: Optimistic Duplicate Address Detection (RFC 4429) Support. Nominally an autoconfigured IPv6 address is added to an interface in the Tentative state (as per RFC 2462). Addresses in this state remain in this state while the Duplicate Address Detection process operates on them to determine their uniqueness on the network. During this period, these tentative addresses may not be used for communication, increasing the time before a node may be able to communicate on a network. Using Optimistic Duplicate Address Detection, autoconfigured addresses may be used immediately for communication on the network, as long as certain rules are followed to avoid conflicts with other nodes during the Duplicate Address Detection process. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:43 -07:00
Eric Dumazet	b7aa0bf70c	[NET]: convert network timestamps to ktime_t We currently use a special structure (struct skb_timeval) and plain 'struct timeval' to store packet timestamps in sk_buffs and struct sock. This has some drawbacks : - Fixed resolution of micro second. - Waste of space on 64bit platforms where sizeof(struct timeval)=16 I suggest using ktime_t that is a nice abstraction of high resolution time services, currently capable of nanosecond resolution. As sizeof(ktime_t) is 8 bytes, using ktime_t in 'struct sock' permits a 8 byte shrink of this structure on 64bit architectures. Some other structures also benefit from this size reduction (struct ipq in ipv4/ip_fragment.c, struct frag_queue in ipv6/reassembly.c, ...) Once this ktime infrastructure adopted, we can more easily provide nanosecond resolution on top of it. (ioctl SIOCGSTAMPNS and/or SO_TIMESTAMPNS/SCM_TIMESTAMPNS) Note : this patch includes a bug correction in compat_sock_get_timestamp() where a "err = 0;" was missing (so this syscall returned -ENOENT instead of 0) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> CC: Stephen Hemminger <shemminger@linux-foundation.org> CC: John find <linux.kernel@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:34 -07:00
James Morris	9d729f72dc	[NET]: Convert xtime.tv_sec to get_seconds() Where appropriate, convert references to xtime.tv_sec to the get_seconds() helper function. Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:32 -07:00
Eric Dumazet	fa438ccfdf	[NET]: Keep sk_backlog near sk_lock sk_backlog is a critical field of struct sock. (known famous words) It is (ab)used in hot paths, in particular in release_sock(), tcp_recvmsg(), tcp_v4_rcv(), sk_receive_skb(). It really makes sense to place it next to sk_lock, because sk_backlog is only used after sk_lock locked (and thus memory cache line in L1 cache). This should reduce cache misses and sk_lock acquisition time. (In theory, we could only move the head pointer near sk_lock, and leaving tail far away, because 'tail' is normally not so hot, but keep it simple :) ) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:27 -07:00
Ilpo Järvinen	3cfe3baaf0	[TCP]: Add two new spurious RTO responses to FRTO New sysctl tcp_frto_response is added to select amongst these responses: - Rate halving based; reuses CA_CWR state (default) - Very conservative; used to be the only one available (=1) - Undo cwr; undoes ssthresh and cwnd reductions (=2) The response with rate halving requires a new parameter to tcp_enter_cwr because FRTO has already reduced ssthresh and doing a second reduction there has to be prevented. In addition, to keep things nice on 80 cols screen, a local variable was added. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:23 -07:00
John Heffner	886236c124	[TCP]: Add RFC3742 Limited Slow-Start, controlled by variable sysctl_tcp_max_ssthresh. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:19 -07:00
Ilpo Järvinen	46d0de4ed9	[TCP] FRTO: Entry is allowed only during (New)Reno like recovery This interpretation comes from RFC4138: "If the sender implements some loss recovery algorithm other than Reno or NewReno [FHG04], the F-RTO algorithm SHOULD NOT be entered when earlier fast recovery is underway." I think the RFC means to say (especially in the light of Appendix B) that ...recovery is underway (not just fast recovery) or was underway when it was interrupted by an earlier (F-)RTO that hasn't yet been resolved (snd_una has not advanced enough). Thus, my interpretation is that whenever TCP has ever retransmitted other than head, basic version cannot be used because then the order assumptions which are used as FRTO basis do not hold. NewReno has only the head segment retransmitted at a time. Therefore, walk up to the segment that has not been SACKed, if that segment is not retransmitted nor anything before it, we know for sure, that nothing after the non-SACKed segment should be either. This assumption is valid because TCPCB_EVER_RETRANS does not leave holes but each non-SACKed segment is rexmitted in-order. Check for retrans_out > 1 avoids more expensive walk through the skb list, as we can know the result beforehand: F-RTO will not be allowed. SACKed skb can turn into non-SACked only in the extremely rare case of SACK reneging, in this case we might fail to detect retransmissions if there were them for any other than head. To get rid of that feature, whole rexmit queue would have to be walked (always) or FRTO should be prevented when SACK reneging happens. Of course RTO should still trigger after reneging which makes this issue even less likely to show up. And as long as the response is as conservative as it's now, nothing bad happens even then. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:12 -07:00
Ilpo Järvinen	bdaae17da8	[TCP] FRTO: Moved tcp_use_frto from tcp.h to tcp_input.c In addition, removed inline. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:02 -07:00
Patrick McHardy	c01003c205	[IFB]: Fix crash on input device removal The input_device pointer is not refcounted, which means the device may disappear while packets are queued, causing a crash when ifb passes packets with a stale skb->dev pointer to netif_rx(). Fix by storing the interface index instead and do a lookup where neccessary. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-29 11:46:52 -07:00
Jeff Garzik	a9c87a10db	Merge branch 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 into upstream-fixes	2007-03-28 02:21:18 -04:00
Jean Tourrilhes	c2805fbb86	[PATCH] WE-22 : prevent information leak on 64 bit Johannes Berg discovered that kernel space was leaking to userspace on 64 bit platform. He made a first patch to fix that. This is an improved version of his patch. Signed-off-by: Jean Tourrilhes <jt@hpl.hp.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2007-03-27 14:10:26 -04:00
David S. Miller	f11e6659ce	[IPV6]: Fix routing round-robin locking. As per RFC2461, section 6.3.6, item #2, when no routers on the matching list are known to be reachable or probably reachable we do round robin on those available routes so that we make sure to probe as many of them as possible to detect when one becomes reachable faster. Each routing table has a rwlock protecting the tree and the linked list of routes at each leaf. The round robin code executes during lookup and thus with the rwlock taken as a reader. A small local spinlock tries to provide protection but this does not work at all for two reasons: 1) The round-robin list manipulation, as coded, goes like this (with read lock held): walk routes finding head and tail spin_lock(); rotate list using head and tail spin_unlock(); While one thread is rotating the list, another thread can end up with stale values of head and tail and then proceed to corrupt the list when it gets the lock. This ends up causing the OOPS in fib6_add() later onthat many people have been hitting. 2) All the other code paths that run with the rwlock held as a reader do not expect the list to change on them, they expect it to remain completely fixed while they hold the lock in that way. So, simply stated, it is impossible to implement this correctly using a manipulation of the list without violating the rwlock locking semantics. Reimplement using a per-fib6_node round-robin pointer. This way we don't need to manipulate the list at all, and since the round-robin pointer can only ever point to real existing entries we don't need to perform any locking on the changing of the round-robin pointer itself. We only need to reset the round-robin pointer to NULL when the entry it is pointing to is removed. The idea is from Thomas Graf and it is very similar to how this was implemented before the advanced router selection code when in. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-25 18:48:05 -07:00
Alexey Kuznetsov	ecbb416939	[NET]: Fix neighbour destructor handling. ->neigh_destructor() is killed (not used), replaced with ->neigh_cleanup(), which is called when neighbor entry goes to dead state. At this point everything is still valid: neigh->dev, neigh->parms etc. The device should guarantee that dead neighbor entries (neigh->dead != 0) do not get private part initialized, otherwise nobody will cleanup it. I think this is enough for ipoib which is the only user of this thing. Initialization private part of neighbor entries happens in ipib start_xmit routine, which is not reached when device is down. But it would be better to add explicit test for neigh->dead in any case. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-25 18:48:01 -07:00
Thomas Graf	e1701c68c1	[NET]: Fix fib_rules compatibility breakage Based upon a patch from Patrick McHardy. The fib_rules netlink attribute policy introduced in 2.6.19 broke userspace compatibilty. When specifying a rule with "from all" or "to all", iproute adds a zero byte long netlink attribute, but the policy requires all addresses to have a size equal to sizeof(struct in_addr)/sizeof(struct in6_addr), resulting in a validation error. Check attribute length of FRA_SRC/FRA_DST in the generic framework by letting the family specific rules implementation provide the length of an address. Report an error if address length is non zero but no address attribute is provided. Fix actual bug by checking address length for non-zero instead of relying on availability of attribute. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-25 18:48:00 -07:00
Vlad Yasevich	749bf9215e	[SCTP]: Reset some transport and association variables on restart If the association has been restarted, we need to reset the transport congestion variables as well as accumulated error counts and CACC variables. If we do not, the association will use the wrong values and may terminate prematurely. This was found with a scenario where the peer restarted the association when lksctp was in the last HB timeout for its association. The restart happened, but the error counts have not been reset and when the timeout occurred, a newly restarted association was terminated due to excessive retransmits. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-20 00:09:45 -07:00
Vlad Yasevich	0b58a81146	[SCTP]: Clean up stale data during association restart During association restart we may have stale data sitting on the ULP queue waiting for ordering or reassembly. This data may cause severe problems if not cleaned up. In particular stale data pending ordering may cause problems with receive window exhaustion if our peer has decided to restart the association. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-20 00:09:43 -07:00
Eric Paris	ef41aaa0b7	[IPSEC]: xfrm_policy delete security check misplaced The security hooks to check permissions to remove an xfrm_policy were actually done after the policy was removed. Since the unlinking and deletion are done in xfrm_policy_by* functions this moves the hooks inside those 2 functions. There we have all the information needed to do the security check and it can be done before the deletion. Since auditing requires the result of that security check err has to be passed back and forth from the xfrm_policy_by* functions. This patch also fixes a bug where a deletion that failed the security check could cause improper accounting on the xfrm_policy (xfrm_get_policy didn't have a put on the exit path for the hold taken by xfrm_policy_by) It also fixes the return code when no policy is found in xfrm_add_pol_expire. In old code (at least back in the 2.6.18 days) err wasn't used before the return when no policy is found and so the initialization would cause err to be ENOENT. But since err has since been used above when we don't get a policy back from the xfrm_policy_by function we would always return 0 instead of the intended ENOENT. Also fixed some white space damage in the same area. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Venkat Yekkirala <vyekkirala@trustedcs.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-07 16:08:09 -08:00
David S. Miller	64a146513f	[NET]: Revert incorrect accept queue backlog changes. This reverts two changes: `8488df894d` `248f06726e` A backlog value of N really does mean allow "N + 1" connections to queue to a listening socket. This allows one to specify "0" as the backlog and still get 1 connection. Noticed by Gerrit Renker and Rick Jones. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-06 11:21:05 -08:00
Eric Dumazet	187f5f84ef	[INET]: twcal_jiffie should be unsigned long, not int Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-05 13:32:48 -08:00
Patrick McHardy	ec68e97ded	[NETFILTER]: conntrack: fix {nf,ip}_ct_iterate_cleanup endless loops Fix {nf,ip}_ct_iterate_cleanup unconfirmed list handling: - unconfirmed entries can not be killed manually, they are removed on confirmation or final destruction of the conntrack entry, which means we might iterate forever without making forward progress. This can happen in combination with the conntrack event cache, which holds a reference to the conntrack entry, which is only released when the packet makes it all the way through the stack or a different packet is handled. - taking references to an unconfirmed entry and using it outside the locked section doesn't work, the list entries are not refcounted and another CPU might already be waiting to destroy the entry What the code really wants to do is make sure the references of the hash table to the selected conntrack entries are released, so they will be destroyed once all references from skbs and the event cache are dropped. Since unconfirmed entries haven't even entered the hash yet, simply mark them as dying and skip confirmation based on that. Reported and tested by Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-05 13:25:18 -08:00
Wei Dong	8488df894d	[NET]: Fix bugs in "Whether sock accept queue is full" checking when I use linux TCP socket, and find there is a bug in function sk_acceptq_is_full(). When a new SYN comes, TCP module first checks its validation. If valid, send SYN,ACK to the client and add the sock to the syn hash table. Next time if received the valid ACK for SYN,ACK from the client. server will accept this connection and increase the sk->sk_ack_backlog -- which is done in function tcp_check_req().We check wether acceptq is full in function tcp_v4_syn_recv_sock(). Consider an example: After listen(sockfd, 1) system call, sk->sk_max_ack_backlog is set to 1. As we know, sk->sk_ack_backlog is initialized to 0. Assuming accept() system call is not invoked now. 1. 1st connection comes. invoke sk_acceptq_is_full(). sk- >sk_ack_backlog=0 sk->sk_max_ack_backlog=1, function return 0 accept this connection. Increase the sk->sk_ack_backlog 2. 2nd connection comes. invoke sk_acceptq_is_full(). sk- >sk_ack_backlog=1 sk->sk_max_ack_backlog=1, function return 0 accept this connection. Increase the sk->sk_ack_backlog 3. 3rd connection comes. invoke sk_acceptq_is_full(). sk- >sk_ack_backlog=2 sk->sk_max_ack_backlog=1, function return 1. Refuse this connection. I think it has bugs. after listen system call. sk->sk_max_ack_backlog=1 but now it can accept 2 connections. Signed-off-by: Wei Dong <weid@np.css.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-02 20:37:33 -08:00
Patrick McHardy	4498121ca3	[NET]: Handle disabled preemption in gfp_any() ctnetlink uses netlink_unicast from an atomic_notifier_chain (which is called within a RCU read side critical section) without holding further locks. netlink_unicast calls netlink_trim with the result of gfp_any() for the gfp flags, which are passed down to pskb_expand_header. gfp_any() only checks for softirq context and returns GFP_KERNEL, resulting in this warning: BUG: sleeping function called from invalid context at mm/slab.c:3032 in_atomic():1, irqs_disabled():0 no locks held by rmmod/7010. Call Trace: [<ffffffff8109467f>] debug_show_held_locks+0x9/0xb [<ffffffff8100b0b4>] __might_sleep+0xd9/0xdb [<ffffffff810b5082>] __kmalloc+0x68/0x110 [<ffffffff811ba8f2>] pskb_expand_head+0x4d/0x13b [<ffffffff81053147>] netlink_broadcast+0xa5/0x2e0 [<ffffffff881cd1d7>] :nfnetlink:nfnetlink_send+0x83/0x8a [<ffffffff8834f6a6>] :nf_conntrack_netlink:ctnetlink_conntrack_event+0x94c/0x96a [<ffffffff810624d6>] notifier_call_chain+0x29/0x3e [<ffffffff8106251d>] atomic_notifier_call_chain+0x32/0x60 [<ffffffff881d266d>] :nf_conntrack:destroy_conntrack+0xa5/0x1d3 [<ffffffff881d194e>] :nf_conntrack:nf_ct_cleanup+0x8c/0x12c [<ffffffff881d4614>] :nf_conntrack:kill_l3proto+0x0/0x13 [<ffffffff881d482a>] :nf_conntrack:nf_conntrack_l3proto_unregister+0x90/0x94 [<ffffffff883551b3>] :nf_conntrack_ipv4:nf_conntrack_l3proto_ipv4_fini+0x2b/0x5d [<ffffffff8109d44f>] sys_delete_module+0x1b5/0x1e6 [<ffffffff8105f245>] trace_hardirqs_on_thunk+0x35/0x37 [<ffffffff8105911e>] system_call+0x7e/0x83 Since netlink_unicast is supposed to be callable from within RCU read side critical sections, make gfp_any() check for in_atomic() instead of in_softirq(). Additionally nfnetlink_send needs to use gfp_any() as well for the call to netlink_broadcast). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-28 09:42:13 -08:00
Adrian Bunk	a39a21982c	[IRDA] net/irda/: proper prototypes This patch adds proper prototypes for some functions in include/net/irda/irda.h Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-26 11:42:43 -08:00
Kazunori MIYAZAWA	73d605d1ab	[IPSEC]: changing API of xfrm6_tunnel_register This patch changes xfrm6_tunnel register and deregister interface to prepare for solving the conflict of device tunnels with inter address family IPsec tunnel. There is no device which conflicts with IPv4 over IPv6 IPsec tunnel. Signed-off-by: Kazunori MIYAZAWA <miyazawa@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-13 12:55:55 -08:00
Kazunori MIYAZAWA	c0d56408e3	[IPSEC]: Changing API of xfrm4_tunnel_register. This patch changes xfrm4_tunnel register and deregister interface to prepare for solving the conflict of device tunnels with inter address family IPsec tunnel. Signed-off-by: Kazunori MIYAZAWA <miyazawa@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-13 12:54:47 -08:00
Patrick McHardy	fe3eb20c1a	[NETFILTER]: nf_conntrack: change nf_conntrack_l[34]proto_unregister to void No caller checks the return value, and since its usually called within the module unload path there's nothing a module could do about errors anyway, so BUG on invalid conditions and return void. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-12 11:14:28 -08:00
Patrick McHardy	c0e912d7ed	[NETFILTER]: nf_conntrack: fix invalid conntrack statistics RCU assumption NF_CT_STAT_INC assumes rcu_read_lock in nf_hook_slow disables preemption as well, making it legal to use __get_cpu_var without disabling preemption manually. The assumption is not correct anymore with preemptable RCU, additionally we need to protect against softirqs when not holding nf_conntrack_lock. Add NF_CT_STAT_INC_ATOMIC macro, which disables local softirqs, and use where necessary. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-12 11:13:43 -08:00
Patrick McHardy	923f4902fe	[NETFILTER]: nf_conntrack: properly use RCU API for nf_ct_protos/nf_ct_l3protos arrays Replace preempt_{enable,disable} based RCU by proper use of the RCU API and add missing rcu_read_lock/rcu_read_unlock calls in all paths not obviously only used within packet process context (nfnetlink_conntrack). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-12 11:12:57 -08:00
Arjan van de Ven	540473208f	[PATCH] mark struct file_operations const 1 Many struct file_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:44 -08:00

1 2 3 4 5 ...

1108 Commits