linux

Author	SHA1	Message	Date
Gerrit Renker	23479cbfd3	dccp: Clean up old feature-negotiation infrastructure The code removed by this patch is no longer referenced or used, the added lines update documentation and copyrights. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:30 +02:00
Gerrit Renker	c49b22729f	dccp: Integration of dynamic feature activation - part 3 (client side) This integrates feature-activation in the client, with these details: 1. When dccp_parse_options() fails, the reset code is already set, request_sent _state_process() currently overrides this with `Packet Error', which is not intended - so changed to use the reset code set in dccp_parse_options(); 2. There was a FIXME to change the error code when dccp_ackvec_add() fails. I have looked this up and found that: * the check whether ackno < ISN is already made earlier, * this Response is likely the 1st packet with an Ackno that the client gets, * so when dccp_ackvec_add() fails, the reason is likely not a packet error. 3. When feature negotiation fails, the socket should be marked as not usable, so that the application is notified that an error occurs. This is achieved by a new label, which uses an error code of `Aborted' and which sets the socket state to CLOSED, as well as sk_err. 4. Avoids parsing the Ack twice in Respond state by not doing option processing again in dccp_rcv_respond_partopen_state_process (as option processing has already been done on the request_sock in dccp_check_req). Since this addresses congestion-control initialisation, a corresponding FIXME has been removed. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:30 +02:00
Gerrit Renker	e70cacb90d	dccp: Integration of dynamic feature activation - part 2 (server side) This patch integrates the activation of features at the end of negotiation into the server-side code. Note: In dccp_create_openreq_child the request_sock argument is no longer constant, since dccp_activate_values() uses the feature-negotiation list on dreq to sort out the initialisation values for the different features of the child socket; and purges this queue after use (but the `req' argument to openreq_child can and does still remain constant). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:30 +02:00
Gerrit Renker	3a53a9adfa	dccp: Integration of dynamic feature activation - part 1 (socket setup) This first patch out of three replaces the hardcoded default settings with initialisation code for the dynamic feature negotiation. Note on retransmitting Confirm options: --------------------------------------- This patch also defers flushing the client feature-negotiation queue, due to the following considerations. As long as the client is in PARTOPEN, it needs to retransmit the Confirm options for the Change options received on the DCCP-Response from the server. Otherwise, if the packet containing the Confirm options gets dropped in the network, the connection aborts due to undefined feature negotiation state. Thanks to Leandro Melo de Sales who reported a bug in an earlier revision of the patch set, resulting from not retransmitting the Confirm options. The patch now ensures that the client feature-negotiation queue is flushed only when entering the OPEN state. Since confirmed Change options are removed as soon as they are confirmed (in the DCCP-Response), this ensures that Confirm options are retransmitted. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:30 +02:00
Gerrit Renker	c926c6aed3	dccp: Feature activation handlers This patch provides the post-processing of feature negotiation state, after the negotiation has completed. To this purpose, handlers are used and added to the dccp_feat_table. Each handler is passed a boolean flag whether the RX or TX side of the feature is meant. Several handlers are provided already, new handlers can easily be added. The initialisation is now fully dynamic, i.e. CCIDs are activated only after the feature negotiation. The integration of this dynamic activation is done in the subsequent patches. Thanks to Wei Yongjun for pointing out the necessity of skipping over empty Confirm options while copying the negotiated feature values. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:30 +02:00
Gerrit Renker	d2150b7bff	dccp: Processing Confirm options Analogous to the previous patch, this adds code to interpret incoming Confirm feature-negotiation options. Both functions operate on the feature-negotiation list of either the request_sock (server) or the dccp_sock (client). Thanks to Wei Yongjun for pointing out that it is overly restrictive to check the entire list of confirmed SP values. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:29 +02:00
Gerrit Renker	5a146b97d5	dccp: Process incoming Change feature-negotiation options This adds/replaces code for processing incoming ChangeL/R options. The main difference is that: * mandatory FN options are now interpreted inside the function (there are too many individual cases to do this externally); * the function returns an appropriate Reset code or 0, which is then used to fill in the data for the Reset packet. Old code, which is no longer used or referenced, has been removed. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:29 +02:00
Gerrit Renker	c664d4f4e2	dccp: Preference list reconciliation This provides two functions to * reconcile preference lists (with appropriate return codes) and * reorder the preference list if successful reconciliation changed the preferred value. The patch also removes the old code for processing SP/NN Change options, since new code to process these is mostly there already; related references have been commented out. The code for processing Change options follows in the next patch. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:29 +02:00
Gerrit Renker	f8a644c07e	dccp: Integrate feature-negotiation insertion code The patch implements insertion of feature negotiation at the server (listening and request socket) and the client (connecting socket). In dccp_insert_options(), several statements have been grouped together now to achieve (I hope) better efficiency by reducing the number of tests each packet has to go through: - Ack Vectors are sent if the packet is neither a Data or a Request packet; - a previous issue is corrected - feature negotiation options are allowed on DataAck packets (5.8). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:29 +02:00
Gerrit Renker	0ef118a017	dccp: Insert feature-negotiation options into skb This patch replaces the earlier insertion routine from options.c, so that code specific to feature negotiation can remain in feat.c. This is possible by calling a function already existing in options.c. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:29 +02:00
Gerrit Renker	cf9ddf73b9	dccp: Header option insertion routine for feature-negotiation The patch extends existing code: * Confirm options divide into the confirmed value plus an optional preference list for SP values. Previously only the preference list was echoed for SP values, now the confirmed value is added as per RFC 4340, 6.1; * length and sanity checks are added to avoid illegal memory (or NULL) access. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:29 +02:00
Gerrit Renker	d0440ee6f6	dccp: Support for Mandatory options Support for Mandatory options is provided by this patch, which will be used by subsequent feature-negotiation patches. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2008-09-04 07:45:28 +02:00
Gerrit Renker	b9aaac1c53	dccp: Increase the scope of variable-length htonl/ntohl functions This extends the scope of two available functions, encode\|decode_value_var, to work up to 6 (8) bytes, to match maximum requirements in the RFC. These functions are going to be used both by general option processing and feature negotiation code, hence declarations have been put into feat.h. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2008-09-04 07:45:28 +02:00
Gerrit Renker	c8041e264b	dccp: API to query the current TX/RX CCID This provides function to query the current TX/RX CCID dynamically, without reliance on the minisock value, using dynamic information available in the currently loaded CCID module. This query function is then used to (a) provide the getsockopt part for getting/setting CCIDs via sockopts; (b) replace the current test for "which CCID is in use" in probe.c. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:28 +02:00
Gerrit Renker	fade756f18	dccp: Set per-connection CCIDs via socket options With this patch, TX/RX CCIDs can now be changed on a per-connection basis, which overrides the defaults set by the global sysctl variables for TX/RX CCIDs. To make full use of this facility, the remaining patches of this patch set are needed, which track dependencies and activate negotiated feature values. Note on the maximum number of CCIDs that can be registered: ----------------------------------------------------------- The maximum number of CCIDs that can be registered on the socket is constrained by the space in a Confirm/Change feature negotiation option. The space in these in turn depends on the size of header options as defined in RFC 4340, 5.8. Since this is a recurring constant, it has been moved from ackvec.h into linux/dccp.h, clarifying its purpose. Relative to this size, the maximum number of CCID identifiers that can be present in a Confirm option (which always consumes 1 byte more than a Change option, cf. 6.1) is 2 bytes less than the maximum TLV size: one for the CCID-feature-type and one for the selected value. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:28 +02:00
Gerrit Renker	73bbe095bb	dccp: Tidy up setsockopt calls This splits the setsockopt calls into two groups, depending on whether an integer argument (val) is required and whether routines being called do their own locking. Some options (such as setting the CCID) use u8 rather than int, so that for these the test with regard to integer-sizeof can not be used. The second switch-case statement now only has those statements which need locking and which make use of `val'. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: Eugene Teo <eugeneteo@kernel.sg>	2008-09-04 07:45:28 +02:00
Gerrit Renker	17c30b40ed	dccp: Deprecate Ack Ratio sysctl This patch deprecates the Ack Ratio sysctl, since * Ack Ratio is entirely ignored by CCID-3 and CCID-4, * Ack Ratio currently doesn't work in CCID-2 (i.e. is always set to 1); * even if it would work in CCID-2, there is no point for a user to change it: - Ack Ratio is constrained by cwnd (RFC 4341, 6.1.2), - if Ack Ratio > cwnd, the system resorts to spurious RTO timeouts (since waiting for Acks which will never arrive in this window), - cwnd is not a user-configurable value. The only reasonable place for Ack Ratio is to print it for debugging. It is planned to do this later on, as part of e.g. dccp_probe. With this patch Ack Ratio is now under full control of feature negotiation: * Ack Ratio is resolved as a dependency of the selected CCID; * if the chosen CCID supports it (i.e. CCID == CCID-2), Ack Ratio is set to the default of 2, following RFC 4340, 11.3 - "New connections start with Ack Ratio 2 for both endpoints"; * what happens then is part of another patch set, since it concerns the dynamic update of Ack Ratio while the connection is in full flight. Thanks to Tomasz Grobelny for discussion leading up to this patch. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2008-09-04 07:45:28 +02:00
Gerrit Renker	20f41eee82	dccp: Feature negotiation for minimum-checksum-coverage This provides feature negotiation for server minimum checksum coverage which so far has been missing. Since sender/receiver coverage values range only from 0...15, their type has also been reduced in size from u16 to u4. Feature-negotiation options are now generated for both sender and receiver coverage, i.e. when the peer has `forgotten' to enable partial coverage then feature negotiation will automatically enable (negotiate) the partial coverage value for this connection. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:27 +02:00
Gerrit Renker	668144f7b4	dccp: Deprecate old setsockopt framework The previous setsockopt interface, which passed socket options via struct dccp_so_feat, is complicated/difficult to use. Continuing to support it leads to ugly code since the old approach did not distinguish between NN and SP values. This patch removes the old setsockopt interface and replaces it with two new functions to register NN/SP values for feature negotiation. These are essentially wrappers around the internal __feat_register functions, with checking added to avoid * wrong usage (type); * changing values while the connection is in progress. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:27 +02:00
Gerrit Renker	d4c8741c43	dccp: Mechanism to resolve CCID dependencies This adds a hook to resolve features whose value depends on the choice of CCID. It is done at the server since it can only be done after the CCID values have been negotiated; i.e. the client will add its CCID preference list on the Change options sent in the Request, which will be reconciled with the local preference list of the server. The concept is documented on http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/feature_negotiation/\ implementation_notes.html#ccid_dependencies Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:27 +02:00
Gerrit Renker	093e1f46cf	dccp: Resolve dependencies of features on choice of CCID This provides a missing link in the code chain, as several features implicitly depend and/or rely on the choice of CCID. Most notably, this is the Send Ack Vector feature, but also Ack Ratio and Send Loss Event Rate (also taken care of). For Send Ack Vector, the situation is as follows: * since CCID2 mandates the use of Ack Vectors, there is no point in allowing endpoints which use CCID2 to disable Ack Vector features such a connection; * a peer with a TX CCID of CCID2 will always expect Ack Vectors, and a peer with a RX CCID of CCID2 must always send Ack Vectors (RFC 4341, sec. 4); * for all other CCIDs, the use of (Send) Ack Vector is optional and thus negotiable. However, this implies that the code negotiating the use of Ack Vectors also supports it (i.e. is able to supply and to either parse or ignore received Ack Vectors). Since this is not the case (CCID-3 has no Ack Vector support), the use of Ack Vectors is here disabled, with a comment in the source code. An analogous consideration arises for the Send Loss Event Rate feature, since the CCID-3 implementation does not support the loss interval options of RFC 4342. To make such use explicit, corresponding feature-negotiation options are inserted which signal the use of the loss event rate option, as it is used by the CCID3 code. Lastly, the values of the Ack Ratio feature are matched to the choice of CCID. The patch implements this as a function which is called after the user has made all other registrations for changing default values of features. The table is variable-length, the reserved (and hence for feature-negotiation invalid, confirmed by considering section 19.4 of RFC 4340) feature number `0' is used to mark the end of the table. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:27 +02:00
Gerrit Renker	71bb49596b	dccp: Query supported CCIDs This provides a data structure to record which CCIDs are locally supported and three accessor functions: - a test function for internal use which is used to validate CCID requests made by the user; - a copy function so that the list can be used for feature-negotiation; - documented getsockopt() support so that the user can query capabilities. The data structure is a table which is filled in at compile-time with the list of available CCIDs (which in turn depends on the Kconfig choices). Using the copy function for cloning the list of supported CCIDs is useful for feature negotiation, since the negotiation is now with the full list of available CCIDs (e.g. {2, 3}) instead of the default value {2}. This means negotiation will not fail if the peer requests to use CCID3 instead of CCID2. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:27 +02:00
Gerrit Renker	86349c8d9c	dccp: Registration routines for changing feature values Two registration routines, for SP and NN features, are provided by this patch, replacing a previous routine which was used for both feature types. These are internal-only routines and therefore start with `__feat_register'. It further exports the known limits of Sequence Window and Ack Ratio as symbolic constants. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:27 +02:00
Gerrit Renker	5591d28628	dccp: Limit feature negotiation to connection setup phase This patch starts the new implementation of feature negotiation: 1. Although it is theoretically possible to perform feature negotiation at any time (and RFC 4340 supports this), in practice this is prohibitively complex, as it requires to put traffic on hold for each new negotiation. 2. As a byproduct of restricting feature negotiation to connection setup, the feature-negotiation retransmit timer is no longer required. This part is now mapped onto the protocol-level retransmission. Details indicating why timers are no longer needed can be found on http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/feature_negotiation/\ implementation_notes.html This patch disables anytime negotiation, subsequent patches work out full feature negotiation support for connection setup. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:27 +02:00
Gerrit Renker	702083839b	dccp: Cleanup routines for feature negotiation This inserts the required de-allocation routines for memory allocated by feature negotiation in the socket destructors, replacing dccp_feat_clean() in one instance. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:26 +02:00
Gerrit Renker	828755cee0	dccp: Per-socket initialisation of feature negotiation This provides feature-negotiation initialisation for both DCCP sockets and DCCP request_sockets, to support feature negotiation during connection setup. It also resolves a FIXME regarding the congestion control initialisation. Thanks to Wei Yongjun for help with the IPv6 side of this patch. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:26 +02:00
Gerrit Renker	3001fc0569	dccp: List management for new feature negotiation This adds list fields and list management functions for the new feature negotiation implementation. The new code is kept in parallel to the old code, until removed at the end of the patch set. Thanks to Arnaldo for suggestions to improve the code. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:26 +02:00
Gerrit Renker	b4eec20637	dccp: Implement lookup table for feature-negotiation information A lookup table for feature-negotiation information, extracted from RFC 4340/42, is provided by this patch. All currently known features can be found in this table, along with their feature location, their default value, and type. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:26 +02:00
Gerrit Renker	5c7c9451f1	dccp: Basic data structure for feature negotiation This patch prepares for the new and extended feature-negotiation routines. The following feature-negotiation data structures are provided: * a container for the various (SP or NN) values, * symbolic state names to track feature states, * an entry struct which holds all current information together, * elementary functions to fill in and process these structures. Entry structs are arranged as FIFO for the following reason: RFC 4340 specifies that if multiple options of the same type are present, they are processed in the order of their appearance in the packet; which means that this order needs to be preserved in the local data structure (the later insertion code also respects this order). The struct list_head has been chosen for the following reasons: the most frequent operations are * add new entry at tail (when receiving Change or setting socket options); * delete entry (when Confirm has been received); * deep copy of entire list (cloning from listening socket onto request socket). The NN value has been set to 64 bit, which is a currently sufficient upper limit (Sequence Window feature has 48 bit). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-09-04 07:45:26 +02:00
Gerrit Renker	959fd992f0	dccp ccid-3: Replace lazy BUG_ON with condition The BUG_ON(w_tot == 0) only holds if there is no more than 1 loss interval in the loss history. If there is only a single loss interval, the calc_i_mean() routine need in fact not be called (RFC 3448, 6.3.1). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:25 +02:00
Gerrit Renker	432649916b	dccp: Toggle debug output without module unloading This sets the sysfs permissions so that root can toggle the `debug' parameter available for nearly every DCCP module. This is useful since there are various module inter-dependencies. The debug flag can now be toggled at runtime using echo 1 > /sys/module/dccp/parameters/dccp_debug echo 1 > /sys/module/dccp_ccid2/parameters/ccid2_debug echo 1 > /sys/module/dccp_ccid3/parameters/ccid3_debug echo 1 > /sys/module/dccp_tfrc_lib/parameters/tfrc_debug The last is not very useful yet, since no code at the moment calls the tfrc_debug() macro. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:25 +02:00
Gerrit Renker	48816322ad	dccp: Empty the write queue when disconnecting dccp_disconnect() can be called due to several reasons: 1. when the connection setup failed (inet_stream_connect()); 2. when shutting down (inet_shutdown(), inet_csk_listen_stop()); 3. when aborting the connection (dccp_close() with 0 linger time). In case (1) the write queue is empty. This patch empties the write queue, if in case (2) or (3) it was not yet empty. This avoids triggering the write-queue BUG_TRAP in sk_stream_kill_queues() later on. It also seems natural to do: when breaking an association, to delete all packets that were originally intended for the soon-disconnected end (compare with call to tcp_write_queue_purge in tcp_disconnect()). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:25 +02:00
Gerrit Renker	eac7726bf5	dccp: Fill in the Data fields for "Option Error" Resets This updates the use of the `out_invalid_option' label, which produces a Reset (code 5, "Option Error"), to fill in the Data1...Data3 fields as specified in RFC 4340, 5.6. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:25 +02:00
Gerrit Renker	faf61c3319	dccp: Silently ignore options with nonsensical lengths This updates the option-parsing code with regard to RFC 4340, 5.8: "[..] options with nonsensical lengths (length byte less than two or more than the remaining space in the options portion of the header) MUST be ignored, and any option space following an option with nonsensical length MUST likewise be ignored." Hence in the following cases erratic options will be ignored: 1. The type byte of a multi-byte option is the last byte of the header options (i.e. effective option length of 1). 2. The value of the length byte is less than the minimum 2. This has been changed from previously 3: although no multi-byte option with a length less than 3 yet exists (cf. table 3 in 5.8), a length of 2 is valid. (The switch-statement in dccp_parse has further per-option length checks.) 3. The option length exceeds the length of the remaining option space. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:24 +02:00
Wei Yongjun	ba1a6c7bc0	dccp: Always generate a Reset in response to option errors RFC4340 states that if a packet is received with an option error (such as a Mandatory Option as the last byte of the option list), the endpoint should repond with a Reset. In the LISTEN and RESPOND states, the endpoint correctly reponds with Reset, while in the REQUEST/OPEN states, packets with option errors are just ignored. The packet sequence is as follows: Case 1: Endpoint A Endpoint B (CLOSED) (CLOSED) <---------------- REQUEST RESPONSE -----------------> (1) (with invalid option) <---------------- RESET (with Reset Code 5, "Option Error") (1) currently just ignored, no Reset is sent Case 2: Endpoint A Endpoint B (OPEN) (OPEN) DATA-ACK -----------------> (2) (with invalid option) <---------------- RESET (with Reset Code 5, "Option Error") (2) currently just ignored, no Reset is sent This patch fixes the problem, by generating a Reset instead of silently ignoring option errors. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-09-04 07:45:24 +02:00
Linus Torvalds	316343e2cf	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: bnx2x: Accessing un-mapped page ath9k: Fix TX control flag use for no ACK and RTS/CTS ath9k: Fix TX status reporting iwlwifi: fix STATUS_EXIT_PENDING is not set on pci_remove iwlwifi: call apm stop on exit iwlwifi: fix Tx cmd memory allocation failure handling iwlwifi: fix rx_chain computation iwlwifi: fix station mimo power save values iwlwifi: remove false rxon if rx chain changes iwlwifi: fix hidden ssid discovery in passive channels iwlwifi: W/A for the TSF correction in IBSS netxen: Remove workaround for chipset quirk pcnet-cs, axnet_cs: add new IDs, remove dup ID with less info ixgbe: initialize interrupt throttle rate net/usb/pegasus: avoid hundreds of diagnostics tipc: Don't use structure names which easily globally conflict.	2008-09-03 16:21:02 -07:00
David S. Miller	6c00055a81	tipc: Don't use structure names which easily globally conflict. Andrew Morton reported a build failure on sparc32, because TIPC uses names like "struct node" and there is a like named data structure defined in linux/node.h This just regexp replaces "struct node" to "struct tipc_node" to avoid this and any future similar problems. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-02 23:38:32 -07:00
Linus Torvalds	d26acd92fa	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: ipsec: Fix deadlock in xfrm_state management. ipv: Re-enable IP when MTU > 68 net/xfrm: Use an IS_ERR test rather than a NULL test ath9: Fix ath_rx_flush_tid() for IRQs disabled kernel warning message. ath9k: Incorrect key used when group and pairwise ciphers are different. rt2x00: Compiler warning unmasked by fix of BUILD_BUG_ON mac80211: Fix debugfs union misuse and pointer corruption wireless/libertas/if_cs.c: fix memory leaks orinoco: Multicast to the specified addresses iwlwifi: fix 64bit platform firmware loading iwlwifi: fix apm_stop (wrong bit polarity for FLAG_INIT_DONE) iwlwifi: workaround interrupt handling no some platforms iwlwifi: do not use GFP_DMA in iwl_tx_queue_init net/wireless/Kconfig: clarify the description for CONFIG_WIRELESS_EXT_SYSFS net: Unbreak userspace usage of linux/mroute.h pkt_sched: Fix locking of qdisc_root with qdisc_root_sleeping_lock() ipv6: When we droped a packet, we should return NET_RX_DROP instead of 0	2008-09-02 21:02:14 -07:00
David S. Miller	37b08e34a9	ipsec: Fix deadlock in xfrm_state management. Ever since commit `4c563f7669` ("[XFRM]: Speed up xfrm_policy and xfrm_state walking") it is illegal to call __xfrm_state_destroy (and thus xfrm_state_put()) with xfrm_state_lock held. If we do, we'll deadlock since we have the lock already and __xfrm_state_destroy() tries to take it again. Fix this by pushing the xfrm_state_put() calls after the lock is dropped. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-02 20:14:15 -07:00
Thomas Graf	2c10b32bf5	netlink: Remove compat API for nested attributes Removes all _nested_compat() functions from the API. The prio qdisc no longer requires them and netem has its own format anyway. Their existance is only confusing. Resend: Also remove the wrapper macro. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-02 17:30:27 -07:00
Breno Leitao	06770843c2	ipv: Re-enable IP when MTU > 68 Re-enable IP when the MTU gets back to a valid size. This patch just checks if the in_dev is NULL on a NETDEV_CHANGEMTU event and if MTU is valid (bigger than 68), then re-enable in_dev. Also a function that checks valid MTU size was created. Signed-off-by: Breno Leitao <leitao@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-02 17:28:58 -07:00
Julien Brunel	9d7d74029e	net/xfrm: Use an IS_ERR test rather than a NULL test In case of error, the function xfrm_bundle_create returns an ERR pointer, but never returns a NULL pointer. So a NULL test that comes after an IS_ERR test should be deleted. The semantic match that finds this problem is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @match_bad_null_test@ expression x, E; statement S1,S2; @@ x = xfrm_bundle_create(...) ... when != x = E * if (x != NULL) S1 else S2 // </smpl> Signed-off-by: Julien Brunel <brunel@diku.dk> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-02 17:24:28 -07:00
Jouni Malinen	2b58b20939	mac80211: Fix debugfs union misuse and pointer corruption debugfs union in struct ieee80211_sub_if_data is misused by including a common default_key dentry as a union member. This ends occupying the same memory area with the first dentry in other union members (structures; usually drop_unencrypted). Consequently, debugfs operations on default_key symlinks and drop_unencrypted entry are using the same dentry pointer even though they are supposed to be separate ones. This can lead to removing entries incorrectly or potentially leaving something behind since one of the dentry pointers gets lost. Fix this by moving the default_key dentry to a new struct (common_debugfs) that contains dentries (more to be added in future) that are shared by all vif types. The debugfs union must only be used for vif type-specific entries to avoid this type of pointer corruption. Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-09-02 17:39:50 -04:00
Florian Mickler	d9664741e0	net/wireless/Kconfig: clarify the description for CONFIG_WIRELESS_EXT_SYSFS Current setup with hal and NetworkManager will fail to work without newest hal version with this config option disabled. Although this will solve itself by time, at the moment it is dishonest to say that we don't know any software that uses it, if there are many many people relying on old hal versions. Signed-off-by: Florian Mickler <florian@mickler.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-09-02 15:03:19 -04:00
Linus Torvalds	e77295dc9e	Merge branch 'for-2.6.27' of git://linux-nfs.org/~bfields/linux * 'for-2.6.27' of git://linux-nfs.org/~bfields/linux: nfsd: fix buffer overrun decoding NFSv4 acl sunrpc: fix possible overrun on read of /proc/sys/sunrpc/transports nfsd: fix compound state allocation error handling svcrdma: Fix race between svc_rdma_recvfrom thread and the dto_tasklet	2008-09-02 10:58:11 -07:00
Cyrill Gorcunov	27df6f25ff	sunrpc: fix possible overrun on read of /proc/sys/sunrpc/transports Vegard Nossum reported ---------------------- > I noticed that something weird is going on with /proc/sys/sunrpc/transports. > This file is generated in net/sunrpc/sysctl.c, function proc_do_xprt(). When > I "cat" this file, I get the expected output: > $ cat /proc/sys/sunrpc/transports > tcp 1048576 > udp 32768 > But I think that it does not check the length of the buffer supplied by > userspace to read(). With my original program, I found that the stack was > being overwritten by the characters above, even when the length given to > read() was just 1. David Wagner added (among other things) that copy_to_user could be probably used here. Ingo Oeser suggested to use simple_read_from_buffer() here. The conclusion is that proc_do_xprt doesn't check for userside buffer size indeed so fix this by using Ingo's suggestion. Reported-by: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> CC: Ingo Oeser <ioe-lkml@rameria.de> Cc: Neil Brown <neilb@suse.de> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Greg Banks <gnb@sgi.com> Cc: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-09-01 14:24:24 -04:00
David S. Miller	b171e19ed0	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/mac80211/mlme.c	2008-08-29 23:06:00 -07:00
Jarek Poplawski	102396ae65	pkt_sched: Fix locking of qdisc_root with qdisc_root_sleeping_lock() Use qdisc_root_sleeping_lock() instead of qdisc_root_lock() where appropriate. The only difference is while dev is deactivated, when currently we can use a sleeping qdisc with the lock of noop_qdisc. This shouldn't be dangerous since after deactivation root lock could be used only by gen_estimator code, but looks wrong anyway. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-29 14:27:52 -07:00
Yang Hongyang	3cc76caa98	ipv6: When we droped a packet, we should return NET_RX_DROP instead of 0 Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-29 14:27:51 -07:00
David S. Miller	143b11c03c	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6	2008-08-29 14:02:13 -07:00
Henrique de Moraes Holschuh	1563574448	rfkill: rename rfkill_mutex to rfkill_global_mutex rfkill_mutex and rfkill->mutex are too easy to confuse with each other. Rename rfkill_mutex to rfkill_global_mutex, so that they are easier to tell apart with just one glance. Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Cc: Michael Buesch <mb@bu3sch.de> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-29 16:24:11 -04:00
Henrique de Moraes Holschuh	f745ba03a1	rfkill: add WARN and BUG_ON paranoia (v2) BUG_ON() and WARN() the heck out of buggy drivers calling into the rfkill subsystem. Also switch from WARN_ON(1) to the new descriptive WARN(). Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Cc: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-29 16:24:10 -04:00
Felipe Balbi	01b510b9c2	rfkill: add missing line break Trivial patch adding a missing line break on rfkill_claim_show(). Signed-off-by: Felipe Balbi <felipe.balbi@nokia.com> Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.co> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-29 16:24:10 -04:00
Henrique de Moraes Holschuh	849e0576a7	rfkill: use strict_strtoul (v2) Switch sysfs parsing to something that actually works properly. Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-29 16:24:10 -04:00
Jouni Malinen	36aedc903e	mac80211/cfg80211: HT capabilities for NEW_STA Allow userspace (e.g., hostapd) to set HT capabilities for associated STAs. This is based on a patch from Zhu Yi <yi.zhu@intel.com> (only the NL80211_ATTR_HT_CAPABILITY for NEW_STA part is included here). Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-29 16:24:09 -04:00
Daniel Wagner	2f58bbf27f	mac80211: Use only precedence level of DSCP field for frame classification Bit 4-5 of DSCP should not be considered by classify_d1. The 802.11 QoS Priority field is only depending on the precedence level. Signed-off-by: Daniel Wagner <wagi@monom.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-29 16:24:05 -04:00
Jouni Malinen	43ac2ca384	mac80211: Handle scan result IEs in one block Clean up and extend scan result processing by storing all the IEs from Beacon/Probe Response frames in a single block instead of allocating memory for each specific IE separately. This removes lot of unnecessary code and automatically supports reporting of new IEs (e.g., IEEE 802.11r) into user space without need to manually extend mac80211 scanning code whenever a new protocol adds IE(s). Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-29 16:24:05 -04:00
Jouni Malinen	9f1ba9062e	mac80211/cfg80211: Add BSS configuration options for AP mode This change adds a new cfg80211 command, NL80211_CMD_SET_BSS, to allow AP mode BSS parameters to be changed from user space (e.g., hostapd). The drivers using mac80211 are expected to be modified with separate changes to use the new BSS info parameter for short slot time in the bss_info_changed() handler. Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-29 16:23:55 -04:00
Johannes Berg	7f93ea3e24	mac80211: fill start-sequence-number for BA session start Otherwise, drivers are required to keep track of the sequence numbers themselves, and they really shouldn't be since we already do it for them. I'll fix the race once we figure out how this code should work at all, it's currently disabled. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-29 16:23:55 -04:00
Eric Dumazet	a627266570	ip: speedup /proc/net/rt_cache handling When scanning route cache hash table, we can avoid taking locks for empty buckets. Both /proc/net/rt_cache and NETLINK RTM_GETROUTE interface are taken into account. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-28 01:11:25 -07:00
Andi Kleen	6be547a61d	inet_diag: Add empty bucket optimization to inet_diag too Skip quickly over empty buckets in inet_diag. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-28 01:09:54 -07:00
Andi Kleen	6eac560407	tcp: Skip empty hash buckets faster in /proc/net/tcp On most systems most of the TCP established/time-wait hash buckets are empty. When walking the hash table for /proc/net/tcp their read locks would always be aquired just to find out they're empty. This patch changes the code to check first if the buckets have any entries before taking the lock, which is much cheaper than taking a lock. Since the hash tables are large this makes a measurable difference on processing /proc/net/tcp, especially on architectures with slow read_lock (e.g. PPC) On a 2GB Core2 system time cat /proc/net/tcp > /dev/null (with a mostly empty hash table) goes from 0.046s to 0.005s. On systems with slower atomics (like P4 or POWER4) or larger hash tables (more RAM) the difference is much higher. This can be noticeable because there are some daemons around who regularly scan /proc/net/tcp. Original idea for this patch from Marcus Meissner, but redone by me. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-28 01:08:02 -07:00
Vlad Yasevich	d97240552c	sctp: fix random memory dereference with SCTP_HMAC_IDENT option. The number of identifiers needs to be checked against the option length. Also, the identifier index provided needs to be verified to make sure that it doesn't exceed the bounds of the array. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-27 16:09:49 -07:00
Vlad Yasevich	328fc47ea0	sctp: correct bounds check in sctp_setsockopt_auth_key The bonds check to prevent buffer overlflow was not exactly right. It still allowed overflow of up to 8 bytes which is sizeof(struct sctp_authkey). Since optlen is already checked against the size of that struct, we are guaranteed not to cause interger overflow either. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-27 16:08:54 -07:00
David S. Miller	4d40555250	Merge branch 'lvs-next-2.6' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/lvs-2.6	2008-08-27 05:11:26 -07:00
David S. Miller	6c36810a73	Merge branch 'no-iwlwifi' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6	2008-08-27 04:29:50 -07:00
Hugh Dickins	d994af0d50	ipv4: mode 0555 in ipv4_skeleton vpnc on today's kernel says Cannot open "/proc/sys/net/ipv4/route/flush": d--------- 0 root root 0 2008-08-26 11:32 /proc/sys/net/ipv4/route d--------- 0 root root 0 2008-08-26 19:16 /proc/sys/net/ipv4/neigh Signed-off-by: Hugh Dickins <hugh@veritas.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-27 02:35:18 -07:00
Philip Love	7982d5e1b3	tcp: fix tcp header size miscalculation when window scale is unused The size of the TCP header is miscalculated when the window scale ends up being 0. Additionally, this can be induced by sending a SYN to a passive open port with a window scale option with value 0. Signed-off-by: Philip Love <love_phil@emc.com> Signed-off-by: Adam Langley <agl@imperialviolet.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-27 02:33:50 -07:00
Jarek Poplawski	f6f9b93f16	pkt_sched: Fix gen_estimator locks While passing a qdisc root lock to gen_new_estimator() and gen_replace_estimator() dev could be deactivated or even before grafting proper root qdisc as qdisc_sleeping (e.g. qdisc_create), so using qdisc_root_lock() is not enough. This patch adds qdisc_root_sleeping_lock() for this, plus additional checks, where necessary. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-27 02:25:17 -07:00
Jarek Poplawski	f7a54c13c7	pkt_sched: Use rcu_assign_pointer() to change dev_queue->qdisc These pointers are RCU protected, so proper primitives should be used. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-27 02:22:07 -07:00
Jarek Poplawski	666d9bbedf	pkt_sched: Fix dev_graft_qdisc() locking During dev_graft_qdisc() dev is deactivated, so qdisc_root_lock() returns wrong lock of noop_qdisc instead of qdisc_sleeping. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-27 02:15:20 -07:00
Gerrit Renker	eff253c427	dccp ccid-3: Replace lazy BUG_ON with condition The BUG_ON(w_tot == 0) only holds if there is no more than 1 loss interval in the loss history. If there is only a single loss interval, the calc_i_mean() routine need in fact not be called (RFC 3448, 6.3.1). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-08-27 07:22:00 +02:00
Gerrit Renker	157439fa4a	dccp: Toggle debug output without module unloading This sets the sysfs permissions so that root can toggle the `debug' parameter available for nearly every DCCP module. This is useful since there are various module inter-dependencies. The debug flag can now be toggled at runtime using echo 1 > /sys/module/dccp/parameters/dccp_debug echo 1 > /sys/module/dccp_ccid2/parameters/ccid2_debug echo 1 > /sys/module/dccp_ccid3/parameters/ccid3_debug echo 1 > /sys/module/dccp_tfrc_lib/parameters/tfrc_debug The last is not very useful yet, since no code at the moment calls the tfrc_debug() macro. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-08-27 07:22:00 +02:00
Gerrit Renker	b569d5a134	dccp: Empty the write queue when disconnecting dccp_disconnect() can be called due to several reasons: 1. when the connection setup failed (inet_stream_connect()); 2. when shutting down (inet_shutdown(), inet_csk_listen_stop()); 3. when aborting the connection (dccp_close() with 0 linger time). In case (1) the write queue is empty. This patch empties the write queue, if in case (2) or (3) it was not yet empty. This avoids triggering the write-queue BUG_TRAP in sk_stream_kill_queues() later on. It also seems natural to do: when breaking an association, to delete all packets that were originally intended for the soon-disconnected end (compare with call to tcp_write_queue_purge in tcp_disconnect()). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-08-27 07:22:00 +02:00
Gerrit Renker	5a056417e6	dccp: Fill in the Data fields for "Option Error" Resets This updates the use of the `out_invalid_option' label, which produces a Reset (code 5, "Option Error"), to fill in the Data1...Data3 fields as specified in RFC 4340, 5.6. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-08-27 07:22:00 +02:00
Gerrit Renker	1efa6bbac8	dccp: Silently ignore options with nonsensical lengths This updates the option-parsing code with regard to RFC 4340, 5.8: "[..] options with nonsensical lengths (length byte less than two or more than the remaining space in the options portion of the header) MUST be ignored, and any option space following an option with nonsensical length MUST likewise be ignored." Hence in the following cases erratic options will be ignored: 1. The type byte of a multi-byte option is the last byte of the header options (i.e. effective option length of 1). 2. The value of the length byte is less than the minimum 2. This has been changed from previously 3: although no multi-byte option with a length less than 3 yet exists (cf. table 3 in 5.8), a length of 2 is valid. (The switch-statement in dccp_parse has further per-option length checks.) 3. The option length exceeds the length of the remaining option space. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-08-27 07:21:59 +02:00
Wei Yongjun	33c449675c	dccp: Always generate a Reset in response to option errors RFC4340 states that if a packet is received with an option error (such as a Mandatory Option as the last byte of the option list), the endpoint should repond with a Reset. In the LISTEN and RESPOND states, the endpoint correctly reponds with Reset, while in the REQUEST/OPEN states, packets with option errors are just ignored. The packet sequence is as follows: Case 1: Endpoint A Endpoint B (CLOSED) (CLOSED) <---------------- REQUEST RESPONSE -----------------> (1) (with invalid option) <---------------- RESET (with Reset Code 5, "Option Error") (1) currently just ignored, no Reset is sent Case 2: Endpoint A Endpoint B (OPEN) (OPEN) DATA-ACK -----------------> (2) (with invalid option) <---------------- RESET (with Reset Code 5, "Option Error") (2) currently just ignored, no Reset is sent This patch fixes the problem, by generating a Reset instead of silently ignoring option errors. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-08-27 07:21:59 +02:00
Simon Horman	7fd1067851	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/lvs-2.6 into lvs-next-2.6	2008-08-27 15:11:37 +10:00
Julius Volz	e3c2ced8d2	IPVS: Rename ip_vs_proto_ah.c to ip_vs_proto_ah_esp.c After integrating ESP into ip_vs_proto_ah, rename it (and the references to it) to ip_vs_proto_ah_esp.c and delete the old ip_vs_proto_esp.c. Signed-off-by: Julius Volz <juliusv@google.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2008-08-27 13:50:37 +10:00
Julius Volz	409a19669e	IPVS: Integrate ESP protocol into ip_vs_proto_ah.c Rename all ah_* functions to ah_esp_* (and adjust comments). Move ESP protocol definition into ip_vs_proto_ah.c and remove all usage of ip_vs_proto_esp.c. Make the compilation of ip_vs_proto_ah.c dependent on a new config variable, IP_VS_PROTO_AH_ESP, which is selected either by IP_VS_PROTO_ESP or IP_VS_PROTO_AH. Only compile the selected protocols' structures within this file. Signed-off-by: Julius Volz <juliusv@google.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2008-08-27 13:50:35 +10:00
John W. Linville	576fdeaef6	mac80211: quiet chatty IBSS merge message It seems obvious that this #ifndef should be the opposite polarity... Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-26 20:33:34 -04:00
Jan-Espen Pettersen	8ab65b03b7	mac80211: don't send empty extended rates IE The association request includes a list of supported data rates. 802.11b: 4 supported rates. 802.11g: 12 (8 + 4) supported rates. 802.11a: 8 supported rates. The rates tag of the assoc request has room for only 8 rates. In case of 802.11g an extended rate tag is appended. However in net/wireless/mlme.c an extended (empty) rate tag is also appended if the number of rates is exact 8. This empty (length=0) extended rates tag causes some APs to deny association with code 18 (unsupported rates). These APs include my ZyXEL G-570U, and according to Tomas Winkler som Cisco APs. 'If count == 8' has been used to check for the need for an extended rates tag. But count would also be equal to 8 if the for loop exited because of no more supported rates. Therefore a check for count being less than rates_len would seem more correct. Thanks to: * Dan Williams for newbie guidance * Tomas Winkler for confirming the problem Signed-off-by: Jan-Espen Pettersen <sigsegv@radiotube.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-26 20:06:33 -04:00
Jouni Malinen	93015f0f34	mac80211: Fix debugfs file add/del for netdev Previous version was using incorrect union structures for non-AP interfaces when adding and removing max_ratectrl_rateidx and force_unicast_rateidx entries. Depending on the vif type, this ended up in corrupting debugfs entries since the dentries inside different union structures ended up going being on top of eachother.. As the end result, debugfs files were being left behind with references to freed data (instant kernel oops on access) and directories were not removed properly when unloading mac80211 drivers. This patch fixes those issues by using only a single union structure based on the vif type. Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-26 20:06:33 -04:00
Julia Lawall	667d8af9af	net/mac80211/mesh.c: correct the argument to __mesh_table_free In the function mesh_table_grow, it is the new table not the argument table that should be freed if the function fails (cf commit `bd9b448f4c`) The semantic match that detects this problem is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @r exists@ local idexpression x; expression E,f; position p1,p2,p3; identifier l; statement S; @@ x = mesh_table_alloc@p1(...) ... if (x == NULL) S ... when != E = x when != mesh_table_free(x) goto@p2 l; ... when != E = x when != f(...,x,...) when any ( return $0\\|x$; \| return@p3 ...; ) @script:python@ p1 << r.p1; p2 << r.p2; p3 << r.p3; @@ print "%s: call on line %s not freed or saved before return on line %s via line %s" % (p1[0].file,p1[0].line,p3[0].line,p2[0].line) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-26 20:06:32 -04:00
Jouni Malinen	087d833e5a	mac80211: Use IWEVASSOCREQIE instead of IWEVCUSTOM The previous code was using IWEVCUSTOM to report IEs from AssocReq and AssocResp frames into user space. This can easily hit the 256 byte limit (IW_CUSTOM_MAX) with APs that include number of vendor IEs in AssocResp. This results in the event message not being sent and dmesg showing "wlan0 (WE) : Wireless Event too big (366)" type of errors. Convert mac80211 to use IWEVASSOCREQIE/IWEVASSOCRESPIE to avoid the issue of being unable to send association IEs as wireless events. These newer event types use binary encoding and larger maximum size (IW_GENERIC_IE_MAX = 1024), so the likelyhood of not being able to send the IEs is much smaller than with IWEVCUSTOM. As an extra benefit, the code is also quite a bit simpler since there is no need to allocate an extra buffer for hex encoding. Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-26 20:06:32 -04:00
Felipe Balbi	988b02f1bf	net: rfkill: add missing line break Trivial patch adding a missing line break on rfkill_claim_show(). Signed-off-by: Felipe Balbi <felipe.balbi@nokia.com> Acked-by: Ivo van Doorn <IvDoorn@gmail.co> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-26 20:06:31 -04:00
Al Viro	ce3113ec57	ipv6: sysctl fixes Braino: net.ipv6 in ipv6 skeleton has no business in rotable class Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-25 15:18:15 -07:00
Al Viro	2f4520d35d	ipv4: sysctl fixes net.ipv4.neigh should be a part of skeleton to avoid ordering problems Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-25 15:17:44 -07:00
Vlad Yasevich	30c2235cbc	sctp: add verification checks to SCTP_AUTH_KEY option The structure used for SCTP_AUTH_KEY option contains a length that needs to be verfied to prevent buffer overflow conditions. Spoted by Eugene Teo <eteo@redhat.com>. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-25 15:16:19 -07:00
Stephen Hemminger	f410a1fba7	ipv6: protocol for address routes This fixes a problem spotted with zebra, but not sure if it is necessary a kernel problem. With IPV6 when an address is added to an interface, Zebra creates a duplicate RIB entry, one as a connected route, and other as a kernel route. When an address is added to an interface the RTN_NEWADDR message causes Zebra to create a connected route. In IPV4 when an address is added to an interface a RTN_NEWROUTE message is set to user space with the protocol RTPROT_KERNEL. Zebra ignores these messages, because it already has the connected route. The problem is that route created in IPV6 has route protocol == RTPROT_BOOT. Was this a design decision or a bug? This fixes it. Same patch applies to both net-2.6 and stable. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-23 05:16:46 -07:00
Ilpo Järvinen	a4356b2920	tcp: Add tcp_parse_aligned_timestamp Some duplicated code lying around. Located with my suffix tree tool. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-23 05:12:29 -07:00
Ilpo Järvinen	2cf46637b5	tcp: Add tcp_collapse_one to eliminate duplicated code Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-23 05:11:41 -07:00
Ilpo Järvinen	cbe2d128a0	tcp: Add tcp_validate_incoming & put duplicated code there Large block of code duplication removed. Sadly, the return value thing is a bit tricky here but it seems the most sensible way to return positive from validator on success rather than negative. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-23 05:10:12 -07:00
Denis V. Lunev	fdc0bde90a	icmp: icmp_sk() should not use smp_processor_id() in preemptible code Pass namespace into icmp_xmit_lock, obtain socket inside and return it as a result for caller. Thanks Alexey Dobryan for this report: Steps to reproduce: CONFIG_PREEMPT=y CONFIG_DEBUG_PREEMPT=y tracepath <something> BUG: using smp_processor_id() in preemptible [00000000] code: tracepath/3205 caller is icmp_sk+0x15/0x30 Pid: 3205, comm: tracepath Not tainted 2.6.27-rc4 #1 Call Trace: [<ffffffff8031af14>] debug_smp_processor_id+0xe4/0xf0 [<ffffffff80409405>] icmp_sk+0x15/0x30 [<ffffffff8040a17b>] icmp_send+0x4b/0x3f0 [<ffffffff8025a415>] ? trace_hardirqs_on_caller+0xd5/0x160 [<ffffffff8025a4ad>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff8023a475>] ? local_bh_enable_ip+0x95/0x110 [<ffffffff804285b9>] ? _spin_unlock_bh+0x39/0x40 [<ffffffff8025a26c>] ? mark_held_locks+0x4c/0x90 [<ffffffff8025a4ad>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff8025a415>] ? trace_hardirqs_on_caller+0xd5/0x160 [<ffffffff803e91b4>] ip_fragment+0x8d4/0x900 [<ffffffff803e7030>] ? ip_finish_output2+0x0/0x290 [<ffffffff803e91e0>] ? ip_finish_output+0x0/0x60 [<ffffffff803e6650>] ? dst_output+0x0/0x10 [<ffffffff803e922c>] ip_finish_output+0x4c/0x60 [<ffffffff803e92e3>] ip_output+0xa3/0xf0 [<ffffffff803e68d0>] ip_local_out+0x20/0x30 [<ffffffff803e753f>] ip_push_pending_frames+0x27f/0x400 [<ffffffff80406313>] udp_push_pending_frames+0x233/0x3d0 [<ffffffff804067d1>] udp_sendmsg+0x321/0x6f0 [<ffffffff8040d155>] inet_sendmsg+0x45/0x80 [<ffffffff803b967f>] sock_sendmsg+0xdf/0x110 [<ffffffff8024a100>] ? autoremove_wake_function+0x0/0x40 [<ffffffff80257ce5>] ? validate_chain+0x415/0x1010 [<ffffffff8027dc10>] ? __do_fault+0x140/0x450 [<ffffffff802597d0>] ? __lock_acquire+0x260/0x590 [<ffffffff803b9e55>] ? sockfd_lookup_light+0x45/0x80 [<ffffffff803ba50a>] sys_sendto+0xea/0x120 [<ffffffff80428e42>] ? _spin_unlock_irqrestore+0x42/0x80 [<ffffffff803134bc>] ? __up_read+0x4c/0xb0 [<ffffffff8024e0c6>] ? up_read+0x26/0x30 [<ffffffff8020b8bb>] system_call_fastpath+0x16/0x1b icmp6_sk() is similar. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-23 04:43:33 -07:00
Ron Rindjunsky	9859b81eae	mac80211: add direct probe before association This patch adds a direct probe request as first step in the association flow if data we have is not up to date. Motivation of this step is to make sure that the bss information we have is correct, since last scan could have been done a while ago, and beacons do not fully answer this need as there are potential differences between them and probe responses (e.g. WMM parameter element) Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-22 16:30:00 -04:00
Ron Rindjunsky	6042a3e3ff	mac80211: change number of pre-assoc scans This patch fixes noticed problem in noisy environments of 50+ APs that scan fails to find the requested AP on first try, which leads to connection refusal. second scan has empirically proven to fix this problem in almost all cases. Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com> Signed-off-by: Esti Kummer <ester.kummer@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-22 16:29:59 -04:00
Tomas Winkler	48c2fc59aa	mac80211: cleanup mlme state namespace This patch move add STA_MLME to station mlme state defines. Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-22 16:29:59 -04:00
Tomas Winkler	8e7cdbb633	mac80211: filter probes in ieee80211_rx_mgmt_probe_resp This patch moves filtering statement from ieee80211_rx_bss_info which is called for both beacon and probe to ieee80211_rx_mgmt_probe_resp and save few cycles in beacon parsing. Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-22 16:29:58 -04:00
Jasper Bryant-Greene	f698d856f6	replace net_device arguments with ieee80211_{local,sub_if_data} as appropriate This patch replaces net_device arguments to mac80211 internal functions with ieee80211_{local,sub_if_data} as appropriate. It also does the same for many 802.11s mesh functions, and changes the mesh path table to be indexed on sub_if_data rather than net_device. If the mesh part needs to be a separate patch let me know, but since mesh uses a lot of mac80211 functions which were being converted anyway, the changes go hand-in-hand somewhat. This patch probably does not convert all the functions which could be converted, but it is a large chunk and followup patches will be provided. Signed-off-by: Jasper Bryant-Greene <jasper@amiton.co.nz> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-22 16:29:58 -04:00
Jasper Bryant-Greene	fef1643bf0	move ETH_P_PAE from ieee80211_i.h to if_ether.h ETH_P_PAE belongs in if_ether.h with the other ETH_P_* definitions. This patch moves it there. Signed-off-by: Jasper Bryant-Greene <jasper@amiton.co.nz> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-22 16:29:57 -04:00
Henrique de Moraes Holschuh	96c87607ac	rfkill: introduce RFKILL_STATE_MAX While it is interesting to not add last-enum-markers because it allows gcc to warn us of switch() statements missing a valid state, we really should be handling memory corruption on a rfkill state with default clauses, anyway. So add RFKILL_STATE_MAX and use it where applicable. It makes for safer code in the long run. Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-22 16:29:57 -04:00
Henrique de Moraes Holschuh	77fba13ccc	rfkill: add __must_check annotations rfkill is not a small, mere detail in wireless support. Once it starts supporting rfkill and users start counting on that support, a wireless device is at risk of operating in dangerous conditions should rfkill support fail to properly activate. Therefore, add the required __must_check annotations on some key functions of the rfkill API, for which the wireless drivers absolutely MUST handle the failure mode safely in order to avoid a potentially dangerous situation where the wireless transmitter is left enabled when the user don't want it to. Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Cc: Matthew Garrett <mjg@redhat.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-22 16:29:57 -04:00
Henrique de Moraes Holschuh	9961920199	rfkill: add default global states (v2) Add a second set of global states, "rfkill_default_states", to track the state that will be used when the first rfkill class of a given type is registered, and also to save "undo" information when rfkill_epo is called. Add a new exported function, rfkill_set_default(), which can be used by platform drivers to restore radio state saved by the platform across reboots or shutdown. Also, fix rfkill_epo to properly update rfkill_states, but still preserve a copy of the state so that we can undo the effect of rfkill_epo later if we want to. Add rfkill_restore_states() to restore rfkill_states from the copy. Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-22 16:29:56 -04:00
Henrique de Moraes Holschuh	02589f6051	rfkill: detect bogus double-registering (v2) Detect and abort with -EEXIST if rfkill_register is called twice on the same rfkill struct. And WARN_ON(it) for good measure. While at it, flag when we are adding the first switch of a type, we will need that information later. Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Cc: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-22 16:29:56 -04:00
Luis Carlos Cobo	bdbe819540	mac80211: allow no mac address until firmware load Originally by Johannes Berg. This patch adds support for devices that do not report their MAC address until the firmware is loaded. While the address is not known, a multicast on is used. Signed-off-by: Luis Carlos Cobo <luisca@cozybit.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-22 16:29:55 -04:00
Harvey Harrison	4eb2ae9a42	mac80211: remove WLAN_FC_DATA_PRESENT All users are gone now. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-22 16:29:55 -04:00
Harvey Harrison	a4b7d7bda5	mac80211: remove rx/tx_data->fc member Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-22 16:29:54 -04:00
Harvey Harrison	358c8d9d33	mac80211: use ieee80211 frame control directly Remove the last users of the rx/tx_data->fc data members and use the le16 frame_control from the header directly. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-22 16:29:54 -04:00
Harvey Harrison	e7827a7031	mac80211: remove IEEE80211_FC helper Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-22 16:29:54 -04:00
Harvey Harrison	6b644e524b	mac80211: remove ieee80211_get_hdrlen All users have been moved over to the version taking a le16 frame control rather than a cpu-endian value. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-22 16:29:54 -04:00
Harvey Harrison	b73d70ad86	mac80211: rx.c/tx.c remove more users of tx/rx_data->fc Those functions that still use ieee80211_get_hdrlen are moved over to use the little endian frame control. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-22 16:29:53 -04:00
Harvey Harrison	d298487260	mac80211: wep.c replace magic numbers in IV/ICV removal Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-22 16:29:53 -04:00
Harvey Harrison	c44d040e18	mac80211: wme.h remove unused QOS_CONTROL_LEN linux/ieee80211.h now has IEEE80211_QOS_CTL_LEN for this purpose. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-22 16:29:53 -04:00
Harvey Harrison	62bf1d762e	mac80211: explicitly check skb->len ieee80211_get_hdrlen_from_skb internally checks the skb is long enough to hold the full ieee80211_hdr, else it returns zero. Use ieee80211_hdrlen which always returns the hdrlen and check the remaining room in the skb explicitly when removing encryption headers or the qos control field. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-22 16:29:53 -04:00
Bruno Randolf	9deb1ae572	mac80211: radiotap: assume modulation from rates use the rates ERP flag to derive CCK or OFDM modulation for the radiotap header. (it might be more correct to get this information from the hardware itself, but it seems safe to assume this in most practical cases.) Signed-off-by: Bruno Randolf <br1@einfach.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-22 16:29:50 -04:00
Bruno Randolf	b4f28bbb9b	mac80211: add rx status flag for short preamble and use it for the radiotap header Signed-off-by: Bruno Randolf <br1@einfach.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-22 16:29:50 -04:00
Tomas Winkler	92ab853549	mac80211: add ieee80211_queue_stopped) This patch adds ieee80211_queue_stopped that let drivers to query queue status Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-22 16:29:50 -04:00
Robert P. J. Day	5442060c08	WIRELESS: Make wireless one-click selectable. Use "menuconfig" to make wireless support one-click selectable. Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-22 16:29:50 -04:00
Julia Lawall	d92a8e81e0	net/ieee80211: adjust error handling Converts a test in error handling code to a sequence of labels. The semantic match that found the problem is: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @@ expression E,E1,E2; @@ E = alloc_etherdev(...) ... when != E = E1 if (...) { ... free_netdev(E); ... return ...; } ... when != E = E2 ( if (...) { ... when != free_netdev(E); return dev; } \| * if (...) { ... when != free_netdev(E); return ...; } \| register_netdev(E) ) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-22 16:29:49 -04:00
Jarek Poplawski	f6e0b239a2	pkt_sched: Fix qdisc list locking Since some qdiscs call qdisc_tree_decrease_qlen() (so qdisc_lookup()) without rtnl_lock(), adding and deleting from a qdisc list needs additional locking. This patch adds global spinlock qdisc_list_lock and wrapper functions for modifying the list. It is considered as a temporary solution until hfsc_dequeue(), netem_dequeue() and tbf_dequeue() (or qdisc_tree_decrease_qlen()) are redone. With feedback from Herbert Xu and David S. Miller. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-22 03:31:39 -07:00
Jarek Poplawski	2540e0511e	pkt_sched: Fix qdisc_watchdog() vs. dev_deactivate() race dev_deactivate() can skip rescheduling of a qdisc by qdisc_watchdog() or other timer calling netif_schedule() after dev_queue_deactivate(). We prevent this checking aliveness before scheduling the timer. Since during deactivation the root qdisc is available only as qdisc_sleeping additional accessor qdisc_root_sleeping() is created. With feedback from Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-21 05:11:14 -07:00
Vlad Yasevich	5e739d1752	sctp: fix potential panics in the SCTP-AUTH API. All of the SCTP-AUTH socket options could cause a panic if the extension is disabled and the API is envoked. Additionally, there were some additional assumptions that certain pointers would always be valid which may not always be the case. This patch hardens the API and address all of the crash scenarios. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-21 03:34:25 -07:00
David S. Miller	195648bbc5	pkt_sched: Prevent livelock in TX queue running. If dev_deactivate() is trying to quiesce the queue, it is theoretically possible for another cpu to livelock trying to process that queue. This happens because dev_deactivate() grabs the queue spinlock as it checks the queue state, whereas net_tx_action() does a trylock and reschedules the qdisc if it hits the lock. This breaks the livelock by adding a check on __QDISC_STATE_DEACTIVATED to net_tx_action() when the trylock fails. Based upon feedback from Herbert Xu and Jarek Poplawski. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-19 04:00:36 -07:00
David S. Miller	d2805395aa	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-2.6	2008-08-19 01:33:25 -07:00
Sven Wegener	f728bafb56	ipvs: Fix race conditions in lblcr scheduler We can't access the cache entry outside of our critical read-locked region, because someone may free that entry. Also getting an entry under read lock, then locking for write and trying to delete that entry looks fishy, but should be no problem here, because we're only comparing a pointer. Also there is no need for our own rwlock, there is already one in the service structure for use in the schedulers. Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Signed-off-by: Simon Horman <horms@verge.net.au>	2008-08-19 17:37:08 +10:00
Sven Wegener	39ac50d0c7	ipvs: Fix race conditions in lblc scheduler We can't access the cache entry outside of our critical read-locked region, because someone may free that entry. And we also need to check in the critical region wether the destination is still available, i.e. it's not in the trash. If we drop our reference counter, the destination can be purged from the trash at any time. Our caller only guarantees that no destination is moved to the trash, while we are scheduling. Also there is no need for our own rwlock, there is already one in the service structure for use in the schedulers. Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Signed-off-by: Simon Horman <horms@verge.net.au>	2008-08-19 17:37:04 +10:00
Simon Horman	3f087668c4	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6	2008-08-19 17:36:22 +10:00
David S. Miller	f3b9605d74	Revert "pkt_sched: Add BH protection for qdisc_stab_lock." This reverts commit `1cfa26661a`. qdisc_destroy() runs fully under RTNL again and not from softint any longer, so this change is no longer needed. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-18 22:33:05 -07:00
David S. Miller	deb3abf15f	Revert "pkt_sched: Protect gen estimators under est_lock." This reverts commit `d4766692e7`. qdisc_destroy() now runs in RTNL fully again, so this change is no longer needed. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-18 22:32:10 -07:00
Ilpo Järvinen	e5befbd952	pkt_sched: remove bogus block (cleanup) ...Last block local var got just deleted. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-18 22:30:01 -07:00
Stephen Hemminger	9f59365374	nf_nat: use secure_ipv4_port_ephemeral() for NAT port randomization Use incoming network tuple as seed for NAT port randomization. This avoids concerns of leaking net_random() bits, and also gives better port distribution. Don't have NAT server, compile tested only. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> [ added missing EXPORT_SYMBOL_GPL ] Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-18 21:32:32 -07:00
Pablo Neira Ayuso	fab00c5d15	netfilter: ctnetlink: sleepable allocation with spin lock bh This patch removes a GFP_KERNEL allocation while holding a spin lock with bottom halves disabled in ctnetlink_change_helper(). This problem was introduced in 2.6.23 with the netfilter extension infrastructure. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-18 21:31:46 -07:00
Pablo Neira Ayuso	cb1cb5c474	netfilter: ctnetlink: fix sleep in read-side lock section Fix allocation with GFP_KERNEL in ctnetlink_create_conntrack() under read-side lock sections. This problem was introduced in 2.6.25. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-18 21:31:24 -07:00
Pablo Neira Ayuso	1575e7ea01	netfilter: ctnetlink: fix double helper assignation for NAT'ed conntracks If we create a conntrack that has NAT handlings and a helper, the helper is assigned twice. This happens because nf_nat_setup_info() - via nf_conntrack_alter_reply() - sets the helper before ctnetlink, which indeed does not check if the conntrack already has a helper as it thinks that it is a brand new conntrack. The fix moves the helper assignation before the set of the status flags. This avoids a bogus assertion in __nf_ct_ext_add (if netfilter assertions are enabled) which checks that the conntrack must not be confirmed. This problem was introduced in 2.6.23 with the netfilter extension infrastructure. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-08-18 21:30:55 -07:00
Anders Grafström	46faec9858	netfilter: ipt_addrtype: Fix matching of inverted destination address type This patch fixes matching of inverted destination address type. Signed-off-by: Anders Grafström <grfstrm@users.sourceforge.net> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-18 21:29:57 -07:00
David S. Miller	8e0f36ec37	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6	2008-08-18 21:15:44 -07:00
Gerrit Renker	d28934ad8a	dccp: Fix panic caused by too early termination of retransmission mechanism Thanks is due to Wei Yongjun for the detailed analysis and description of this bug at http://marc.info/?l=dccp&m=121739364909199&w=2 The problem is that invalid packets received by a client in state REQUEST cause the retransmission timer for the DCCP-Request to be reset. This includes freeing the Request-skb ( in dccp_rcv_request_sent_state_process() ). As a consequence, * the arrival of further packets cause a double-free, triggering a panic(), * the connection then may hang, since further retransmissions are blocked. This patch changes the order of statements so that the retransmission timer is reset, and the pending Request freed, only if a valid Response has arrived (or the number of sysctl-retries has been exhausted). Further changes: ---------------- To be on the safe side, replaced __kfree_skb with kfree_skb so that if due to unexpected circumstances the sk_send_head is NULL the WARN_ON is used instead. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-18 21:14:20 -07:00
David S. Miller	4d8863a29c	pkt_sched: Don't hold qdisc lock over qdisc_destroy(). Based upon reports by Denys Fedoryshchenko, and feedback and help from Jarek Poplawski and Herbert Xu. We always either: 1) Never made an external reference to this qdisc. or 2) Did a dev_deactivate() which purged all asynchronous references. So do not lock the qdisc when we call qdisc_destroy(), it's illegal anyways as when we drop the lock this is free'd memory. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-18 21:06:19 -07:00
Jarek Poplawski	25bfcd5a78	pkt_sched: Add lockdep annotation for qdisc locks Qdisc locks are initialized in the same function, qdisc_alloc(), so lockdep can't distinguish tx qdisc lock from rx and reports "possible recursive locking detected" when both these locks are taken eg. while using act_mirred with ifb. This looks like a false positive. Anyway, after this patch these locks will be reported more exactly. Reported-by: Denys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-18 21:06:09 -07:00
David S. Miller	8608db031b	pkt_sched: Never schedule non-root qdiscs. Based upon initial discovery and patch by Jarek Poplawski. The qdisc watchdogs can be attached to any qdisc, not just the root, so make sure we schedule the correct one. CBQ has a similar bug. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-18 21:05:56 -07:00
Ron Rindjunsky	a61dae1f78	mac80211: update new sta's rx timestamp This patch fixes needless probe request caused by zero value in sta->last_rx inside ieee80211_associated flow Signed-off-by: Ron Rindjunsky <ron.rindjunsky@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-18 11:05:13 -04:00
Henrique de Moraes Holschuh	e10e0dfe3b	rfkill: protect suspended rfkill controllers Guard rfkill controllers attached to a rfkill class against state changes after class suspend has been issued. Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-18 11:05:12 -04:00
Marcel Holtmann	63fbd24e51	[Bluetooth] Consolidate maintainers information The Bluetooth entries for the MAINTAINERS file are a little bit too much. Consolidate them into two entries. One for Bluetooth drivers and another one for the Bluetooth subsystem. Also the MODULE_AUTHOR should indicate the current maintainer of the module and actually not the original author. Fix all Bluetooth modules to provide current maintainer information. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2008-08-18 13:23:53 +02:00
Marcel Holtmann	90855d7b72	[Bluetooth] Fix userspace breakage due missing class links The Bluetooth adapters and connections are best presented via a class in sysfs. The removal of the links inside the Bluetooth class broke assumptions by userspace programs on how to find attached adapters. This patch creates adapters and connections as part of the Bluetooth class, but it uses different device types to distinguish them. The userspace programs can now easily navigate in the sysfs device tree. The unused platform device and bus have been removed to keep the code simple and clean. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2008-08-18 13:23:53 +02:00
David S. Miller	69747650c8	pkt_sched: Fix return value corruption in HTB and TBF. Based upon a bug report by Josip Rodin. Packet schedulers should only return NET_XMIT_DROP iff the packet really was dropped. If the packet does reach the device after we return NET_XMIT_DROP then TCP can crash because it depends upon the enqueue path return values being accurate. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-18 00:39:41 -07:00
David S. Miller	96d203169d	pkt_sched: Fix missed RCU unlock in dev_queue_xmit() Noticed by Jarek Poplawski. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-17 23:37:16 -07:00
Yang Hongyang	13601cd8e4	ipv6: Fix the return interface index when get it while no message is received. When get receiving interface index while no message is received, the bounded device's index of the socket should be returned. RFC 3542: Issuing getsockopt() for the above options will return the sticky option value i.e., the value set with setsockopt(). If no sticky option value has been set getsockopt() will return the following values: - For the IPV6_PKTINFO option, it will return an in6_pktinfo structure with ipi6_addr being in6addr_any and ipi6_ifindex being zero. Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-17 23:21:52 -07:00
David S. Miller	4cf7cb280e	sch_prio: Use NET_XMIT_SUCCESS instead of "0" constant. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-17 22:45:17 -07:00
Jussi Kivilinna	0d40b6e564	sch_prio: Use return value from inner qdisc requeue Use return value from inner qdisc requeue when value returned isn't NET_XMIT_SUCCESS, instead of always returning NET_XMIT_DROP. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-17 22:43:56 -07:00
David S. Miller	1e0d5a5747	pkt_sched: No longer destroy qdiscs from RCU. We can now kill them synchronously with all of the previous dev_deactivate() cures. This makes netdev destruction and shutdown saner as the qdiscs hold references to the device. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-17 22:31:26 -07:00
Jarek Poplawski	3a76e3716b	pkt_sched: Grab correct lock in notify_and_destroy(). From: Jarek Poplawski <jarkao2@gmail.com> When we are destroying non-root qdiscs, we need to lock the root of the qdisc tree not the the qdisc itself. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-17 22:02:11 -07:00
David S. Miller	4335cd2da1	pkt_sched: Simplify dev_deactivate() polling loop. The condition under which the previous qdisc has no more references after we've attached &noop_qdisc is that both RUNNING and SCHED are both seen clear while holding the root lock. So just make specifically that check in the polling loop, instead of this overly complex "check without then check with lock held" sequence. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-17 21:58:07 -07:00
Jarek Poplawski	def82a1db1	net: Change handling of the __QDISC_STATE_SCHED flag in net_tx_action(). Change handling of the __QDISC_STATE_SCHED flag in net_tx_action() to enable proper control in dev_deactivate(). Now, if this flag is seen as unset under root_lock means a qdisc can't be netif_scheduled. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-17 21:54:43 -07:00
David S. Miller	a9312ae893	pkt_sched: Add 'deactivated' state. This new state lets dev_deactivate() mark a qdisc as having been deactivated. dev_queue_xmit() and ing_filter() check for this bit and do not try to process the qdisc if the bit is set. dev_deactivate() polls the qdisc after setting the bit, waiting for both __QDISC_STATE_RUNNING and __QDISC_STATE_SCHED to clear. This isn't perfect yet, but subsequent changesets will make it so. This part is just one piece of the puzzle. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-17 21:51:03 -07:00
Simon Horman	51df190139	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6	2008-08-16 14:44:17 +10:00
Rusty Russell	db543c1f97	net: skb_copy_datagram_from_iovec() There's an skb_copy_datagram_iovec() to copy out of a paged skb, but nothing the other way around (because we don't do that). We want to allocate big skbs in tun.c, so let's add the function. It's a carbon copy of skb_copy_datagram_iovec() with enough changes to be annoying. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-15 19:52:30 -07:00
Herbert Xu	6f85a124d8	net: Preserve netfilter attributes in skb_gso_segment using __copy_skb_header skb_gso_segment didn't preserve some attributes in the original skb such as the netfilter fields. This was harmless until they were used which is the case for packets going through lo. This patch makes it call __copy_skb_header which also picks up some other missing attributes. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-15 19:51:36 -07:00
Stephen Hemminger	e4119a4318	bridge: show offload settings Add more ethtool generic operations to dump the bridge offload settings. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-15 19:51:07 -07:00
Herbert Xu	c6153b5b77	ipv4: Disable route secret interval on zero interval Let me first state that disabling the route cache hash rebuild should not be done without extensive analysis on the risk profile and careful deliberation. However, there are times when this can be done safely or for testing. For example, when you have mechanisms for ensuring that offending parties do not exist in your network. This patch lets the user disable the rebuild if the interval is set to zero. This also incidentally fixes a divide-by-zero error with name-spaces. In addition, this patch makes the effect of an interval change immediate rather than it taking effect at the next rebuild as is currently the case. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-15 13:44:31 -07:00
Jarek Poplawski	323c048836	pkt_sched: Fix unlocking in tc_ctl_tfilter() Fix a bug with spin_lock_bh() inserted instead of spin_unlock_bh() by some recent patch. Reported-by: Denys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-14 17:01:10 -07:00
Simon Horman	4a031b0e6a	ipvs: rename __ip_vs_wlc_schedule in lblc and lblcr schedulers For the sake of clarity, rename __ip_vs_wlc_schedule() in lblc.c to __ip_vs_lblc_schedule() and the version in lblcr.c to __ip_vs_lblc_schedule(). I guess the original name stuck from a copy and paste. Cc: Sven Wegener <sven.wegener@stealer.net> Signed-off-by: Simon Horman <horms@verge.net.au>	2008-08-15 09:26:15 +10:00
Sven Wegener	a919cf4b6b	ipvs: Create init functions for estimator code Commit `8ab19ea36c` ("ipvs: Fix possible deadlock in estimator code") fixed a deadlock condition, but that condition can only happen during unload of IPVS, because during normal operation there is at least our global stats structure in the estimator list. The mod_timer() and del_timer_sync() calls are actually initialization and cleanup code in disguise. Let's make it explicit and move them to their own init and cleanup function. Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Signed-off-by: Simon Horman <horms@verge.net.au>	2008-08-15 09:26:15 +10:00
Sven Wegener	82dfb6f322	ipvs: Only call init_service, update_service and done_service for schedulers if defined There are schedulers that only schedule based on data available in the service or destination structures and they don't need any persistent storage or initialization routine. These schedulers currently provide dummy functions for the init_service, update_service and/or done_service functions. For the init_service and done_service cases we already have code that only calls these functions, if the scheduler provides them. Do the same for the update_service case and remove the dummy functions from all schedulers. Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Signed-off-by: Simon Horman <horms@verge.net.au>	2008-08-15 09:26:14 +10:00
Julius Volz	9a812198ae	IPVS: Add genetlink interface implementation Add the implementation of the new Generic Netlink interface to IPVS and keep the old set/getsockopt interface for userspace backwards compatibility. Signed-off-by: Julius Volz <juliusv@google.com> Acked-by: Sven Wegener <sven.wegener@stealer.net> Signed-off-by: Simon Horman <horms@verge.net.au>	2008-08-15 09:26:14 +10:00
Brian Haley	191cd58250	netns: Add network namespace argument to rt6_fill_node() and ipv6_dev_get_saddr() ipv6_dev_get_saddr() blindly de-references dst_dev to get the network namespace, but some callers might pass NULL. Change callers to pass a namespace pointer instead. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-14 15:33:21 -07:00
Daniel Lezcano	877acedc0d	netns: Fix crash by making igmp per namespace This patch makes the multicast socket to be per namespace. When a network namespace is created, other than the init_net and a multicast packet is received, the kernel goes to a hang or a kernel panic. How to reproduce ? * create a child network namespace * create a pair virtual device veth * ip link add type veth * move one side to the pair network device to the child namespace * ip link set netns <childpid> dev veth1 * ping -I veth0 224.0.0.1 The bug appears because the function ip_mc_init_dev does not initialize the different multicast fields as it exits because it is not the init_net. BUG: soft lockup - CPU#0 stuck for 61s! [avahi-daemon:2695] Modules linked in: irq event stamp: 50350 hardirqs last enabled at (50349): [<c03ee949>] _spin_unlock_irqrestore+0x34/0x39 hardirqs last disabled at (50350): [<c03ec639>] schedule+0x9f/0x5ff softirqs last enabled at (45712): [<c0374d4b>] ip_setsockopt+0x8e7/0x909 softirqs last disabled at (45710): [<c03ee682>] _spin_lock_bh+0x8/0x27 Pid: 2695, comm: avahi-daemon Not tainted (2.6.27-rc2-00029-g0872073 #3) EIP: 0060:[<c03ee47c>] EFLAGS: 00000297 CPU: 0 EIP is at __read_lock_failed+0x8/0x10 EAX: c4f38810 EBX: c4f38810 ECX: 00000000 EDX: c04cc22e ESI: fb0000e0 EDI: 00000011 EBP: 0f02000a ESP: c4e3faa0 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 CR0: 8005003b CR2: 44618a40 CR3: 04e37000 CR4: 000006d0 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 DR6: ffff0ff0 DR7: 00000400 [<c02311f8>] ? _raw_read_lock+0x23/0x25 [<c0390666>] ? ip_check_mc+0x1c/0x83 [<c036d478>] ? ip_route_input+0x229/0xe92 [<c022e2e4>] ? trace_hardirqs_on_thunk+0xc/0x10 [<c0104c9c>] ? do_IRQ+0x69/0x7d [<c0102e64>] ? restore_nocheck_notrace+0x0/0xe [<c036fdba>] ? ip_rcv+0x227/0x505 [<c0358764>] ? netif_receive_skb+0xfe/0x2b3 [<c03588d2>] ? netif_receive_skb+0x26c/0x2b3 [<c035af31>] ? process_backlog+0x73/0xbd [<c035a8cd>] ? net_rx_action+0xc1/0x1ae [<c01218a8>] ? __do_softirq+0x7b/0xef [<c0121953>] ? do_softirq+0x37/0x4d [<c035b50d>] ? dev_queue_xmit+0x3d4/0x40b [<c0122037>] ? local_bh_enable+0x96/0xab [<c035b50d>] ? dev_queue_xmit+0x3d4/0x40b [<c012181e>] ? _local_bh_enable+0x79/0x88 [<c035fcb8>] ? neigh_resolve_output+0x20f/0x239 [<c0373118>] ? ip_finish_output+0x1df/0x209 [<c0373364>] ? ip_dev_loopback_xmit+0x62/0x66 [<c0371db5>] ? ip_local_out+0x15/0x17 [<c0372013>] ? ip_push_pending_frames+0x25c/0x2bb [<c03891b8>] ? udp_push_pending_frames+0x2bb/0x30e [<c038a189>] ? udp_sendmsg+0x413/0x51d [<c038a1a9>] ? udp_sendmsg+0x433/0x51d [<c038f927>] ? inet_sendmsg+0x35/0x3f [<c034f092>] ? sock_sendmsg+0xb8/0xd1 [<c012d554>] ? autoremove_wake_function+0x0/0x2b [<c022e6de>] ? copy_from_user+0x32/0x5e [<c022e6de>] ? copy_from_user+0x32/0x5e [<c034f238>] ? sys_sendmsg+0x18d/0x1f0 [<c0175e90>] ? pipe_write+0x3cb/0x3d7 [<c0170347>] ? do_sync_write+0xbe/0x105 [<c012d554>] ? autoremove_wake_function+0x0/0x2b [<c03503b2>] ? sys_socketcall+0x176/0x1b0 [<c01085ea>] ? syscall_trace_enter+0x6c/0x7b [<c0102e1a>] ? syscall_call+0x7/0xb Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-13 16:15:57 -07:00
Jarek Poplawski	d4766692e7	pkt_sched: Protect gen estimators under est_lock. gen_kill_estimator() required rtnl_lock() protection, but since it is moved to an RCU callback __qdisc_destroy() let's use est_lock instead. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-13 15:20:24 -07:00
David S. Miller	b9a3b1102b	pkt_sched: Fix queue quiescence testing in dev_deactivate(). Based upon discussions with Jarek P. and Herbert Xu. First, we're testing the wrong qdisc. We just reset the device queue qdiscs to &noop_qdisc and checking it's state is completely pointless here. We want to wait until the previous qdisc that was sitting at the ->qdisc pointer is not busy any more. And that would be ->qdisc_sleeping. Because of how we propagate the samples qdisc pointer down into qdisc_run and friends via per-cpu ->output_queue and netif_schedule, we have to wait also for the __QDISC_STATE_SCHED bit to clear as well. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-13 15:18:38 -07:00
Jarek Poplawski	26b284de54	pkt_sched: Fix oops in htb_delete. Recent changes introduced a bug in htb_delete(): cl->parent->children counter update misses checking cl->parent for NULL, which is used for root classes, so deleting them causes an oops. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-13 15:16:43 -07:00
Andrew Gallatin	64c00d81b5	pktgen: prevent pktgen from using bad tx queue With the new multi-queue transmit code, it is possible to accidentally make pktgen pick a non-existing tx queue simply by using a stale script to drive pktgen. Access to this non-existing tx queue will then trigger a bad memory access and kill the machine. For example, setting "queue_map_max 2" will cause my machine to die when accessing a garbage spinlock in the non-existing tx queue: BUG: spinlock bad magic on CPU#0, kpktgend_0/564 lock: ffff88001ddf6718, .magic: ffffffff, .owner: /-1, .owner_cpu: 0 Pid: 564, comm: kpktgend_0 Not tainted 2.6.27-rc3 #35 Call Trace: [<ffffffff803a1228>] spin_bug+0xa4/0xac [<ffffffff803a1253>] _raw_spin_lock+0x23/0x123 [<ffffffff8055b06f>] _spin_lock_bh+0x17/0x1b [<ffffffff804cb57d>] pktgen_thread_worker+0xa97/0x1002 [<ffffffff8022874d>] ? finish_task_switch+0x38/0x97 [<ffffffff80242077>] ? autoremove_wake_function+0x0/0x36 [<ffffffff80242077>] ? autoremove_wake_function+0x0/0x36 [<ffffffff804caae6>] ? pktgen_thread_worker+0x0/0x1002 [<ffffffff80241a40>] kthread+0x44/0x6d [<ffffffff8020c399>] child_rip+0xa/0x11 [<ffffffff802419fc>] ? kthread+0x0/0x6d [<ffffffff8020c38f>] ? child_rip+0x0/0x11 The attached patch adds some sanity checking to prevent these sorts of configuration errors. Signed-off-by: Andrew Gallatin <gallatin@myri.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-13 15:16:00 -07:00
Tom Tucker	24b8b44780	svcrdma: Fix race between svc_rdma_recvfrom thread and the dto_tasklet RDMA_READ completions are kept on a separate queue from the general I/O request queue. Since a separate lock is used to protect the RDMA_READ completion queue, a race exists between the dto_tasklet and the svc_rdma_recvfrom thread where the dto_tasklet sets the XPT_DATA bit and adds I/O to the read-completion queue. Concurrently, the recvfrom thread checks the generic queue, finds it empty and resets the XPT_DATA bit. A subsequent svc_xprt_enqueue will fail to enqueue the transport for I/O and cause the transport to "stall". The fix is to protect both lists with the same lock and set the XPT_DATA bit with this lock held. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2008-08-13 16:57:31 -04:00
Arnaldo Carvalho de Melo	3e8a0a559c	dccp: change L/R must have at least one byte in the dccpsf_val field Thanks to Eugene Teo for reporting this problem. Signed-off-by: Eugene Teo <eugenete@kernel.sg> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-13 13:48:39 -07:00
Jean-Christophe DUBOIS	c1e24df27f	xfrm: remove unnecessary variable in xfrm_output_resume() 2nd try Small fix removing an unnecessary intermediate variable. Signed-off-by: Jean-Christophe DUBOIS <jcd@tribudubois.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-13 13:35:37 -07:00
Jamal Hadi Salim	36723873b6	net-sched: fix Action flushing return code Flushing must consistently return ENOMEM on failure of any allocation Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-13 02:41:45 -07:00
Jamal Hadi Salim	f97017cdef	net-sched: Fix actions flushing Flushing of actions has been broken since we changed the semantics of netlink parsed tb[X] to mean X is an attribute type. This makes the flushing work. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-13 02:41:22 -07:00
Julien Brunel	34093d055e	net/rxrpc: Use an IS_ERR test rather than a NULL test In case of error, the function rxrpc_get_transport returns an ERR pointer, but never returns a NULL pointer. So after a call to this function, a NULL test should be replaced by an IS_ERR test. A simplified version of the semantic patch that makes this change is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @correct_null_test@ expression x,E; statement S1, S2; @@ x = rxrpc_get_transport(...) <... when != x = E if ( ( - x@p2 != NULL + ! IS_ERR ( x ) \| - x@p2 == NULL + IS_ERR( x ) ) ) S1 else S2 ...> ? x = E; // </smpl> Signed-off-by: Julien Brunel <brunel@diku.dk> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-13 02:40:48 -07:00
Jamal Hadi Salim	317900cb01	wext: Send name on events In the minimal the wireless extensions oughta send at least the name in addition to the ifindex. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-13 02:39:56 -07:00
Andrew Morton	6ced0b3f1e	net/tipc/subscr.c: don't use ___constant_swab32 It's an internal implementation detail which we _should_ be free to change. So we did, and it promptly broke. The compiler shold be able to work out when to use the __constant version anyway. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-13 02:32:06 -07:00
Brian Haley	5e0115e500	ipv6: Fix OOPS, ip -f inet6 route get fec0::1, linux-2.6.26, ip6_route_output, rt6_fill_node+0x175 Alexey Dobriyan wrote: > On Thu, Aug 07, 2008 at 07:00:56PM +0200, John Gumb wrote: >> Scenario: no ipv6 default route set. > >> # ip -f inet6 route get fec0::1 >> >> BUG: unable to handle kernel NULL pointer dereference at 00000000 >> IP: [<c0369b85>] rt6_fill_node+0x175/0x3b0 >> EIP is at rt6_fill_node+0x175/0x3b0 > > 0xffffffff80424dd3 is in rt6_fill_node (net/ipv6/route.c:2191). > 2186 } else > 2187 #endif > 2188 NLA_PUT_U32(skb, RTA_IIF, iif); > 2189 } else if (dst) { > 2190 struct in6_addr saddr_buf; > 2191 ====> if (ipv6_dev_get_saddr(ip6_dst_idev(&rt->u.dst)->dev, > ^^^^^^^^^^^^^^^^^^^^^^^^ > NULL > > 2192 dst, 0, &saddr_buf) == 0) > 2193 NLA_PUT(skb, RTA_PREFSRC, 16, &saddr_buf); > 2194 } The commit that changed this can't be reverted easily, but the patch below works for me. Fix NULL de-reference in rt6_fill_node() when there's no IPv6 input device present in the dst entry. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-13 01:58:57 -07:00
Jarek Poplawski	1cfa26661a	pkt_sched: Add BH protection for qdisc_stab_lock. Since qdisc_stab_lock is used in qdisc_put_stab(), which is called in BH context from __qdisc_destroy() RCU callback, softirq safe locking is needed. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-11 18:11:06 -07:00
David S. Miller	0a37c10ed4	Merge branch 'stealer/ipvs/for-davem' of git://git.stealer.net/linux-2.6	2008-08-11 18:04:35 -07:00
Simon Horman	e93615d086	ipvs: Explictly clear ip_vs_stats members In order to align the coding styles of ip_vs_zero_stats() and its child-function ip_vs_zero_estimator(), clear ip_vs_stats members explicitlty rather than doing a limited memset(). This was chosen over modifying ip_vs_zero_estimator() to use memset() as it is more robust against changes in members in the relevant structures. memset() would be prefered if all members of the structure were to be cleared. Cc: Sven Wegener <sven.wegener@stealer.net> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Sven Wegener <sven.wegener@stealer.net>	2008-08-11 14:00:55 +02:00
Sven Wegener	519e49e888	ipvs: No need to zero out ip_vs_stats during initialization It's a global variable and automatically initialized to zero. And now we can also initialize the lock at compile time. Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Acked-by: Simon Horman <horms@verge.net.au>	2008-08-11 14:00:46 +02:00
Sven Wegener	3a14a313f9	ipvs: Embed estimator object into stats object There's no reason for dynamically allocating an estimator object for every stats object. Directly embed an estimator object into every stats object and switch to using the kernel-provided list implementation. This makes the code much simpler and faster, as we do not need to traverse the list of all estimators to find the one belonging to a stats object. There's no need to use an rwlock, as we only have one reader. Also reorder the members of the estimator structure slightly to avoid padding overhead. This can't be done with the stats object as the members are currently copied to our user space object via memcpy() and changing it would break ABI. Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Acked-by: Simon Horman <horms@verge.net.au>	2008-08-11 14:00:43 +02:00
Sven Wegener	5587da55fb	ipvs: Mark net_vs_ctl_path const Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Acked-by: Simon Horman <horms@verge.net.au>	2008-08-11 11:46:27 +02:00
Sven Wegener	048cf48b89	ipvs: Annotate init functions with __init Being able to discard these functions saves a couple of bytes at runtime. The cleanup functions can't be annotated with __exit as they are also called from init functions. Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Acked-by: Simon Horman <horms@verge.net.au>	2008-08-11 11:46:18 +02:00
Sven Wegener	d149ccc9cf	ipvs: Initialize schedulers' struct list_head at compile time No need to do it at runtime and this saves a couple of bytes in the text section. Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Acked-by: Simon Horman <horms@verge.net.au>	2008-08-11 11:46:06 +02:00
Sven Wegener	66a0be4720	ipvs: Use list_empty() instead of open-coding the same functionality Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Acked-by: Simon Horman <horms@verge.net.au>	2008-08-11 11:45:57 +02:00
Sven Wegener	8ab19ea36c	ipvs: Fix possible deadlock in estimator code There is a slight chance for a deadlock in the estimator code. We can't call del_timer_sync() while holding our lock, as the timer might be active and spinning for the lock on another cpu. Work around this issue by using try_to_del_timer_sync() and releasing the lock. We could actually delete the timer outside of our lock, as the add and kill functions are only every called from userspace via [gs]etsockopt() and are serialized by a mutex, but better make this explicit. Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Cc: stable <stable@kernel.org> Acked-by: Simon Horman <horms@verge.net.au>	2008-08-11 11:45:40 +02:00
Sven Wegener	bc0fde2fad	ipvs: Fix possible deadlock in sync code Commit `998e7a7680` ("ipvs: Use kthread_run() instead of doing a double-fork via kernel_thread()") introduced a possible deadlock in the sync code. We need to use the _bh versions for the lock, as the lock is also accessed from a bottom half. Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Acked-by: Simon Horman <horms@verge.net.au>	2008-08-11 11:44:38 +02:00
Herbert Xu	d97106ea52	udp: Drop socket lock for encapsulated packets The socket lock is there to protect the normal UDP receive path. Encapsulation UDP sockets don't need that protection. In fact the locking is deadly for them as they may contain another UDP packet within, possibly with the same addresses. Also the nested bit was copied from TCP. TCP needs it because of accept(2) spawning sockets. This simply doesn't apply to UDP so I've removed it. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-09 00:35:05 -07:00
David S. Miller	8123b421e8	pkt_sched: Fix ingress deletion and filter attachment. Based upon bug reports by Stephen Hemminger. We still had some cases using ->qdisc instead of ->qdisc_sleeping. Also, qdisc_lookup() should return ingress qdiscs. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-08 23:23:39 -07:00
Jamal Hadi Salim	76aab2c1ea	pkt_sched: Fix actions referencing When an action is added several times with the same exact index it gets deleted on every even-numbered attempt. This fixes that issue. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-07 20:37:22 -07:00
David S. Miller	c1c6c11a68	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-2.6	2008-08-07 20:28:46 -07:00
Adam Langley	2aaab9a0cc	tcp: (whitespace only) fix confusing indentation The indentation in part of tcp_minisocks makes it look like one of the if statements is much more important than it actually is. Signed-off-by: Adam Langley <agl@imperialviolet.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-07 20:27:45 -07:00
David S. Miller	827ebd6410	pkt_sched: Fix qdisc config when link is down. Bug reported by Stephen Hemminger. We need to fetch the root from ->qdisc_sleeping not ->qdisc. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-07 20:26:40 -07:00
Marcel Holtmann	28111eb2f5	[Bluetooth] Add parameters to control BNEP header compression The Bluetooth qualification for PAN demands testing with BNEP header compression disabled. This is actually pretty stupid and the Linux implementation outsmarts the test system since it compresses whenever possible. So to pass qualification two need parameters have been added to control the compression of source and destination headers. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2008-08-07 22:26:54 +02:00
Luis Carlos Cobo	8dbc1722a7	mac80211: keep mesh ifaces in allmulti mode Currently a mesh node will not forward a multicast frame if it is not subscribed to the specific multicast address. This patch addresses the issue and fixes mesh multicast forwarding. Signed-off-by: Luis Carlos Cobo <luisca@cozybit.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-07 09:49:04 -04:00
Luis Carlos Cobo	e32f85f7b9	mac80211: fix use of skb->cb for mesh forwarding Now we deal with mesh forwarding before the 802.11->802.3 conversion, thus eliminating a few unnecessary steps. The next hop lookup is called from ieee80211_master_start_xmit() instead of subif_start_xmit(). Until the next hop is found, RA in the frame will be all zeroes for frames originating from the device. For forwarded frames, RA will contain the TA of the received frame, which will be necessary to send a path error if a next hop is not found. Signed-off-by: Luis Carlos Cobo <luisca@cozybit.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-07 09:49:04 -04:00
Robert Olsson	e6fce5b916	pktgen: multiqueue etc. Sofar far pktgen have had a restriction to only use one device per kernel thread. With the new multiqueue architecture this is no longer adequate. The patch below is an effort to remove this by in pktgen configuration adding a tag to the device name a la eth0@0 etc. The tag is used for usual device config just as before. Also a new flag is introduced to mirror queue_map with sending threads smp_processor_id() QUEUE_MAP_CPU. An example: We use 4 CPU's to send to one 10g interface (eth0) and we use the new tagging to send a mix of packet sizes, 64, 576 and 1500 bytes. Also we use TX queues according to smp_processor_id() PGDEV=/proc/net/pktgen/kpktgend_0 pgset "add_device eth0@0" PGDEV=/proc/net/pktgen/kpktgend_1 pgset "add_device eth0@1" PGDEV=/proc/net/pktgen/kpktgend_2 pgset "add_device eth0@2" PGDEV=/proc/net/pktgen/kpktgend_3 pgset "add_device eth0@3" .... PGDEV=/proc/net/pktgen/eth0@0 pgset "pkt_size 64" pgset "flag QUEUE_MAP_CPU" PGDEV=/proc/net/pktgen/eth0@1 pgset "pkt_size 572" pgset "flag QUEUE_MAP_CPU" PGDEV=/proc/net/pktgen/eth0@2 pgset "pkt_size 1496" PGDEV=/proc/net/pktgen/eth0@3 pgset "pkt_size 1496" pgset "flag QUEUE_MAP_CPU" Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-07 02:23:01 -07:00
David S. Miller	32bb93b02d	Merge branch 'upstream-davem' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6	2008-08-07 02:10:27 -07:00
Jeff Garzik	3859069bc3	Merge branch 'for-jeff' of git://git.kernel.org/pub/scm/linux/kernel/git/chris/linux-2.6 into tmp	2008-08-07 04:05:46 -04:00
Joe Eykholt	f982307f22	net/core: Allow receive on active slaves. If a packet_type specifies an active slave to bonding and not just any interface, allow it to receive frames that came in on that interface. Signed-off-by: Joe Eykholt <jre@nuovasystems.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-08-07 04:00:01 -04:00
Joe Eykholt	0d7a368123	net/core: Allow certain receives on inactive slave. Allow a packet_type that specifies the exact device to receive even on an inactive bonding slave devices. This is important for some L2 protocols such as LLDP and FCoE. This can eventually be used for the bonding special cases as well. Signed-off-by: Joe Eykholt <jre@nuovasystems.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-08-07 03:59:59 -04:00
Joe Eykholt	cc9bd5cebc	net/core: Uninline skb_bond(). Otherwise subsequent changes need multiple return values. Signed-off-by: Joe Eykholt <jre@nuovasystems.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-08-07 03:59:58 -04:00
Gui Jianfeng	6edafaaf6f	tcp: Fix kernel panic when calling tcp_v(4/6)_md5_do_lookup If the following packet flow happen, kernel will panic. MathineA MathineB SYN ----------------------> SYN+ACK <---------------------- ACK(bad seq) ----------------------> When a bad seq ACK is received, tcp_v4_md5_do_lookup(skb->sk, ip_hdr(skb)->daddr)) is finally called by tcp_v4_reqsk_send_ack(), but the first parameter(skb->sk) is NULL at that moment, so kernel panic happens. This patch fixes this bug. OOPS output is as following: [ 302.812793] IP: [<c05cfaa6>] tcp_v4_md5_do_lookup+0x12/0x42 [ 302.817075] Oops: 0000 [#1] SMP [ 302.819815] Modules linked in: ipv6 loop dm_multipath rtc_cmos rtc_core rtc_lib pcspkr pcnet32 mii i2c_piix4 parport_pc i2c_core parport ac button ata_piix libata dm_mod mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod crc_t10dif ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_wait_scan] [ 302.849946] [ 302.851198] Pid: 0, comm: swapper Not tainted (2.6.27-rc1-guijf #5) [ 302.855184] EIP: 0060:[<c05cfaa6>] EFLAGS: 00010296 CPU: 0 [ 302.858296] EIP is at tcp_v4_md5_do_lookup+0x12/0x42 [ 302.861027] EAX: 0000001e EBX: 00000000 ECX: 00000046 EDX: 00000046 [ 302.864867] ESI: ceb69e00 EDI: 1467a8c0 EBP: cf75f180 ESP: c0792e54 [ 302.868333] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 [ 302.871287] Process swapper (pid: 0, ti=c0792000 task=c0712340 task.ti=c0746000) [ 302.875592] Stack: c06f413a 00000000 cf75f180 ceb69e00 00000000 c05d0d86 000016d0 ceac5400 [ 302.883275] c05d28f8 000016d0 ceb69e00 ceb69e20 681bf6e3 00001000 00000000 0a67a8c0 [ 302.890971] ceac5400 c04250a3 c06f413a c0792eb0 c0792edc cf59a620 cf59a620 cf59a634 [ 302.900140] Call Trace: [ 302.902392] [<c05d0d86>] tcp_v4_reqsk_send_ack+0x17/0x35 [ 302.907060] [<c05d28f8>] tcp_check_req+0x156/0x372 [ 302.910082] [<c04250a3>] printk+0x14/0x18 [ 302.912868] [<c05d0aa1>] tcp_v4_do_rcv+0x1d3/0x2bf [ 302.917423] [<c05d26be>] tcp_v4_rcv+0x563/0x5b9 [ 302.920453] [<c05bb20f>] ip_local_deliver_finish+0xe8/0x183 [ 302.923865] [<c05bb10a>] ip_rcv_finish+0x286/0x2a3 [ 302.928569] [<c059e438>] dev_alloc_skb+0x11/0x25 [ 302.931563] [<c05a211f>] netif_receive_skb+0x2d6/0x33a [ 302.934914] [<d0917941>] pcnet32_poll+0x333/0x680 [pcnet32] [ 302.938735] [<c05a3b48>] net_rx_action+0x5c/0xfe [ 302.941792] [<c042856b>] __do_softirq+0x5d/0xc1 [ 302.944788] [<c042850e>] __do_softirq+0x0/0xc1 [ 302.948999] [<c040564b>] do_softirq+0x55/0x88 [ 302.951870] [<c04501b1>] handle_fasteoi_irq+0x0/0xa4 [ 302.954986] [<c04284da>] irq_exit+0x35/0x69 [ 302.959081] [<c0405717>] do_IRQ+0x99/0xae [ 302.961896] [<c040422b>] common_interrupt+0x23/0x28 [ 302.966279] [<c040819d>] default_idle+0x2a/0x3d [ 302.969212] [<c0402552>] cpu_idle+0xb2/0xd2 [ 302.972169] ======================= [ 302.974274] Code: fc ff 84 d2 0f 84 df fd ff ff e9 34 fe ff ff 83 c4 0c 5b 5e 5f 5d c3 90 90 57 89 d7 56 53 89 c3 50 68 3a 41 6f c0 e8 e9 55 e5 ff <8b> 93 9c 04 00 00 58 85 d2 59 74 1e 8b 72 10 31 db 31 c9 85 f6 [ 303.011610] EIP: [<c05cfaa6>] tcp_v4_md5_do_lookup+0x12/0x42 SS:ESP 0068:c0792e54 [ 303.018360] Kernel panic - not syncing: Fatal exception in interrupt Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-06 23:50:04 -07:00
David S. Miller	ee7af8264d	pkt_sched: Fix "parent is root" test in qdisc_create(). As noticed by Stephen Hemminger, the root qdisc is denoted by TC_H_ROOT, not zero. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-06 23:35:59 -07:00
David S. Miller	11d46123bf	ipv4: Fix over-ifdeffing of ip_static_sysctl_init. Noticed by Paulius Zaleckas. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-06 18:30:43 -07:00
Joakim Koskela	abf5cdb89d	ipsec: Interfamily IPSec BEET, ipv4-inner ipv6-outer Here's a revised version, based on Herbert's comments, of a fix for the ipv4-inner, ipv6-outer interfamily ipsec beet mode. It fixes the network header adjustment during interfamily, as well as makes sure that we reserve enough room for the new ipv6 header if we might have something else as the inner family. Also, the ipv4 pseudo header construction was added. Signed-off-by: Joakim Koskela <jookos@gmail.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-06 02:40:25 -07:00
Joakim Koskela	eb49e63093	ipsec: Interfamily IPSec BEET Here's a revised version, based on Herbert's comments, of a fix for the ipv6-inner, ipv4-outer interfamily ipsec beet mode. It fixes the network header adjustment in interfamily, and doesn't reserve space for the pseudo header anymore when we have ipv6 as the inner family. Signed-off-by: Joakim Koskela <jookos@gmail.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-06 02:39:30 -07:00
Krzysztof Piotr Oledzki	9714be7da8	netfilter: fix two recent sysctl problems Starting with `9043476f72` ("[PATCH] sanitize proc_sysctl") we have two netfilter releated problems: - WARNING: at kernel/sysctl.c:1966 unregister_sysctl_table+0xcc/0x103(), caused by wrong order of ini/fini calls - net.netfilter is duplicated and has truncated set of records Thanks to very useful guidelines from Al Viro, this patch fixes both of them. Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-06 02:35:44 -07:00
Rami Rosen	1ca615fb81	ipv6: replace dst_metric() with dst_mtu() in net/ipv6/route.c. This patch replaces dst_metric() with dst_mtu() in net/ipv6/route.c. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-06 02:34:21 -07:00
Rami Rosen	6d273f8d01	ipv4: replace dst_metric() with dst_mtu() in net/ipv4/route.c. This patch replaces dst_metric() with dst_mtu() in net/ipv4/route.c. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-06 02:33:49 -07:00
Ralf Baechle	ffb208479b	AX.25: Fix sysctl registration if !CONFIG_AX25_DAMA_SLAVE Since `49ffcf8f99` ("sysctl: update sysctl_check_table") setting struct ctl_table.procname = NULL does no longer work as it used to the way the AX.25 code is expecting it to resulting in the AX.25 sysctl registration code to break if CONFIG_AX25_DAMA_SLAVE was not set as in some distribution kernels. Kernel releases from 2.6.24 are affected. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-05 18:46:57 -07:00
Robert Olsson	ff2a79a5a9	pktgen: mac count dst_mac_count and src_mac_count patch from Eneas Hunguana We have sent one mac address to much. Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-05 18:45:05 -07:00
Robert Olsson	1211a64554	pktgen: random flow Random flow generation has not worked. This fixes it. Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-05 18:44:26 -07:00
Stephen Hemminger	ef647f1300	bridge: Eliminate unnecessary forward delay From: Stephen Hemminger <shemminger@vyatta.com> Based upon original patch by Herbert Xu, which contained the following problem description: -------------------- When the forward delay is set to zero, we still delay the setting of the forwarding state by one or possibly two timers depending on whether STP is enabled. This could either turn out to be instantaneous, or horribly slow depending on the load of the machine. As there is nothing preventing us from enabling forwarding straight away, this patch eliminates this potential delay by executing the code directly if the forward delay is zero. The effect of this problem is that immediately after the carrier comes on a port, the bridge will drop all packets received from that port until it enters forwarding mode, thus causing unnecessary packet loss. Note that this patch doesn't fully remove the delay due to the link watcher. We should also check the carrier state when we are about to drop an incoming packet because the port is disabled. But that's for another patch. -------------------- This version of the fix takes a different approach, in that it just does the state change directly. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-05 18:42:51 -07:00
David S. Miller	33e334950a	Merge branch 'no-ath9k' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6	2008-08-05 01:28:35 -07:00
Rami Rosen	ad619800e4	bridge: fix compile warning in net/bridge/br_netfilter.c This patch fixes the following warning due to incompatible pointer assignment: net/bridge/br_netfilter.c: In function 'br_netfilter_rtable_init': net/bridge/br_netfilter.c:116: warning: assignment from incompatible pointer type This warning is due to commit `4adf0af681` from July 30 (send correct MTU value in PMTU (revised)). Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-05 01:21:22 -07:00
Jarek Poplawski	c27f339af9	net_sched: Add qdisc __NET_XMIT_BYPASS flag Patrick McHardy <kaber@trash.net> noticed that it would be nice to handle NET_XMIT_BYPASS by NET_XMIT_SUCCESS with an internal qdisc flag __NET_XMIT_BYPASS and to remove the mapping from dev_queue_xmit(). David Miller <davem@davemloft.net> spotted a serious bug in the first version of this patch. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-04 22:39:11 -07:00
Jarek Poplawski	378a2f090f	net_sched: Add qdisc __NET_XMIT_STOLEN flag Patrick McHardy <kaber@trash.net> noticed: "The other problem that affects all qdiscs supporting actions is TC_ACT_QUEUED/TC_ACT_STOLEN getting mapped to NET_XMIT_SUCCESS even though the packet is not queued, corrupting upper qdiscs' qlen counters." and later explained: "The reason why it translates it at all seems to be to not increase the drops counter. Within a single qdisc this could be avoided by other means easily, upper qdiscs would still increase the counter when we return anything besides NET_XMIT_SUCCESS though. This means we need a new NET_XMIT return value to indicate this to the upper qdiscs. So I'd suggest to introduce NET_XMIT_STOLEN, return that to upper qdiscs and translate it to NET_XMIT_SUCCESS in dev_queue_xmit, similar to NET_XMIT_BYPASS." David Miller <davem@davemloft.net> noticed: "Maybe these NET_XMIT_* values being passed around should be a set of bits. They could be composed of base meanings, combined with specific attributes. So you could say "NET_XMIT_DROP \| __NET_XMIT_NO_DROP_COUNT" The attributes get masked out by the top-level ->enqueue() caller, such that the base meanings are the only thing that make their way up into the stack. If it's only about communication within the qdisc tree, let's simply code it that way." This patch is trying to realize these ideas. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-04 22:31:03 -07:00
Tomas Winkler	4c43e0d0ec	iwlwifi: HW bug fixes This patch adds few HW bug fixes. Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Zhu Yi <yi.zhu@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-04 15:09:12 -04:00
Daniel Drake	80693ceb78	mac80211: automatic IBSS channel selection When joining an ad-hoc network, the user is currently required to specify the channel. The network will not be joined otherwise, unless it happens to be sitting on the currently active channel. This patch implements automatic channel selection when the user has not locked the interface onto a specific channel. Signed-off-by: Daniel Drake <dsd@gentoo.org> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-04 15:09:10 -04:00
Tomas Winkler	ea95bba41e	mac80211: make listen_interval be limited by low level driver This patch makes possible for a driver to specify maximal listen interval The possibility for user to configure listen interval is not implemented yet, currently the maximum provided by the driver or 1 is used. Mac80211 uses config handler to set listen interval for to the driver. Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Zhu Yi <yi.zhu@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-04 15:09:07 -04:00
Emmanuel Grumbach	98f7dfd86c	mac80211: pass dtim_period to low level driver This patch adds the dtim_period in ieee80211_bss_conf, this allows the low level driver to know the dtim_period, and to plan power save accordingly. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Zhu Yi <yi.zhu@intel.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-04 15:09:07 -04:00
Stephen Hemminger	6e583ce524	net: eliminate refcounting in backlog queue Avoid the overhead of atomic increment/decrement on each received packet. This helps performance of non-NAPI devices (like loopback). Use cleanup function to walk queue on each cpu and clean out any left over packets. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-03 21:29:57 -07:00
Wei Yongjun	283d07ac20	ipv6: Do not drop packet if skb->local_df is set to true The old code will drop IPv6 packet if ipfragok is not set, since ipfragok is obsoleted, will be instead by used skb->local_df, so this check must be changed to skb->local_df. This patch fix this problem and not drop packet if skb->local_df is set to true. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-03 21:15:59 -07:00
Herbert Xu	f880374c2f	sctp: Drop ipfargok in sctp_xmit function The ipfragok flag controls whether the packet may be fragmented either on the local host on beyond. The latter is only valid on IPv4. In fact, we never want to do the latter even on IPv4 when PMTU is enabled. This is because even though we can't fragment packets within SCTP due to the prtocol's inherent faults, we can still fragment it at IP layer. By setting the DF bit we will improve the PMTU process. RFC 2960 only says that we SHOULD clear the DF bit in this case, so we're compliant even if we set the DF bit. In fact RFC 4960 no longer has this statement. Once we make this change, we only need to control the local fragmentation. There is already a bit in the skb which controls that, local_df. So this patch sets that instead of using the ipfragok argument. The only complication is that there isn't a struct sock object per transport, so for IPv4 we have to resort to changing the pmtudisc field for every packet. This should be safe though as the protocol is single-threaded. Note that after this patch we can remove ipfragok from the rest of the stack too. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-03 21:15:08 -07:00
Yang Hongyang	cfb266c0ee	ipv6: Fix the return value of Set Hop-by-Hop options header with NULL data pointer When Set Hop-by-Hop options header with NULL data pointer and optlen is not zero use setsockopt(), the kernel successfully return 0 instead of return error EINVAL or EFAULT. This patch fix the problem. Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-03 18:16:15 -07:00
Florian Westphal	1730554f25	ipv6: syncookies: free reqsk on xfrm_lookup error cookie_v6_check() did not call reqsk_free() if xfrm_lookup() fails, leaking the request sock. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-03 18:13:44 -07:00
Sven Wegener	adf044c877	net: Add missing extra2 parameter for ip_default_ttl sysctl Commit `76e6ebfb40` ("netns: add namespace parameter to rt_cache_flush") acceses the extra2 parameter of the ip_default_ttl ctl_table, but it is never set to a meaningful value. When `e84f84f276` ("netns: place rt_genid into struct net") is applied, we'll oops in rt_cache_invalidate(). Set extra2 to init_net, to avoid that. Reported-by: Marcin Slusarz <marcin.slusarz@gmail.com> Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Tested-by: Marcin Slusarz <marcin.slusarz@gmail.com> Acked-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-03 14:06:44 -07:00
Lennert Buytenhek	e5a4a72d4f	net: use software GSO for SG+CSUM capable netdevices If a netdevice does not support hardware GSO, allowing the stack to use GSO anyway and then splitting the GSO skb into MSS-sized pieces as it is handed to the netdevice for transmitting is likely still a win as far as throughput and/or CPU usage are concerned, since it reduces the number of trips through the output path. This patch enables the use of GSO on any netdevice that supports SG. If a GSO skb is then sent to a netdevice that supports SG but does not support hardware GSO, net/core/dev.c:dev_hard_start_xmit() will take care of doing the necessary GSO segmentation in software. Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-03 01:23:10 -07:00
Chris Larson	745e203164	net: fix missing pneigh entries in the neighbor seq_file code When pneigh entries exist, but the user's read buffer isn't sufficient to hold them all, one of the pneigh entries will be missing from the results. In neigh_get_idx_any, the number of elements which neigh_get_idx encountered is not correctly subtracted from the position number before the call to pneigh_get_idx. neigh_get_idx reduces the position by 1 for each call to neigh_get_next, but it does not reduce it by one for the first element (neigh_get_first). The patch alters the neigh_get_idx and pneigh_get_idx functions to subtract one from pos, for the first element, when pos is non-zero. Signed-off-by: Chris Larson <clarson@mvista.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-03 01:10:55 -07:00
Chris Larson	bff69732c9	net: in the first call to neigh_seq_next, call neigh_get_first, not neigh_get_idx. neigh_seq_next won't be called both with *pos > 0 && v == SEQ_START_TOKEN, so there's no point calling neigh_get_idx when we're on the start token, just call neigh_get_first directly. Signed-off-by: Chris Larson <clarson@mvista.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-03 01:02:41 -07:00
David S. Miller	35ed4e7598	mac80211: Use queue_lock() in ieee80211_ht_agg_queue_remove(). qdisc_root_lock() is only %100 safe to use when the RTNL semaphore is held. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-02 23:25:50 -07:00
David S. Miller	5fb662297b	pkt_sched: Use qdisc_lock() on already sampled root qdisc. Based upon a bug report by Jeff Kirsher. Don't use qdisc_root_lock() in these cases as the root qdisc could have been changed, and we'd thus lock the wrong object. Tested by Emil S Tantilov who confirms that this seems to fix the problem. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-02 20:02:43 -07:00
David S. Miller	e9e80ea5f2	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6	2008-08-01 22:08:51 -07:00
Tomas Winkler	f8e79ddd31	mac80211: fix fragmentation kludge This patch make mac80211 transmit correctly fragmented packet after queue was stopped Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-01 15:31:33 -04:00
Dmitry Baryshkov	96185664f1	RFKILL: set the status of the leds on activation. Provide default activate function to set the state of the led when the led becomes bound to the trigger Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-01 15:31:33 -04:00
Dmitry Baryshkov	7c4f4578fc	RFKILL: allow one to specify led trigger name Allow the rfkill driver to specify led trigger name. By default it still defaults to the name of rfkill switch. Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com> Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-01 15:31:33 -04:00
Henrique de Moraes Holschuh	6e28fbef0f	rfkill: query EV_SW states when rfkill-input (re)?connects to a input device Every time a new input device that is capable of one of the rfkill EV_SW events (currently only SW_RFKILL_ALL) is connected to rfkill-input, we must check the states of the input EV_SW switches and take action. Otherwise, we will ignore the initial switch state. We also need to re-check the states of the EV_SW switches after a device that was under an exclusive grab is released back to us, since we got no input events from that device while it was grabbed. Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Cc: Dmitry Torokhov <dtor@mail.ru> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-08-01 15:31:32 -04:00
Linus Torvalds	9a5467fd60	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (46 commits) tcp: MD5: Fix IPv6 signatures skbuff: add missing kernel-doc for do_not_encrypt net/ipv4/route.c: fix build error tcp: MD5: Fix MD5 signatures on certain ACK packets ipv6: Fix ip6_xmit to send fragments if ipfragok is true ipvs: Move userspace definitions to include/linux/ip_vs.h netdev: Fix lockdep warnings in multiqueue configurations. netfilter: xt_hashlimit: fix race between htable_destroy and htable_gc netfilter: ipt_recent: fix race between recent_mt_destroy and proc manipulations netfilter: nf_conntrack_tcp: decrease timeouts while data in unacknowledged irda: replace __FUNCTION__ with __func__ nsc-ircc: default to dongle type 9 on IBM hardware bluetooth: add quirks for a few hci_usb devices hysdn: remove the packed attribute from PofTimStamp_tag isdn: use the common ascii hex helpers tg3: adapt tg3 to use reworked PCI PM code atm: fix direct casts of pointers to u32 in the InterPhase driver atm: fix const assignment/discard warnings in the ATM networking driver net: use the common ascii hex helpers random32: seeding improvement ...	2008-08-01 11:35:16 -07:00
Al Viro	a1bc6eb4b4	[PATCH] ipv4_static_sysctl_init() should be under CONFIG_SYSCTL Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-08-01 11:25:22 -04:00
Adam Langley	00b1304c4c	tcp: MD5: Fix IPv6 signatures Reported by Stefanos Harhalakis; although 2.6.27-rc1 talks to itself using IPv6 TCP MD5 packets just fine, Stefanos noted that tcpdump claimed that the signatures were invalid. I broke this in `49a72dfb88` ("tcp: Fix MD5 signatures for non-linear skbs"), it was just a typo. Note that tcpdump will still sometimes claim that the signatures are incorrect. A patch to tcpdump has been submitted for this[1]. [1] http://tinyurl.com/6a4fl2 Signed-off-by: Adam Langley <agl@imperialviolet.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-31 21:36:07 -07:00
Ingo Molnar	8a9204db66	net/ipv4/route.c: fix build error fix: net/ipv4/route.c: In function 'ip_static_sysctl_init': net/ipv4/route.c:3225: error: 'ipv4_route_path' undeclared (first use in this function) net/ipv4/route.c:3225: error: (Each undeclared identifier is reported only once net/ipv4/route.c:3225: error: for each function it appears in.) net/ipv4/route.c:3225: error: 'ipv4_route_table' undeclared (first use in this function) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-31 20:51:22 -07:00
Adam Langley	90b7e1120b	tcp: MD5: Fix MD5 signatures on certain ACK packets I noticed, looking at tcpdumps, that timewait ACKs were getting sent with an incorrect MD5 signature when signatures were enabled. I broke this in `49a72dfb88` ("tcp: Fix MD5 signatures for non-linear skbs"). I didn't take into account that the skb passed to tcp_*_send_ack was the inbound packet, thus the source and dest addresses need to be swapped when calculating the MD5 pseudoheader. Signed-off-by: Adam Langley <agl@imperialviolet.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-31 20:49:48 -07:00
Wei Yongjun	77e2f14f71	ipv6: Fix ip6_xmit to send fragments if ipfragok is true SCTP used ip6_xmit() to send fragments after received ICMP packet too big message. But while send packet used ip6_xmit, the skb->local_df is not initialized. So when skb if enter ip6_fragment(), the following code will discard the skb. ip6_fragment(...) { if (!skb->local_df) { ... return -EMSGSIZE; } ... } SCTP do the following step: 1. send packet ip6_xmit(skb, ipfragok=0) 2. received ICMP packet too big message 3. if PMTUD_ENABLE: ip6_xmit(skb, ipfragok=1) This patch fixed the problem by set local_df if ipfragok is true. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-31 20:46:47 -07:00
David S. Miller	c3f26a269c	netdev: Fix lockdep warnings in multiqueue configurations. When support for multiple TX queues were added, the netif_tx_lock() routines we converted to iterate over all TX queues and grab each queue's spinlock. This causes heartburn for lockdep and it's not a healthy thing to do with lots of TX queues anyways. So modify this to use a top-level lock and a "frozen" state for the individual TX queues. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-31 16:58:50 -07:00
Pavel Emelyanov	967ab999a0	netfilter: xt_hashlimit: fix race between htable_destroy and htable_gc Deleting a timer with del_timer doesn't guarantee, that the timer function is not running at the moment of deletion. Thus in the xt_hashlimit case we can get into a ticklish situation when the htable_gc rearms the timer back and we'll actually delete an entry with a pending timer. Fix it with using del_timer_sync(). AFAIK del_timer_sync checks for the timer to be pending by itself, so I remove the check. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-31 00:38:52 -07:00
Pavel Emelyanov	a8ddc9163c	netfilter: ipt_recent: fix race between recent_mt_destroy and proc manipulations The thing is that recent_mt_destroy first flushes the entries from table with the recent_table_flush and only after this removes the proc file, corresponding to that table. Thus, if we manage to write to this file the '+XXX' command we will leak some entries. If we manage to write there a 'clean' command we'll race in two recent_table_flush flows, since the recent_mt_destroy calls this outside the recent_lock. The proper solution as I see it is to remove the proc file first and then go on with flushing the table. This flushing becomes safe w/o the lock, since the table is already inaccessible from the outside. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-31 00:38:31 -07:00
Patrick McHardy	ae375044d3	netfilter: nf_conntrack_tcp: decrease timeouts while data in unacknowledged In order to time out dead connections quicker, keep track of outstanding data and cap the timeout. Suggested by Herbert Xu. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-31 00:38:01 -07:00
David Howells	cba5cbd155	atm: fix const assignment/discard warnings in the ATM networking driver Fix const assignment/discard warnings in the ATM networking driver. The lane2_assoc_ind() function needed its arguments changing to match changes in the lane2_ops struct (patch `61c33e0129` "atm: use const where reasonable"). Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Chas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-30 16:31:46 -07:00
Harvey Harrison	6a8341b68b	net: use the common ascii hex helpers Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-30 16:30:15 -07:00
Simon Wunderlich	4adf0af681	bridge: send correct MTU value in PMTU (revised) When bridging interfaces with different MTUs, the bridge correctly chooses the minimum of the MTUs of the physical devices as the bridges MTU. But when a frame is passed which fits through the incoming, but not through the outgoing interface, a "Fragmentation Needed" packet is generated. However, the propagated MTU is hardcoded to 1500, which is wrong in this situation. The sender will repeat the packet again with the same frame size, and the same problem will occur again. Instead of sending 1500, the (correct) MTU value of the bridge is now sent via PMTU. To achieve this, the corresponding rtable structure is stored in its net_bridge structure. Modified to get rid of fake_net_device as well. Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-30 16:27:55 -07:00
Robert P. J. Day	031cf19e6f	net: Make "networking" one-click deselectable. Use a menuconfig directive to make all of networking support one-click deselectable from the top-level menu. Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-30 03:27:53 -07:00
Daniel Lezcano	17ef51fce0	ipv6: Fix useless proc net sockstat6 removal This call is no longer needed, sockstat6 is per namespace so it is removed at the namespace subsystem destruction. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-30 03:27:52 -07:00
David S. Miller	785957d3e8	tcp: MD5: Use MIB counter instead of warning for MD5 mismatch. From a report by Matti Aarnio, and preliminary patch by Adam Langley. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-30 03:27:25 -07:00
David S. Miller	8d50b53d66	pkt_sched: Fix OOPS on ingress qdisc add. Bug report from Steven Jan Springl: Issuing the following command causes a kernel oops: tc qdisc add dev eth0 handle ffff: ingress The problem mostly stems from all of the special case handling of ingress qdiscs. So, to fix this, do the grafting operation the same way we do for TX qdiscs. Which means that dev_activate() and dev_deactivate() now do the "qdisc_sleeping <--> qdisc" transitions on dev->rx_queue too. Future simplifications are possible now, mainly because it is impossible for dev_queue->{qdisc,qdisc_sleeping} to be NULL. There are NULL checks all over to handle the ingress qdisc special case that used to exist before this commit. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-30 02:44:25 -07:00
Miao Xie	4a36702e01	IPv6: datagram_send_ctl() should exit immediately when an error occured When an error occured, datagram_send_ctl() should exit immediately rather than continue to run the for loop. Otherwise, the variable err might be changed and the error might be hidden. Fix this bug by using "goto" instead of "break". Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-29 23:57:58 -07:00
David S. Miller	e93dc4891d	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6	2008-07-29 21:51:00 -07:00
Luis Carlos Cobo	56a6d13dfd	mac80211: fix mesh beaconing This patch fixes mesh beaconing, which was broken by "mac80211: revamp beacon configuration". Signed-off-by: Luis Carlos Cobo <luisca@cozybit.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-07-29 16:55:09 -04:00
Johannes Berg	14db74bcc3	mac80211: fix cfg80211 hooks for master interface The master interface is a virtual interface that is registered to mac80211, changing that does not seem like a good idea at the moment. However, since it has no sdata, we cannot accept any configuration for it. This patch makes the cfg80211 hooks reject any such attempt. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-07-29 16:55:08 -04:00
Johannes Berg	bba95fefb8	nl80211: fix dump callbacks Julius Volz pointed out that the dump callbacks in nl80211 were broken and fixed one of them. This patch fixes the other three and also addresses the TODOs there. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Cc: Julius Volz <juliusv@google.com> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-07-29 16:55:08 -04:00
Johannes Berg	d0f0980414	mac80211: partially fix skb->cb use This patch fixes mac80211 to not use the skb->cb over the queue step from virtual interfaces to the master. The patch also, for now, disables aggregation because that would still require requeuing, will fix that in a separate patch. There are two other places (software requeue and powersaving stations) where requeue can happen, but that is not currently used by any drivers/not possible to use respectively. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-07-29 16:55:08 -04:00
Rami Rosen	5422399518	mac80211: append CONFIG_ to MAC80211_VERBOSE_PS_DEBUG in net/mac80211/tx.c. In net/mac80211/tx.c, there are some #ifdef which checks MAC80211_VERBOSE_PS_DEBUG (which in fact is never set) instead of CONFIG_MAC80211_VERBOSE_PS_DEBUG, as should be. This patch replaces MAC80211_VERBOSE_PS_DEBUG with CONFIG_MAC80211_VERBOSE_PS_DEBUG in these #ifdef commands in net/mac80211/tx.c. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-07-29 16:55:07 -04:00
Jeremy Fitzhardinge	023a04bebe	mac80211: return correct error return from ieee80211_wep_init Return the proper error code rather than a hard-coded ENOMEM from ieee80211_wep_init. Also, print the error code on failure. Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-07-29 16:55:07 -04:00
Jiri Slaby	1b0241656b	mac80211: tx, use dev_kfree_skb_any for beacon_get Use dev_kfree_skb_any(); instead of dev_kfree_skb();, since ieee80211_beacon_get function might be called from atomic. (It's in a fail path.) Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Michael Wu <flamingice@sourmilk.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-07-29 16:55:07 -04:00
Henrique de Moraes Holschuh	435307a365	rfkill: yet more minor kernel-doc fixes For some stupid reason, I sent and old version of the patch minor kernel doc-fix patch, and it got merged before I noticed the problem. This is an incremental fix on top. Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-07-29 16:55:03 -04:00
Henrique de Moraes Holschuh	064af1117b	rfkill: mutex fixes There are two mutexes in rfkill: rfkill->mutex, which protects some of the fields of a rfkill struct, and is also used for callback serialization. rfkill_mutex, which protects the global state, the list of registered rfkill structs and rfkill->claim. Make sure to use the correct mutex, and to not miss locking rfkill->mutex even when we already took rfkill_mutex. Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-07-29 16:36:35 -04:00
Henrique de Moraes Holschuh	37f55e9d78	rfkill: fix led-trigger unregister order in error unwind rfkill needs to unregister the led trigger AFTER a call to rfkill_remove_switch(), otherwise it will not update the LED state, possibly leaving it ON when it should be OFF. To make led-trigger unregistering safer, guard against unregistering a trigger twice, and also against issuing trigger events to a led trigger that was unregistered. This makes the error unwind paths more resilient. Refer to "rfkill: Register LED triggers before registering switch". Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Cc: Michael Buesch <mb@bu3sch.de> Cc: Dmitry Baryshkov <dbaryshkov@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-07-29 16:36:32 -04:00
Henrique de Moraes Holschuh	2fd9b2212e	rfkill: document rfkill_force_state as required (v2) While the rfkill class does work with just get_state(), it doesn't work well on devices that are subject to external events that cause rfkill state changes. Document that rfkill_force_state() is required in those cases. Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-07-29 16:36:32 -04:00
Ingo Molnar	9e3ee1c39c	Merge branch 'linus' into cpus4096 Conflicts: kernel/stop_machine.c Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-28 23:32:00 +02:00
Ingo Molnar	414f746d23	Merge branch 'linus' into cpus4096	2008-07-28 21:14:43 +02:00
Al Viro	eeb61f719c	missing bits of net-namespace / sysctl Piss-poor sysctl registration API strikes again, film at 11... What we really need is _pathname_ required to be present in already registered table, so that kernel could warn about bad order. That's the next target for sysctl stuff (and generally saner and more explicit order of initialization of ipv[46] internals wouldn't hurt either). For the time being, here are full fixups required by ..._rotable() stuff; we make per-net sysctl sets descendents of "ro" one and make sure that sufficient skeleton is there before we start registering per-net sysctls. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-27 09:45:34 -07:00
David S. Miller	2ab61b0111	Merge branch 'master' of git://eden-feed.erg.abdn.ac.uk/net-2.6	2008-07-27 05:00:25 -07:00
Al Viro	6f9f489a4e	net: missing bits of net-namespace / sysctl Piss-poor sysctl registration API strikes again, film at 11... What we really need is _pathname_ required to be present in already registered table, so that kernel could warn about bad order. That's the next target for sysctl stuff (and generally saner and more explicit order of initialization of ipv[46] internals wouldn't hurt either). For the time being, here are full fixups required by ..._rotable() stuff; we make per-net sysctl sets descendents of "ro" one and make sure that sufficient skeleton is there before we start registering per-net sysctls. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-27 04:40:51 -07:00
David S. Miller	15d3b4a262	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6	2008-07-27 04:40:08 -07:00
David S. Miller	2c3abab7c9	ipcomp: Fix warnings after ipcomp consolidation. net/ipv4/ipcomp.c: In function ‘ipcomp4_init_state’: net/ipv4/ipcomp.c:109: warning: unused variable ‘calg_desc’ net/ipv4/ipcomp.c:108: warning: unused variable ‘ipcd’ net/ipv4/ipcomp.c:107: warning: ‘err’ may be used uninitialized in this function net/ipv6/ipcomp6.c: In function ‘ipcomp6_init_state’: net/ipv6/ipcomp6.c:139: warning: unused variable ‘calg_desc’ net/ipv6/ipcomp6.c:138: warning: unused variable ‘ipcd’ net/ipv6/ipcomp6.c:137: warning: ‘err’ may be used uninitialized in this function Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-27 03:59:24 -07:00
Linus Torvalds	4836e30078	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (39 commits) [PATCH] fix RLIM_NOFILE handling [PATCH] get rid of corner case in dup3() entirely [PATCH] remove remaining namei_{32,64}.h crap [PATCH] get rid of indirect users of namei.h [PATCH] get rid of __user_path_lookup_open [PATCH] f_count may wrap around [PATCH] dup3 fix [PATCH] don't pass nameidata to __ncp_lookup_validate() [PATCH] don't pass nameidata to gfs2_lookupi() [PATCH] new (local) helper: user_path_parent() [PATCH] sanitize __user_walk_fd() et.al. [PATCH] preparation to __user_walk_fd cleanup [PATCH] kill nameidata passing to permission(), rename to inode_permission() [PATCH] take noexec checks to very few callers that care Re: [PATCH 3/6] vfs: open_exec cleanup [patch 4/4] vfs: immutable inode checking cleanup [patch 3/4] fat: dont call notify_change [patch 2/4] vfs: utimes cleanup [patch 1/4] vfs: utimes: move owner check into inode_change_ok() [PATCH] vfs: use kstrdup() and check failing allocation ...	2008-07-26 20:23:44 -07:00
Linus Torvalds	2284284281	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: netns: fix ip_rt_frag_needed rt_is_expired netfilter: nf_conntrack_extend: avoid unnecessary "ct->ext" dereferences netfilter: fix double-free and use-after free netfilter: arptables in netns for real netfilter: ip{,6}tables_security: fix future section mismatch selinux: use nf_register_hooks() netfilter: ebtables: use nf_register_hooks() Revert "pkt_sched: sch_sfq: dump a real number of flows" qeth: use dev->ml_priv instead of dev->priv syncookies: Make sure ECN is disabled net: drop unused BUG_TRAP() net: convert BUG_TRAP to generic WARN_ON drivers/net: convert BUG_TRAP to generic WARN_ON	2008-07-26 20:17:56 -07:00
Al Viro	516e0cc564	[PATCH] f_count may wrap around make it atomic_long_t; while we are at it, get rid of useless checks in affs, hfs and hpfs - ->open() always has it equal to 1, ->release() - to 0. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-07-26 20:53:40 -04:00
Al Viro	bd7b1533cd	[PATCH] sysctl: make sure that /proc/sys/net/ipv4 appears before per-ns ones Massage ipv4 initialization - make sure that net.ipv4 appears as non-per-net-namespace before it shows up in per-net-namespace sysctls. That's the only change outside of sysctl.c needed to get sane ordering rules and data structures for sysctls (esp. for procfs side of that mess). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-07-26 20:53:10 -04:00
Al Viro	734550921e	[PATCH] beginning of sysctl cleanup - ctl_table_set New object: set of sysctls [currently - root and per-net-ns]. Contains: pointer to parent set, list of tables and "should I see this set?" method (->is_seen(set)). Current lists of tables are subsumed by that; net-ns contains such a beast. ->lookup() for ctl_table_root returns pointer to ctl_table_set instead of that to ->list of that ctl_table_set. [folded compile fixes by rdd for configs without sysctl] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-07-26 20:53:08 -04:00
Hugh Dickins	6c3b8fc618	netns: fix ip_rt_frag_needed rt_is_expired Running recent kernels, and using a particular vpn gateway, I've been having to edit my mails down to get them accepted by the smtp server. Git bisect led to commit `e84f84f276` - netns: place rt_genid into struct net. The conversion from a != test to rt_is_expired() put one negative too many: and now my mail works. Signed-off-by: Hugh Dickins <hugh@veritas.com> Acked-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-26 17:51:06 -07:00
Patrick McHardy	6c64825bf4	netfilter: nf_conntrack_extend: avoid unnecessary "ct->ext" dereferences As Linus points out, "ct->ext" and "new" are always equal, avoid unnecessary dereferences and use "new" directly. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-26 17:50:05 -07:00
Pekka Enberg	93bc4e89c2	netfilter: fix double-free and use-after free As suggested by Patrick McHardy, introduce a __krealloc() that doesn't free the original buffer to fix a double-free and use-after-free bug introduced by me in netfilter that uses RCU. Reported-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Tested-by: Dieter Ries <clip2@gmx.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-26 17:49:33 -07:00
Alexey Dobriyan	3918fed5f3	netfilter: arptables in netns for real IN, FORWARD -- grab netns from in device, OUT -- from out device. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-26 17:48:59 -07:00
Alexey Dobriyan	f858b4869a	netfilter: ip{,6}tables_security: fix future section mismatch Currently not visible, because NET_NS is mutually exclusive with SYSFS which is required by SECURITY. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-26 17:48:38 -07:00
Alexey Dobriyan	e40f51a36a	netfilter: ebtables: use nf_register_hooks() Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-26 17:47:53 -07:00
Alexey Dobriyan	51cc50685a	SL*B: drop kmem cache argument from constructor Kmem cache passed to constructor is only needed for constructors that are themselves multiplexeres. Nobody uses this "feature", nor does anybody uses passed kmem cache in non-trivial way, so pass only pointer to object. Non-trivial places are: arch/powerpc/mm/init_64.c arch/powerpc/mm/hugetlbpage.c This is flag day, yes. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Christoph Lameter <cl@linux-foundation.org> Cc: Jon Tollefson <kniht@linux.vnet.ibm.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Matt Mackall <mpm@selenic.com> [akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c] [akpm@linux-foundation.org: fix mm/slab.c] [akpm@linux-foundation.org: fix ubifs] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:07 -07:00
FUJITA Tomonori	8d8bb39b9e	dma-mapping: add the device argument to dma_mapping_error() Add per-device dma_mapping_ops support for CONFIG_X86_64 as POWER architecture does: This enables us to cleanly fix the Calgary IOMMU issue that some devices are not behind the IOMMU (http://lkml.org/lkml/2008/5/8/423). I think that per-device dma_mapping_ops support would be also helpful for KVM people to support PCI passthrough but Andi thinks that this makes it difficult to support the PCI passthrough (see the above thread). So I CC'ed this to KVM camp. Comments are appreciated. A pointer to dma_mapping_ops to struct dev_archdata is added. If the pointer is non NULL, DMA operations in asm/dma-mapping.h use it. If it's NULL, the system-wide dma_ops pointer is used as before. If it's useful for KVM people, I plan to implement a mechanism to register a hook called when a new pci (or dma capable) device is created (it works with hot plugging). It enables IOMMUs to set up an appropriate dma_mapping_ops per device. The major obstacle is that dma_mapping_error doesn't take a pointer to the device unlike other DMA operations. So x86 can't have dma_mapping_ops per device. Note all the POWER IOMMUs use the same dma_mapping_error function so this is not a problem for POWER but x86 IOMMUs use different dma_mapping_error functions. The first patch adds the device argument to dma_mapping_error. The patch is trivial but large since it touches lots of drivers and dma-mapping.h in all the architecture. This patch: dma_mapping_error() doesn't take a pointer to the device unlike other DMA operations. So we can't have dma_mapping_ops per device. Note that POWER already has dma_mapping_ops per device but all the POWER IOMMUs use the same dma_mapping_error function. x86 IOMMUs use device argument. [akpm@linux-foundation.org: fix sge] [akpm@linux-foundation.org: fix svc_rdma] [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: fix bnx2x] [akpm@linux-foundation.org: fix s2io] [akpm@linux-foundation.org: fix pasemi_mac] [akpm@linux-foundation.org: fix sdhci] [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: fix sparc] [akpm@linux-foundation.org: fix ibmvscsi] Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Muli Ben-Yehuda <muli@il.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:03 -07:00
Mike Travis	0bc3cc03fa	cpumask: change cpumask_of_cpu_ptr to use new cpumask_of_cpu * Replace previous instances of the cpumask_of_cpu_ptr* macros with a the new (lvalue capable) generic cpumask_of_cpu(). Signed-off-by: Mike Travis <travis@sgi.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jack Steiner <steiner@sgi.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-26 16:40:33 +02:00
Wei Yongjun	860239c56b	dccp: Add check for truncated ICMPv6 DCCP error packets This patch adds a minimum-length check for ICMPv6 packets, as per the previous patch for ICMPv4 payloads. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-07-26 11:59:11 +01:00
Wei Yongjun	18e1d83600	dccp: Fix incorrect length check for ICMPv4 packets Unlike TCP, which only needs 8 octets of original packet data, DCCP requires minimally 12 or 16 bytes for ICMP-payload sequence number checks. This patch replaces the insufficient length constant of 8 with a two-stage test, making sure that 12 bytes are available, before computing the basic header length required for sequence number checks. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-07-26 11:59:10 +01:00
Wei Yongjun	e0bcfb0c6a	dccp: Add check for sequence number in ICMPv6 message This adds a sequence number check for ICMPv6 DCCP error packets, in the same manner as it has been done for ICMPv4 in the previous patch. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-07-26 11:59:10 +01:00
Wei Yongjun	d68f0866f7	dccp: Fix sequence number check for ICMPv4 packets The payload of ICMP message is a part of the packet sent by ourself, so the sequence number check must use AWL and AWH, not SWL and SWH. For example: Endpoint A Endpoint B DATA-ACK --------> (SEQ=X) <-------- ICMP (Fragmentation Needed) (SEQ=X) Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-07-26 11:59:10 +01:00
Gerrit Renker	73f18fdbca	dccp: Bug-Fix - AWL was never updated The AWL lower Ack validity window advances in proportion to GSS, the greatest sequence number sent. Updating AWL other than at connection setup (in the DCCP-Request sent by dccp_v{4,6}_connect()) was missing in the DCCP code. This bug lead to syslog messages such as "kernel: dccp_check_seqno: DCCP: Step 6 failed for DATAACK packet, [...] P.ackno exists or LAWL(82947089) <= P.ackno(82948208) <= S.AWH(82948728), sending SYNC..." The difference between AWL/AWH here is 1639 packets, while the expected value (the Sequence Window) would have been 100 (the default). A closer look showed that LAWL = AWL = 82947089 equalled the ISS on the Response. The patch now updates AWL with each increase of GSS. Further changes: ---------------- The patch also enforces more stringent checks on the ISS sequence number: * AWL is initialised to ISS at connection setup and remains at this value; * AWH is then always set to GSS (via dccp_update_gss()); * so on the first Request: AWL = AWH = ISS, and on the n-th Request: AWL = ISS, AWH = ISS + n. As a consequence, only Response packets that refer to Requests sent by this host will pass, all others are discarded. This is the intention and in effect implements the initial adjustments for AWL as specified in RFC 4340, 7.5.1. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>	2008-07-26 11:59:10 +01:00
Gerrit Renker	59435444a1	dccp: Allow to distinguish original and retransmitted packets This patch allows the sender to distinguish original and retransmitted packets, which is in particular needed for the retransmission of DCCP-Requests: * the first Request uses ISS (generated in net/dccp/ip.c), and sets GSS = ISS; all retransmitted Requests use GSS' = GSS + 1, so that the n-th retransmitted Request has sequence number ISS + n (mod 48). To add generic support, the patch reorganises existing code so that: * icsk_retransmits == 0 for the original packet and * icsk_retransmits = n > 0 for the n-th retransmitted packet at the time dccp_transmit_skb() is called, via dccp_retransmit_skb(). Thanks to Wei Yongjun for pointing this problem out. Further changes: ---------------- * removed the `skb' argument from dccp_retransmit_skb(), since sk_send_head is used for all retransmissions (the exception is client-Acks in PARTOPEN state, but these do not use sk_send_head); * since sk_send_head always contains the original skb (via dccp_entail()), skb_cloned() never evaluated to true and thus pskb_copy() was never used. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-07-26 11:59:09 +01:00
David S. Miller	cdec7e50a4	Revert "pkt_sched: sch_sfq: dump a real number of flows" This reverts commit `f867e6af94`. Based upon discussions between Jarek and Patrick McHardy this is field being set is more a config parameter than a statistic. And we should add a true statistic to provide this information if we really want it. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-26 02:28:09 -07:00
Florian Westphal	16df845f45	syncookies: Make sure ECN is disabled ecn_ok is not initialized when a connection is established by cookies. The cookie syn-ack never sets ECN, so ecn_ok must be set to 0. Spotted using ns-3/network simulation cradle simulator and valgrind. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-26 02:21:54 -07:00
Ilpo Järvinen	547b792cac	net: convert BUG_TRAP to generic WARN_ON Removes legacy reinvent-the-wheel type thing. The generic machinery integrates much better to automated debugging aids such as kerneloops.org (and others), and is unambiguous due to better naming. Non-intuively BUG_TRAP() is actually equal to WARN_ON() rather than BUG_ON() though some might actually be promoted to BUG_ON() but I left that to future. I could make at least one BUILD_BUG_ON conversion. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-25 21:43:18 -07:00
Linus Torvalds	1ff8419871	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: ipsec: ipcomp - Decompress into frags if necessary ipsec: ipcomp - Merge IPComp implementations pkt_sched: Fix locking in shutdown_scheduler_queue()	2008-07-25 17:40:16 -07:00
Stephen Hemminger	4ecb90090c	sysctl: allow override of /proc/sys/net with CAP_NET_ADMIN Extend the permission check for networking sysctl's to allow modification when current process has CAP_NET_ADMIN capability and is not root. This version uses the until now unused permissions hook to override the mode value for /proc/sys/net if accessed by a user with capabilities. Found while working with Quagga. It is impossible to turn forwarding on/off through the command interface because Quagga uses secure coding practice of dropping privledges during initialization and only raising via capabilities when necessary. Since the dameon has reset real/effective uid after initialization, all attempts to access /proc/sys/net variables will fail. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrew Morgan <morgan@kernel.org> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:45 -07:00
Dave Young	717115e1a5	printk ratelimiting rewrite All ratelimit user use same jiffies and burst params, so some messages (callbacks) will be lost. For example: a call printk_ratelimit(5 * HZ, 1) b call printk_ratelimit(5 * HZ, 1) before the 5*HZ timeout of a, then b will will be supressed. - rewrite __ratelimit, and use a ratelimit_state as parameter. Thanks for hints from andrew. - Add WARN_ON_RATELIMIT, update rcupreempt.h - remove __printk_ratelimit - use __ratelimit in net_ratelimit Signed-off-by: Dave Young <hidave.darkstar@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: Dave Young <hidave.darkstar@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:29 -07:00
Paul E. McKenney	696adfe84c	list_for_each_rcu must die: networking All uses of list_for_each_rcu() can be profitably replaced by the easier-to-use list_for_each_entry_rcu(). This patch makes this change for networking, in preparation for removing the list_for_each_rcu() API entirely. Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:27 -07:00
Herbert Xu	7d7e5a60c6	ipsec: ipcomp - Decompress into frags if necessary When decompressing extremely large packets allocating them through kmalloc is prone to failure. Therefore it's better to use page frags instead. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-25 02:55:33 -07:00
Herbert Xu	6fccab671f	ipsec: ipcomp - Merge IPComp implementations This patch merges the IPv4/IPv6 IPComp implementations since most of the code is identical. As a result future enhancements will no longer need to be duplicated. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-25 02:54:40 -07:00
David S. Miller	cffe1c5d7a	pkt_sched: Fix locking in shutdown_scheduler_queue() Qdisc locks need to be held with BH disabled. Tested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-25 01:25:04 -07:00
Linus Torvalds	c3c2233d84	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: pkt_sched: sch_sfq: dump a real number of flows atm: [fore200e] use MODULE_FIRMWARE() and other suggested cleanups netfilter: make security table depend on NETFILTER_ADVANCED tcp: Clear probes_out more aggressively in tcp_ack(). e1000e: fix e1000_netpoll(), remove extraneous e1000_clean_tx_irq() call net: Update entry in af_family_clock_key_strings netdev: Remove warning from __netif_schedule(). sky2: don't stop queue on shutdown	2008-07-24 12:14:58 -07:00
Ulrich Drepper	e38b36f325	flag parameters: check magic constants This patch adds test that ensure the boundary conditions for the various constants introduced in the previous patches is met. No code is generated. [akpm@linux-foundation.org: fix alpha] Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:29 -07:00
Ulrich Drepper	77d2720059	flag parameters: NONBLOCK in socket and socketpair This patch introduces support for the SOCK_NONBLOCK flag in socket, socketpair, and paccept. To do this the internal function sock_attach_fd gets an additional parameter which it uses to set the appropriate flag for the file descriptor. Given that in modern, scalable programs almost all socket connections are non-blocking and the minimal additional cost for the new functionality I see no reason not to add this code. The following test must be adjusted for architectures other than x86 and x86-64 and in case the syscall numbers changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <fcntl.h> #include <pthread.h> #include <stdio.h> #include <unistd.h> #include <netinet/in.h> #include <sys/socket.h> #include <sys/syscall.h> #ifndef __NR_paccept # ifdef __x86_64__ # define __NR_paccept 288 # elif defined __i386__ # define SYS_PACCEPT 18 # define USE_SOCKETCALL 1 # else # error "need __NR_paccept" # endif #endif #ifdef USE_SOCKETCALL # define paccept(fd, addr, addrlen, mask, flags) \ ({ long args[6] = { \ (long) fd, (long) addr, (long) addrlen, (long) mask, 8, (long) flags }; \ syscall (__NR_socketcall, SYS_PACCEPT, args); }) #else # define paccept(fd, addr, addrlen, mask, flags) \ syscall (__NR_paccept, fd, addr, addrlen, mask, 8, flags) #endif #define PORT 57392 #define SOCK_NONBLOCK O_NONBLOCK static pthread_barrier_t b; static void * tf (void arg) { pthread_barrier_wait (&b); int s = socket (AF_INET, SOCK_STREAM, 0); struct sockaddr_in sin; sin.sin_family = AF_INET; sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK); sin.sin_port = htons (PORT); connect (s, (const struct sockaddr ) &sin, sizeof (sin)); close (s); pthread_barrier_wait (&b); pthread_barrier_wait (&b); s = socket (AF_INET, SOCK_STREAM, 0); sin.sin_port = htons (PORT); connect (s, (const struct sockaddr ) &sin, sizeof (sin)); close (s); pthread_barrier_wait (&b); return NULL; } int main (void) { int fd; fd = socket (PF_INET, SOCK_STREAM, 0); if (fd == -1) { puts ("socket(0) failed"); return 1; } int fl = fcntl (fd, F_GETFL); if (fl == -1) { puts ("fcntl failed"); return 1; } if (fl & O_NONBLOCK) { puts ("socket(0) set non-blocking mode"); return 1; } close (fd); fd = socket (PF_INET, SOCK_STREAM\|SOCK_NONBLOCK, 0); if (fd == -1) { puts ("socket(SOCK_NONBLOCK) failed"); return 1; } fl = fcntl (fd, F_GETFL); if (fl == -1) { puts ("fcntl failed"); return 1; } if ((fl & O_NONBLOCK) == 0) { puts ("socket(SOCK_NONBLOCK) does not set non-blocking mode"); return 1; } close (fd); int fds[2]; if (socketpair (PF_UNIX, SOCK_STREAM, 0, fds) == -1) { puts ("socketpair(0) failed"); return 1; } for (int i = 0; i < 2; ++i) { fl = fcntl (fds[i], F_GETFL); if (fl == -1) { puts ("fcntl failed"); return 1; } if (fl & O_NONBLOCK) { printf ("socketpair(0) set non-blocking mode for fds[%d]\n", i); return 1; } close (fds[i]); } if (socketpair (PF_UNIX, SOCK_STREAM\|SOCK_NONBLOCK, 0, fds) == -1) { puts ("socketpair(SOCK_NONBLOCK) failed"); return 1; } for (int i = 0; i < 2; ++i) { fl = fcntl (fds[i], F_GETFL); if (fl == -1) { puts ("fcntl failed"); return 1; } if ((fl & O_NONBLOCK) == 0) { printf ("socketpair(SOCK_NONBLOCK) does not set non-blocking mode for fds[%d]\n", i); return 1; } close (fds[i]); } pthread_barrier_init (&b, NULL, 2); struct sockaddr_in sin; pthread_t th; if (pthread_create (&th, NULL, tf, NULL) != 0) { puts ("pthread_create failed"); return 1; } int s = socket (AF_INET, SOCK_STREAM, 0); int reuse = 1; setsockopt (s, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof (reuse)); sin.sin_family = AF_INET; sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK); sin.sin_port = htons (PORT); bind (s, (struct sockaddr ) &sin, sizeof (sin)); listen (s, SOMAXCONN); pthread_barrier_wait (&b); int s2 = paccept (s, NULL, 0, NULL, 0); if (s2 < 0) { puts ("paccept(0) failed"); return 1; } fl = fcntl (s2, F_GETFL); if (fl & O_NONBLOCK) { puts ("paccept(0) set non-blocking mode"); return 1; } close (s2); close (s); pthread_barrier_wait (&b); s = socket (AF_INET, SOCK_STREAM, 0); sin.sin_port = htons (PORT); setsockopt (s, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof (reuse)); bind (s, (struct sockaddr *) &sin, sizeof (sin)); listen (s, SOMAXCONN); pthread_barrier_wait (&b); s2 = paccept (s, NULL, 0, NULL, SOCK_NONBLOCK); if (s2 < 0) { puts ("paccept(SOCK_NONBLOCK) failed"); return 1; } fl = fcntl (s2, F_GETFL); if ((fl & O_NONBLOCK) == 0) { puts ("paccept(SOCK_NONBLOCK) does not set non-blocking mode"); return 1; } close (s2); close (s); pthread_barrier_wait (&b); puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:29 -07:00
Ulrich Drepper	c019bbc612	flag parameters: paccept w/out set_restore_sigmask Some platforms do not have support to restore the signal mask in the return path from a syscall. For those platforms syscalls like pselect are not defined at all. This is, I think, not a good choice for paccept() since paccept() adds more value on top of accept() than just the signal mask handling. Therefore this patch defines a scaled down version of the sys_paccept function for those platforms. It returns -EINVAL in case the signal mask is non-NULL but behaves the same otherwise. Note that I explicitly included <linux/thread_info.h>. I saw that it is currently included but indirectly two levels down. There is too much risk in relying on this. The header might change and then suddenly the function definition would change without anyone immediately noticing. Signed-off-by: Ulrich Drepper <drepper@redhat.com> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:27 -07:00
Ulrich Drepper	aaca0bdca5	flag parameters: paccept This patch is by far the most complex in the series. It adds a new syscall paccept. This syscall differs from accept in that it adds (at the userlevel) two additional parameters: - a signal mask - a flags value The flags parameter can be used to set flag like SOCK_CLOEXEC. This is imlpemented here as well. Some people argued that this is a property which should be inherited from the file desriptor for the server but this is against POSIX. Additionally, we really want the signal mask parameter as well (similar to pselect, ppoll, etc). So an interface change in inevitable. The flag value is the same as for socket and socketpair. I think diverging here will only create confusion. Similar to the filesystem interfaces where the use of the O_* constants differs, it is acceptable here. The signal mask is handled as for pselect etc. The mask is temporarily installed for the thread and removed before the call returns. I modeled the code after pselect. If there is a problem it's likely also in pselect. For architectures which use socketcall I maintained this interface instead of adding a system call. The symmetry shouldn't be broken. The following test must be adjusted for architectures other than x86 and x86-64 and in case the syscall numbers changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <errno.h> #include <fcntl.h> #include <pthread.h> #include <signal.h> #include <stdio.h> #include <unistd.h> #include <netinet/in.h> #include <sys/socket.h> #include <sys/syscall.h> #ifndef __NR_paccept # ifdef __x86_64__ # define __NR_paccept 288 # elif defined __i386__ # define SYS_PACCEPT 18 # define USE_SOCKETCALL 1 # else # error "need __NR_paccept" # endif #endif #ifdef USE_SOCKETCALL # define paccept(fd, addr, addrlen, mask, flags) \ ({ long args[6] = { \ (long) fd, (long) addr, (long) addrlen, (long) mask, 8, (long) flags }; \ syscall (__NR_socketcall, SYS_PACCEPT, args); }) #else # define paccept(fd, addr, addrlen, mask, flags) \ syscall (__NR_paccept, fd, addr, addrlen, mask, 8, flags) #endif #define PORT 57392 #define SOCK_CLOEXEC O_CLOEXEC static pthread_barrier_t b; static void * tf (void arg) { pthread_barrier_wait (&b); int s = socket (AF_INET, SOCK_STREAM, 0); struct sockaddr_in sin; sin.sin_family = AF_INET; sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK); sin.sin_port = htons (PORT); connect (s, (const struct sockaddr ) &sin, sizeof (sin)); close (s); pthread_barrier_wait (&b); s = socket (AF_INET, SOCK_STREAM, 0); sin.sin_port = htons (PORT); connect (s, (const struct sockaddr ) &sin, sizeof (sin)); close (s); pthread_barrier_wait (&b); pthread_barrier_wait (&b); sleep (2); pthread_kill ((pthread_t) arg, SIGUSR1); return NULL; } static void handler (int s) { } int main (void) { pthread_barrier_init (&b, NULL, 2); struct sockaddr_in sin; pthread_t th; if (pthread_create (&th, NULL, tf, (void ) pthread_self ()) != 0) { puts ("pthread_create failed"); return 1; } int s = socket (AF_INET, SOCK_STREAM, 0); int reuse = 1; setsockopt (s, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof (reuse)); sin.sin_family = AF_INET; sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK); sin.sin_port = htons (PORT); bind (s, (struct sockaddr *) &sin, sizeof (sin)); listen (s, SOMAXCONN); pthread_barrier_wait (&b); int s2 = paccept (s, NULL, 0, NULL, 0); if (s2 < 0) { puts ("paccept(0) failed"); return 1; } int coe = fcntl (s2, F_GETFD); if (coe & FD_CLOEXEC) { puts ("paccept(0) set close-on-exec-flag"); return 1; } close (s2); pthread_barrier_wait (&b); s2 = paccept (s, NULL, 0, NULL, SOCK_CLOEXEC); if (s2 < 0) { puts ("paccept(SOCK_CLOEXEC) failed"); return 1; } coe = fcntl (s2, F_GETFD); if ((coe & FD_CLOEXEC) == 0) { puts ("paccept(SOCK_CLOEXEC) does not set close-on-exec flag"); return 1; } close (s2); pthread_barrier_wait (&b); struct sigaction sa; sa.sa_handler = handler; sa.sa_flags = 0; sigemptyset (&sa.sa_mask); sigaction (SIGUSR1, &sa, NULL); sigset_t ss; pthread_sigmask (SIG_SETMASK, NULL, &ss); sigaddset (&ss, SIGUSR1); pthread_sigmask (SIG_SETMASK, &ss, NULL); sigdelset (&ss, SIGUSR1); alarm (4); pthread_barrier_wait (&b); errno = 0 ; s2 = paccept (s, NULL, 0, &ss, 0); if (s2 != -1 \|\| errno != EINTR) { puts ("paccept did not fail with EINTR"); return 1; } close (s); puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [akpm@linux-foundation.org: make it compile] [akpm@linux-foundation.org: add sys_ni stub] Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: <linux-arch@vger.kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Roland McGrath <roland@redhat.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:27 -07:00
Ulrich Drepper	a677a039be	flag parameters: socket and socketpair This patch adds support for flag values which are ORed to the type passwd to socket and socketpair. The additional code is minimal. The flag values in this implementation can and must match the O_* flags. This avoids overhead in the conversion. The internal functions sock_alloc_fd and sock_map_fd get a new parameters and all callers are changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <fcntl.h> #include <stdio.h> #include <unistd.h> #include <netinet/in.h> #include <sys/socket.h> #define PORT 57392 /* For Linux these must be the same. */ #define SOCK_CLOEXEC O_CLOEXEC int main (void) { int fd; fd = socket (PF_INET, SOCK_STREAM, 0); if (fd == -1) { puts ("socket(0) failed"); return 1; } int coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if (coe & FD_CLOEXEC) { puts ("socket(0) set close-on-exec flag"); return 1; } close (fd); fd = socket (PF_INET, SOCK_STREAM\|SOCK_CLOEXEC, 0); if (fd == -1) { puts ("socket(SOCK_CLOEXEC) failed"); return 1; } coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if ((coe & FD_CLOEXEC) == 0) { puts ("socket(SOCK_CLOEXEC) does not set close-on-exec flag"); return 1; } close (fd); int fds[2]; if (socketpair (PF_UNIX, SOCK_STREAM, 0, fds) == -1) { puts ("socketpair(0) failed"); return 1; } for (int i = 0; i < 2; ++i) { coe = fcntl (fds[i], F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if (coe & FD_CLOEXEC) { printf ("socketpair(0) set close-on-exec flag for fds[%d]\n", i); return 1; } close (fds[i]); } if (socketpair (PF_UNIX, SOCK_STREAM\|SOCK_CLOEXEC, 0, fds) == -1) { puts ("socketpair(SOCK_CLOEXEC) failed"); return 1; } for (int i = 0; i < 2; ++i) { coe = fcntl (fds[i], F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if ((coe & FD_CLOEXEC) == 0) { printf ("socketpair(SOCK_CLOEXEC) does not set close-on-exec flag for fds[%d]\n", i); return 1; } close (fds[i]); } puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:27 -07:00
Jarek Poplawski	f867e6af94	pkt_sched: sch_sfq: dump a real number of flows Dump the "flows" number according to the number of active flows instead of repeating the "limit". Reported-by: Denys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-23 21:34:27 -07:00
Linus Torvalds	26dcce0fab	Merge branch 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits) NR_CPUS: Replace NR_CPUS in speedstep-centrino.c cpumask: Provide a generic set of CPUMASK_ALLOC macros, FIXUP NR_CPUS: Replace NR_CPUS in cpufreq userspace routines NR_CPUS: Replace per_cpu(..., smp_processor_id()) with __get_cpu_var NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genapic_flat_64.c NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genx2apic_uv_x.c NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/proc.c NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/mcheck/mce_64.c cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c, fix cpumask: Use optimized CPUMASK_ALLOC macros in the centrino_target cpumask: Provide a generic set of CPUMASK_ALLOC macros cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c cpumask: Optimize cpumask_of_cpu in kernel/time/tick-common.c cpumask: Optimize cpumask_of_cpu in drivers/misc/sgi-xp/xpc_main.c cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/ldt.c cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/io_apic_64.c cpumask: Replace cpumask_of_cpu with cpumask_of_cpu_ptr Revert "cpumask: introduce new APIs" cpumask: make for_each_cpu_mask a bit smaller net: Pass reference to cpumask variable in net/sunrpc/svc.c ... Fix up trivial conflicts in drivers/cpufreq/cpufreq.c manually	2008-07-23 18:37:44 -07:00
Patrick McHardy	70eed75d76	netfilter: make security table depend on NETFILTER_ADVANCED Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-23 16:42:42 -07:00
David S. Miller	4b53fb67e3	tcp: Clear probes_out more aggressively in tcp_ack(). This is based upon an excellent bug report from Eric Dumazet. tcp_ack() should clear ->icsk_probes_out even if there are packets outstanding. Otherwise if we get a sequence of ACKs while we do have packets outstanding over and over again, we'll never clear the probes_out value and eventually think the connection is too sick and we'll reset it. This appears to be some "optimization" added to tcp_ack() in the 2.4.x timeframe. In 2.2.x, probes_out is pretty much always cleared by tcp_ack(). Here is Eric's original report: ---------------------------------------- Apparently, we can in some situations reset TCP connections in a couple of seconds when some frames are lost. In order to reproduce the problem, please try the following program on linux-2.6.25.* Setup some iptables rules to allow two frames per second sent on loopback interface to tcp destination port 12000 iptables -N SLOWLO iptables -A SLOWLO -m hashlimit --hashlimit 2 --hashlimit-burst 1 --hashlimit-mode dstip --hashlimit-name slow2 -j ACCEPT iptables -A SLOWLO -j DROP iptables -A OUTPUT -o lo -p tcp --dport 12000 -j SLOWLO Then run the attached program and see the output : # ./loop State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,1) State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,3) State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,5) State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,7) State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,9) State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,11) State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,201ms,13) State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,188ms,15) write(): Connection timed out wrote 890 bytes but was interrupted after 9 seconds ESTAB 0 0 127.0.0.1:12000 127.0.0.1:54455 Exiting read() because no data available (4000 ms timeout). read 860 bytes While this tcp session makes progress (sending frames with 50 bytes of payload, every 500ms), linux tcp stack decides to reset it, when tcp_retries 2 is reached (default value : 15) tcpdump : 15:30:28.856695 IP 127.0.0.1.56554 > 127.0.0.1.12000: S 33788768:33788768(0) win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7> 15:30:28.856711 IP 127.0.0.1.12000 > 127.0.0.1.56554: S 33899253:33899253(0) ack 33788769 win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7> 15:30:29.356947 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 1:61(60) ack 1 win 257 15:30:29.356966 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 61 win 257 15:30:29.866415 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 61:111(50) ack 1 win 257 15:30:29.866427 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 111 win 257 15:30:30.366516 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 111:161(50) ack 1 win 257 15:30:30.366527 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 161 win 257 15:30:30.876196 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 161:211(50) ack 1 win 257 15:30:30.876207 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 211 win 257 15:30:31.376282 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 211:261(50) ack 1 win 257 15:30:31.376290 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 261 win 257 15:30:31.885619 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 261:311(50) ack 1 win 257 15:30:31.885631 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 311 win 257 15:30:32.385705 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 311:361(50) ack 1 win 257 15:30:32.385715 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 361 win 257 15:30:32.895249 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 361:411(50) ack 1 win 257 15:30:32.895266 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 411 win 257 15:30:33.395341 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 411:461(50) ack 1 win 257 15:30:33.395351 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 461 win 257 15:30:33.918085 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 461:511(50) ack 1 win 257 15:30:33.918096 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 511 win 257 15:30:34.418163 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 511:561(50) ack 1 win 257 15:30:34.418172 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 561 win 257 15:30:34.927685 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 561:611(50) ack 1 win 257 15:30:34.927698 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 611 win 257 15:30:35.427757 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 611:661(50) ack 1 win 257 15:30:35.427766 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 661 win 257 15:30:35.937359 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 661:711(50) ack 1 win 257 15:30:35.937376 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 711 win 257 15:30:36.437451 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 711:761(50) ack 1 win 257 15:30:36.437464 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 761 win 257 15:30:36.947022 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 761:811(50) ack 1 win 257 15:30:36.947039 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 811 win 257 15:30:37.447135 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 811:861(50) ack 1 win 257 15:30:37.447203 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 861 win 257 15:30:41.448171 IP 127.0.0.1.12000 > 127.0.0.1.56554: F 1:1(0) ack 861 win 257 15:30:41.448189 IP 127.0.0.1.56554 > 127.0.0.1.12000: R 33789629:33789629(0) win 0 Source of program : /* * small producer/consumer program. * setup a listener on 127.0.0.1:12000 * Forks a child * child connect to 127.0.0.1, and sends 10 bytes on this tcp socket every 100 ms * Father accepts connection, and read all data / #include <sys/types.h> #include <sys/socket.h> #include <netinet/in.h> #include <unistd.h> #include <stdio.h> #include <time.h> #include <sys/poll.h> int port = 12000; char buffer[4096]; int main(int argc, char argv[]) { int lfd = socket(AF_INET, SOCK_STREAM, 0); struct sockaddr_in socket_address; time_t t0, t1; int on = 1, sfd, res; unsigned long total = 0; socklen_t alen = sizeof(socket_address); pid_t pid; time(&t0); socket_address.sin_family = AF_INET; socket_address.sin_port = htons(port); socket_address.sin_addr.s_addr = htonl(INADDR_LOOPBACK); if (lfd == -1) { perror("socket()"); return 1; } setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(int)); if (bind(lfd, (struct sockaddr )&socket_address, sizeof(socket_address)) == -1) { perror("bind"); close(lfd); return 1; } if (listen(lfd, 1) == -1) { perror("listen()"); close(lfd); return 1; } pid = fork(); if (pid == 0) { int i, cfd = socket(AF_INET, SOCK_STREAM, 0); close(lfd); if (connect(cfd, (struct sockaddr )&socket_address, sizeof(socket_address)) == -1) { perror("connect()"); return 1; } for (i = 0 ; ;) { res = write(cfd, "blablabla\n", 10); if (res > 0) total += res; else if (res == -1) { perror("write()"); break; } else break; usleep(100000); if (++i == 10) { system("ss -on dst 127.0.0.1:12000"); i = 0; } } time(&t1); fprintf(stderr, "wrote %lu bytes but was interrupted after %g seconds\n", total, difftime(t1, t0)); system("ss -on \| grep 127.0.0.1:12000"); close(cfd); return 0; } sfd = accept(lfd, (struct sockaddr *)&socket_address, &alen); if (sfd == -1) { perror("accept"); return 1; } close(lfd); while (1) { struct pollfd pfd[1]; pfd[0].fd = sfd; pfd[0].events = POLLIN; if (poll(pfd, 1, 4000) == 0) { fprintf(stderr, "Exiting read() because no data available (4000 ms timeout).\n"); break; } res = read(sfd, buffer, sizeof(buffer)); if (res > 0) total += res; else if (res == 0) break; else perror("read()"); } fprintf(stderr, "read %lu bytes\n", total); close(sfd); return 0; } ---------------------------------------- Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-23 16:38:45 -07:00
Oliver Hartkopp	b4942af650	net: Update entry in af_family_clock_key_strings In the merge phase of the CAN subsystem the af_family_clock_key_strings[] have been added to sock.c in commit `443aef0edd` (lockdep: fixup sk_callback_lock annotation). This trivial patch adds the missing name for address family 29 (AF_CAN). Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-23 14:06:04 -07:00
David S. Miller	5b3ab1dbd4	netdev: Remove warning from __netif_schedule(). It isn't helping anything and we aren't going to be able to change all the drivers that do queue wakeups in strange situations. Just letting a noop_qdisc get scheduled will work because when qdisc_run() executes via net_tx_work() it will simply find no packets pending when it makes the ->dequeue() call in qdisc_restart. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-23 14:01:29 -07:00
Krzysztof Hałasa	a8817d2f6d	wanmain.c doesn't need syncppp.h Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>	2008-07-23 23:00:36 +02:00
Krzysztof Hałasa	a24e202e3f	Remove dead code from wanmain.c, CONFIG_WANPIPE_MULTPPP doesn't exist Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>	2008-07-23 23:00:36 +02:00
Linus Torvalds	5554b35933	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: (24 commits) I/OAT: I/OAT version 3.0 support I/OAT: tcp_dma_copybreak default value dependent on I/OAT version I/OAT: Add watchdog/reset functionality to ioatdma iop_adma: cleanup iop_chan_xor_slot_count iop_adma: document how to calculate the minimum descriptor pool size iop_adma: directly reclaim descriptors on allocation failure async_tx: make async_tx_test_ack a boolean routine async_tx: remove depend_tx from async_tx_sync_epilog async_tx: export async_tx_quiesce async_tx: fix handling of the "out of descriptor" condition in async_xor async_tx: ensure the xor destination buffer remains dma-mapped async_tx: list_for_each_entry_rcu() cleanup dmaengine: Driver for the Synopsys DesignWare DMA controller dmaengine: Add slave DMA interface dmaengine: add DMA_COMPL_SKIP_{SRC,DEST}_UNMAP flags to control dma unmap dmaengine: Add dma_client parameter to device_alloc_chan_resources dmatest: Simple DMA memcpy test client dmaengine: DMA engine driver for Marvell XOR engine iop-adma: fix platform driver hotplug/coldplug dmaengine: track the number of clients using a channel ... Fixed up conflict in drivers/dca/dca-sysfs.c manually	2008-07-23 12:03:18 -07:00
Linus Torvalds	c010b2f76c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (82 commits) ipw2200: Call netif_*_queue() interfaces properly. netxen: Needs to include linux/vmalloc.h [netdrvr] atl1d: fix !CONFIG_PM build r6040: rework init_one error handling r6040: bump release number to 0.18 r6040: handle RX fifo full and no descriptor interrupts r6040: change the default waiting time r6040: use definitions for magic values in descriptor status r6040: completely rework the RX path r6040: call napi_disable when puting down the interface and set lp->dev accordingly. mv643xx_eth: fix NETPOLL build r6040: rework the RX buffers allocation routine r6040: fix scheduling while atomic in r6040_tx_timeout r6040: fix null pointer access and tx timeouts r6040: prefix all functions with r6040 rndis_host: support WM6 devices as modems at91_ether: use netstats in net_device structure sfc: Create one RX queue and interrupt per CPU package by default sfc: Use a separate workqueue for resets sfc: I2C adapter initialisation fixes ...	2008-07-22 19:09:51 -07:00
Maciej Sosnowski	16a37acaaf	I/OAT: tcp_dma_copybreak default value dependent on I/OAT version I/OAT DMA performance tuning showed different optimal values of tcp_dma_copybreak for different I/OAT versions (4096 for 1.2 and 2048 for 2.0). This patch lets ioatdma driver set tcp_dma_copybreak value according to these results. [dan.j.williams@intel.com: remove some ifdefs] Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2008-07-22 17:30:57 -07:00
Stephen Hemminger	3d0f24a74e	ipv6: icmp6_dst_gc return change Change icmp6_dst_gc to return the one value the caller cares about rather than using call by reference. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-22 14:35:50 -07:00
Stephen Hemminger	75307c0fe7	ipv6: use kcalloc Th fib_table_hash is an array, so use kcalloc. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-22 14:35:07 -07:00
Stephen Hemminger	a76d7345a3	ipv6: use spin_trylock_bh Now there is spin_trylock_bh, use it rather than open coding. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-22 14:34:35 -07:00
Stephen Hemminger	c8a4522245	ipv6: use round_jiffies This timer normally happens once a minute, there is no need to cause an early wakeup for it, so align it to next second boundary to safe power. It can't be deferred because then it could take too long on cleanup or DoS. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-22 14:34:09 -07:00
Stephen Hemminger	417f28bb34	netns: dont alloc ipv6 fib timer list FIB timer list is a trivial size structure, avoid indirection and just put it in existing ns. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-22 14:33:45 -07:00
Adrian Bunk	888c848ed3	ipv6: make struct ipv6_devconf static struct ipv6_devconf can now become static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-22 14:21:58 -07:00
Adrian Bunk	4d6971e909	sctp: remove sctp_assoc_proc_exit() Commit `20c2c1fd6c` (sctp: add sctp/remaddr table to complete RFC remote address table OID) added an unused sctp_assoc_proc_exit() function that seems to have been unintentionally created when copying the assocs code. Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-22 14:21:30 -07:00
Adrian Bunk	abd0b198ea	sctp: make sctp_outq_flush() static sctp_outq_flush() can now become static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-22 14:20:45 -07:00
Adrian Bunk	a94f779f9d	pkt_sched: make qdisc_class_hash_alloc() static This patch makes the needlessly global qdisc_class_hash_alloc() static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-22 14:20:11 -07:00
David S. Miller	cf508b1211	netdev: Handle ->addr_list_lock just like ->_xmit_lock for lockdep. The new address list lock needs to handle the same device layering issues that the _xmit_lock one does. This integrates work done by Patrick McHardy. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-22 14:16:42 -07:00
Dave Jones	d29f749e25	net: Fix build failure with 'make mandocs'. The function header comments have to go with the functions they are documenting, or things go horribly wrong when we try to process them with the docbook tools. Warning(include/linux/netdevice.h:1006): No description found for parameter 'dev_queue' Warning(include/linux/netdevice.h:1033): No description found for parameter 'dev_queue' Warning(include/linux/netdevice.h:1067): No description found for parameter 'dev_queue' Warning(include/linux/netdevice.h:1093): No description found for parameter 'dev_queue' Warning(include/linux/netdevice.h:1474): No description found for parameter 'txq' Error(net/core/dev.c:1674): cannot understand prototype: 'u32 simple_tx_hashrnd; ' Signed-off-by: Dave Jones <davej@redhat.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-22 14:09:06 -07:00
Greg Kroah-Hartman	16be63fd16	bluetooth: remove improper bluetooth class symlinks. Don't create symlinks in a class to a device that is not owned by the class. If the bluetooth subsystem really wants to point to all of the devices it controls, it needs to create real devices, not fake symlinks. Cc: Maxim Krasnyansky <maxk@qualcomm.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-07-21 21:54:51 -07:00
David S. Miller	b32d13102d	tcp: Fix bitmask test in tcp_syn_options() As reported by Alexey Dobriyan: CHECK net/ipv4/tcp_output.c net/ipv4/tcp_output.c:475:7: warning: dubious: !x & y And sparse is damn right! if (unlikely(!OPTION_TS & opts->options)) ^^^ size += TCPOLEN_SACKPERM_ALIGNED; OPTION_TS is (1 << 1), so condition will never trigger. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-21 18:45:34 -07:00
Gerrit Renker	47112e25da	udplite: Protection against coverage value wrap-around This patch clamps the cscov setsockopt values to a maximum of 0xFFFF. Setsockopt values greater than 0xffff can cause an unwanted wrap-around. Further, IPv6 jumbograms are not supported (RFC 3838, 3.5), so that values greater than 0xffff are not even useful. Further changes: fixed a typo in the documentation. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-21 13:35:08 -07:00
Arjan van de Ven	6579e57b31	net: Print the module name as part of the watchdog message As suggested by Dave: This patch adds a function to get the driver name from a struct net_device, and consequently uses this in the watchdog timeout handler to print as part of the message. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-21 13:31:48 -07:00
Stephen Hemminger	7943986ca1	net: use kcalloc in netdev_queue alloc Minor nit, use size_t for allocation size and kcalloc to allocate an array. Probably makes no actual code difference. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-21 13:28:44 -07:00
Stephen Hemminger	847499ce71	ipv6: use timer pending This fixes the bridge reference count problem and cleanups ipv6 FIB timer management. Don't use expires field, because it is not a proper way to test, instead use timer_pending(). Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-21 13:21:35 -07:00
Patrick McHardy	5547cd0ae8	netfilter: nf_conntrack_sctp: fix sparse warnings Introduced by `a258860e` (netfilter: ctnetlink: add full support for SCTP to ctnetlink): net/netfilter/nf_conntrack_proto_sctp.c:483:2: warning: cast from restricted type net/netfilter/nf_conntrack_proto_sctp.c:483:2: warning: incorrect type in argument 1 (different base types) net/netfilter/nf_conntrack_proto_sctp.c:483:2: expected unsigned int [unsigned] [usertype] x net/netfilter/nf_conntrack_proto_sctp.c:483:2: got restricted unsigned int const <noident> net/netfilter/nf_conntrack_proto_sctp.c:483:2: warning: cast from restricted type net/netfilter/nf_conntrack_proto_sctp.c:483:2: warning: cast from restricted type net/netfilter/nf_conntrack_proto_sctp.c:483:2: warning: cast from restricted type net/netfilter/nf_conntrack_proto_sctp.c:483:2: warning: cast from restricted type net/netfilter/nf_conntrack_proto_sctp.c:487:2: warning: cast from restricted type net/netfilter/nf_conntrack_proto_sctp.c:487:2: warning: incorrect type in argument 1 (different base types) net/netfilter/nf_conntrack_proto_sctp.c:487:2: expected unsigned int [unsigned] [usertype] x net/netfilter/nf_conntrack_proto_sctp.c:487:2: got restricted unsigned int const <noident> net/netfilter/nf_conntrack_proto_sctp.c:487:2: warning: cast from restricted type net/netfilter/nf_conntrack_proto_sctp.c:487:2: warning: cast from restricted type net/netfilter/nf_conntrack_proto_sctp.c:487:2: warning: cast from restricted type net/netfilter/nf_conntrack_proto_sctp.c:487:2: warning: cast from restricted type net/netfilter/nf_conntrack_proto_sctp.c:532:42: warning: incorrect type in assignment (different base types) net/netfilter/nf_conntrack_proto_sctp.c:532:42: expected restricted unsigned int <noident> net/netfilter/nf_conntrack_proto_sctp.c:532:42: got unsigned int net/netfilter/nf_conntrack_proto_sctp.c:534:39: warning: incorrect type in assignment (different base types) net/netfilter/nf_conntrack_proto_sctp.c:534:39: expected restricted unsigned int <noident> net/netfilter/nf_conntrack_proto_sctp.c:534:39: got unsigned int Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-21 10:11:02 -07:00
Herbert Xu	c71529e42c	netfilter: nf_nat_sip: c= is optional for session According to RFC2327, the connection information is optional in the session description since it can be specified in the media description instead. My provider does exactly that and does not provide any connection information in the session description. As a result the new kernel drops all invite responses. This patch makes it optional as documented. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-21 10:11:02 -07:00
Jan Engelhardt	db1a75bdcc	netfilter: xt_TCPMSS: collapse tcpmss_reverse_mtu{4,6} into one function Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-21 10:11:01 -07:00
Eric Leblond	72961ecf84	netfilter: nfnetlink_log: send complete hardware header This patch adds some fields to NFLOG to be able to send the complete hardware header with all necessary informations. It sends to userspace: * the type of hardware link * the lenght of hardware header * the hardware header Signed-off-by: Eric Leblond <eric@inl.fr> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-21 10:11:00 -07:00
David Howells	280763c053	netfilter: xt_time: fix time's time_mt()'s use of do_div() Fix netfilter xt_time's time_mt()'s use of do_div() on an s64 by using div_s64() instead. This was introduced by patch `ee4411a1b1` ("[NETFILTER]: x_tables: add xt_time match"). Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-21 10:10:59 -07:00
Krzysztof Piotr Oledzki	584015727a	netfilter: accounting rework: ct_extend + 64bit counters (v4) Initially netfilter has had 64bit counters for conntrack-based accounting, but it was changed in 2.6.14 to save memory. Unfortunately in-kernel 64bit counters are still required, for example for "connbytes" extension. However, 64bit counters waste a lot of memory and it was not possible to enable/disable it runtime. This patch: - reimplements accounting with respect to the extension infrastructure, - makes one global version of seq_print_acct() instead of two seq_print_counters(), - makes it possible to enable it at boot time (for CONFIG_SYSCTL/CONFIG_SYSFS=n), - makes it possible to enable/disable it at runtime by sysctl or sysfs, - extends counters from 32bit to 64bit, - renames ip_conntrack_counter -> nf_conn_counter, - enables accounting code unconditionally (no longer depends on CONFIG_NF_CT_ACCT), - set initial accounting enable state based on CONFIG_NF_CT_ACCT - removes buggy IPCT_COUNTER_FILLING event handling. If accounting is enabled newly created connections get additional acct extend. Old connections are not changed as it is not possible to add a ct_extend area to confirmed conntrack. Accounting is performed for all connections with acct extend regardless of a current state of "net.netfilter.nf_conntrack_acct". Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-21 10:10:58 -07:00
Changli Gao	0dbff689c2	netfilter: nf_nat_core: eliminate useless find_appropriate_src for IP_NAT_RANGE_PROTO_RANDOM Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-21 10:10:57 -07:00
David S. Miller	d3678b463d	Revert "pkt_sched: Make default qdisc nonshared-multiqueue safe." This reverts commit `a0c80b80e0`. After discussions with Jamal and Herbert on netdev, we should provide at least minimal prioritization at the qdisc level even in multiqueue situations. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-21 10:10:50 -07:00
Linus Torvalds	867d79fb9a	net: In __netif_schedule() use WARN_ON instead of BUG_ON Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-21 10:10:49 -07:00
David S. Miller	b6b2fed1f4	net: Improve simple_tx_hash(). Based upon feedback from Eric Dumazet and Andi Kleen. Cure several deficiencies in simple_tx_hash() by using jhash + reciprocol multiply. 1) Eliminates expensive modulus operation. 2) Makes hash less attackable by using random seed. 3) Eliminates endianness hash distribution issues. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-21 10:10:48 -07:00
Daniel Lezcano	c3ee84163e	pkt_sched: Remove unused variable skb in dev_deactivate_queue function. Removed unused variable 'skb' in the dev_deactivate_queue function Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-21 09:18:07 -07:00
Ingo Molnar	eb6a12c242	Merge branch 'linus' into cpus4096-for-linus Conflicts: net/sunrpc/svc.c Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-21 17:19:50 +02:00
Linus Torvalds	14b395e35d	Merge branch 'for-2.6.27' of git://linux-nfs.org/~bfields/linux * 'for-2.6.27' of git://linux-nfs.org/~bfields/linux: (51 commits) nfsd: nfs4xdr.c do-while is not a compound statement nfsd: Use C99 initializers in fs/nfsd/nfs4xdr.c lockd: Pass "struct sockaddr *" to new failover-by-IP function lockd: get host reference in nlmsvc_create_block() instead of callers lockd: minor svclock.c style fixes lockd: eliminate duplicate nlmsvc_lookup_host call from nlmsvc_lock lockd: eliminate duplicate nlmsvc_lookup_host call from nlmsvc_testlock lockd: nlm_release_host() checks for NULL, caller needn't file lock: reorder struct file_lock to save space on 64 bit builds nfsd: take file and mnt write in nfs4_upgrade_open nfsd: document open share bit tracking nfsd: tabulate nfs4 xdr encoding functions nfsd: dprint operation names svcrdma: Change WR context get/put to use the kmem cache svcrdma: Create a kmem cache for the WR contexts svcrdma: Add flush_scheduled_work to module exit function svcrdma: Limit ORD based on client's advertised IRD svcrdma: Remove unused wait q from svcrdma_xprt structure svcrdma: Remove unneeded spin locks from __svc_rdma_free svcrdma: Add dma map count and WARN_ON ...	2008-07-20 21:21:46 -07:00
David Miller	702beb87d6	ipv6: Fix warning in addrconf code. Reported by Linus. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-20 21:18:26 -07:00
Linus Torvalds	d1671a9c15	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: pkt_sched: Fix build with NET_SCHED disabled.	2008-07-20 21:17:20 -07:00
David S. Miller	3a682fbd73	pkt_sched: Fix build with NET_SCHED disabled. The stab bits can't be referenced uniless the full packet scheduler layer is enabled. Reported by Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-20 18:13:01 -07:00
Linus Torvalds	db6d8c7a40	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (1232 commits) iucv: Fix bad merging. net_sched: Add size table for qdiscs net_sched: Add accessor function for packet length for qdiscs net_sched: Add qdisc_enqueue wrapper highmem: Export totalhigh_pages. ipv6 mcast: Omit redundant address family checks in ip6_mc_source(). net: Use standard structures for generic socket address structures. ipv6 netns: Make several "global" sysctl variables namespace aware. netns: Use net_eq() to compare net-namespaces for optimization. ipv6: remove unused macros from net/ipv6.h ipv6: remove unused parameter from ip6_ra_control tcp: fix kernel panic with listening_get_next tcp: Remove redundant checks when setting eff_sacks tcp: options clean up tcp: Fix MD5 signatures for non-linear skbs sctp: Update sctp global memory limit allocations. sctp: remove unnecessary byteshifting, calculate directly in big-endian sctp: Allow only 1 listening socket with SO_REUSEADDR sctp: Do not leak memory on multiple listen() calls sctp: Support ipv6only AF_INET6 sockets. ...	2008-07-20 17:43:29 -07:00
Alan Cox	a352def21a	tty: Ldisc revamp Move the line disciplines towards a conventional ->ops arrangement. For the moment the actual 'tty_ldisc' struct in the tty is kept as part of the tty struct but this can then be changed if it turns out that when it all settles down we want to refcount ldiscs separately to the tty. Pull the ldisc code out of /proc and put it with our ldisc code. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-20 17:12:34 -07:00
David S. Miller	fb65a7c091	iucv: Fix bad merging. Noticed by Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-20 10:18:44 -07:00
Jussi Kivilinna	175f9c1bba	net_sched: Add size table for qdiscs Add size table functions for qdiscs and calculate packet size in qdisc_enqueue(). Based on patch by Patrick McHardy http://marc.info/?l=linux-netdev&m=115201979221729&w=2 Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-20 00:08:47 -07:00
Jussi Kivilinna	0abf77e55a	net_sched: Add accessor function for packet length for qdiscs Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-20 00:08:27 -07:00
Jussi Kivilinna	5f86173bdf	net_sched: Add qdisc_enqueue wrapper Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-20 00:08:04 -07:00
YOSHIFUJI Hideaki	a6ffb404dc	ipv6 mcast: Omit redundant address family checks in ip6_mc_source(). The caller has alredy checked for them. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-19 22:36:07 -07:00
YOSHIFUJI Hideaki	230b183921	net: Use standard structures for generic socket address structures. Use sockaddr_storage{} for generic socket address storage and ensures proper alignment. Use sockaddr{} for pointers to omit several casts. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-19 22:35:47 -07:00
YOSHIFUJI Hideaki	53b7997fd5	ipv6 netns: Make several "global" sysctl variables namespace aware. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-19 22:35:03 -07:00
YOSHIFUJI Hideaki	721499e893	netns: Use net_eq() to compare net-namespaces for optimization. Without CONFIG_NET_NS, namespace is always &init_net. Compiler will be able to omit namespace comparisons with this patch. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-19 22:34:43 -07:00
David S. Miller	407d819cf0	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-2.6	2008-07-19 00:30:39 -07:00
Denis V. Lunev	725a8ff04a	ipv6: remove unused parameter from ip6_ra_control Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-19 00:28:58 -07:00
Daniel Lezcano	bdccc4ca13	tcp: fix kernel panic with listening_get_next # BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 IP: [<ffffffff821ed01e>] listening_get_next+0x50/0x1b3 PGD 11e4b9067 PUD 11d16c067 PMD 0 Oops: 0000 [1] SMP last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map CPU 3 Modules linked in: bridge ipv6 button battery ac loop dm_mod tg3 ext3 jbd edd fan thermal processor thermal_sys hwmon sg sata_svw libata dock serverworks sd_mod scsi_mod ide_disk ide_core [last unloaded: freq_table] Pid: 3368, comm: slpd Not tainted 2.6.26-rc2-mm1-lxc4 #1 RIP: 0010:[<ffffffff821ed01e>] [<ffffffff821ed01e>] listening_get_next+0x50/0x1b3 RSP: 0018:ffff81011e1fbe18 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8100be0ad3c0 RCX: ffff8100619f50c0 RDX: ffffffff82475be0 RSI: ffff81011d9ae6c0 RDI: ffff8100be0ad508 RBP: ffff81011f4f1240 R08: 00000000ffffffff R09: ffff8101185b6780 R10: 000000000000002d R11: ffffffff820fdbfa R12: ffff8100be0ad3c8 R13: ffff8100be0ad6a0 R14: ffff8100be0ad3c0 R15: ffffffff825b8ce0 FS: 00007f6a0ebd16d0(0000) GS:ffff81011f424540(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000038 CR3: 000000011dc20000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process slpd (pid: 3368, threadinfo ffff81011e1fa000, task ffff81011f4b8660) Stack: 00000000000002ee ffff81011f5a57c0 ffff81011f4f1240 ffff81011e1fbe90 0000000000001000 0000000000000000 00007fff16bf2590 ffffffff821ed9c8 ffff81011f5a57c0 ffff81011d9ae6c0 000000000000041a ffffffff820b0abd Call Trace: [<ffffffff821ed9c8>] ? tcp_seq_next+0x34/0x7e [<ffffffff820b0abd>] ? seq_read+0x1aa/0x29d [<ffffffff820d21b4>] ? proc_reg_read+0x73/0x8e [<ffffffff8209769c>] ? vfs_read+0xaa/0x152 [<ffffffff82097a7d>] ? sys_read+0x45/0x6e [<ffffffff8200bd2b>] ? system_call_after_swapgs+0x7b/0x80 Code: 31 a9 25 00 e9 b5 00 00 00 ff 45 20 83 7d 0c 01 75 79 4c 8b 75 10 48 8b 0e eb 1d 48 8b 51 20 0f b7 45 08 39 02 75 0e 48 8b 41 28 <4c> 39 78 38 0f 84 93 00 00 00 48 8b 09 48 85 c9 75 de 8b 55 1c RIP [<ffffffff821ed01e>] listening_get_next+0x50/0x1b3 RSP <ffff81011e1fbe18> CR2: 0000000000000038 This kernel panic appears with CONFIG_NET_NS=y. How to reproduce ? On the buggy host (host A) * ip addr add 1.2.3.4/24 dev eth0 On a remote host (host B) * ip addr add 1.2.3.5/24 dev eth0 * iptables -A INPUT -p tcp -s 1.2.3.4 -j DROP * ssh 1.2.3.4 On host A: * netstat -ta or cat /proc/net/tcp This bug happens when reading /proc/net/tcp[6] when there is a req_sock at the SYN_RECV state. When a SYN is received the minisock is created and the sk field is set to NULL. In the listening_get_next function, we try to look at the field req->sk->sk_net. When looking at how to fix this bug, I noticed that is useless to do the check for the minisock belonging to the namespace. A minisock belongs to a listen point and this one is per namespace, so when browsing the minisock they are always per namespace. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-19 00:15:13 -07:00
Adam Langley	4389dded77	tcp: Remove redundant checks when setting eff_sacks Remove redundant checks when setting eff_sacks and make the number of SACKs a compile time constant. Now that the options code knows how many SACK blocks can fit in the header, we don't need to have the SACK code guessing at it. Signed-off-by: Adam Langley <agl@imperialviolet.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-19 00:07:02 -07:00
Adam Langley	33ad798c92	tcp: options clean up This should fix the following bugs: * Connections with MD5 signatures produce invalid packets whenever SACK options are included * MD5 signatures are counted twice in the MSS calculations Behaviour changes: * A SYN with MD5 + SACK + TS elicits a SYNACK with MD5 + SACK This is because we can't fit any SACK blocks in a packet with MD5 + TS options. There was discussion about disabling SACK rather than TS in order to fit in better with old, buggy kernels, but that was deemed to be unnecessary. * SYNs with MD5 don't include a TS option See above. Additionally, it removes a bunch of duplicated logic for calculating options, which should help avoid these sort of issues in the future. Signed-off-by: Adam Langley <agl@imperialviolet.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-19 00:04:31 -07:00
Adam Langley	49a72dfb88	tcp: Fix MD5 signatures for non-linear skbs Currently, the MD5 code assumes that the SKBs are linear and, in the case that they aren't, happily goes off and hashes off the end of the SKB and into random memory. Reported by Stephen Hemminger in [1]. Advice thanks to Stephen and Evgeniy Polyakov. Also includes a couple of missed route_caps from Stephen's patch in [2]. [1] http://marc.info/?l=linux-netdev&m=121445989106145&w=2 [2] http://marc.info/?l=linux-netdev&m=121459157816964&w=2 Signed-off-by: Adam Langley <agl@imperialviolet.org> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-19 00:01:42 -07:00
Vlad Yasevich	845525a642	sctp: Update sctp global memory limit allocations. Update sctp global memory limit allocations to be the same as TCP. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 23:08:21 -07:00
Harvey Harrison	336d3262df	sctp: remove unnecessary byteshifting, calculate directly in big-endian Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 23:07:09 -07:00
Vlad Yasevich	4e54064e0a	sctp: Allow only 1 listening socket with SO_REUSEADDR When multiple socket bind to the same port with SO_REUSEADDR, only 1 can be listining. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 23:06:32 -07:00
Vlad Yasevich	23b29ed80b	sctp: Do not leak memory on multiple listen() calls SCTP permits multiple listen call and on subsequent calls we leak he memory allocated for the crypto transforms. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 23:06:07 -07:00
Vlad Yasevich	7dab83de50	sctp: Support ipv6only AF_INET6 sockets. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 23:05:40 -07:00
Florian Westphal	6d0ccbac68	sctp: Prevent uninitialized memory access valgrind reports uninizialized memory accesses when running sctp inside the network simulation cradle simulator: Conditional jump or move depends on uninitialised value(s) at 0x570E34A: sctp_assoc_sync_pmtu (associola.c:1324) by 0x57427DA: sctp_packet_transmit (output.c:403) by 0x5710EFF: sctp_outq_flush (outqueue.c:824) by 0x5710B88: sctp_outq_uncork (outqueue.c:701) by 0x5745262: sctp_cmd_interpreter (sm_sideeffect.c:1548) by 0x57444B7: sctp_side_effects (sm_sideeffect.c:976) by 0x5744460: sctp_do_sm (sm_sideeffect.c:945) by 0x572157D: sctp_primitive_ASSOCIATE (primitive.c:94) by 0x5725C04: __sctp_connect (socket.c:1094) by 0x57297DC: sctp_connect (socket.c:3297) Conditional jump or move depends on uninitialised value(s) at 0x575D3A5: mod_timer (timer.c:630) by 0x5752B78: sctp_cmd_hb_timers_start (sm_sideeffect.c:555) by 0x5754133: sctp_cmd_interpreter (sm_sideeffect.c:1448) by 0x5753607: sctp_side_effects (sm_sideeffect.c:976) by 0x57535B0: sctp_do_sm (sm_sideeffect.c:945) by 0x571E9AE: sctp_endpoint_bh_rcv (endpointola.c:474) by 0x573347F: sctp_inq_push (inqueue.c:104) by 0x572EF93: sctp_rcv (input.c:256) by 0x5689623: ip_local_deliver_finish (ip_input.c:230) by 0x5689759: ip_local_deliver (ip_input.c:268) by 0x5689CAC: ip_rcv_finish (dst.h:246) #1 is due to "if (t->pmtu_pending)". `8a4794914f` "[SCTP] Flag a pmtu change request" suggests it should be initialized to 0. #2 is the heartbeat timer 'expires' value, which is uninizialised, but test by mod_timer(). T3_rtx_timer seems to be affected by the same problem, so initialize it, too. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 23:04:39 -07:00
Florian Westphal	c4e85f82ed	sctp: Don't abort initialization when CONFIG_PROC_FS=n This puts CONFIG_PROC_FS defines around the proc init/exit functions and also avoids compiling proc.c if procfs is not supported. Also make SCTP_DBG_OBJCNT depend on procfs. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 23:03:44 -07:00
Stephen Hemminger	c1e20f7c8b	tcp: RTT metrics scaling Some of the metrics (RTT, RTTVAR and RTAX_RTO_MIN) are stored in kernel units (jiffies) and this leaks out through the netlink API to user space where the units for jiffies are unknown. This patches changes the kernel to convert to/from milliseconds. This changes the ABI, but milliseconds seemed like the most natural unit for these parameters. Values available via syscall in /proc/net/rt_cache and netlink will be in milliseconds. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 23:02:15 -07:00
David S. Miller	30ee42be00	pkt_sched: Fix noqueue_qdisc initialization. Like noop_qdisc, it needs a dummy backpointer and explicit qdisc->q.lock initialization. Based upon a report by Stephen Hemminger. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 23:00:11 -07:00
David S. Miller	3072367300	pkt_sched: Manage qdisc list inside of root qdisc. Idea is from Patrick McHardy. Instead of managing the list of qdiscs on the device level, manage it in the root qdisc of a netdev_queue. This solves all kinds of visibility issues during qdisc destruction. The way to iterate over all qdiscs of a netdev_queue is to visit the netdev_queue->qdisc, and then traverse it's list. The only special case is to ignore builting qdiscs at the root when dumping or doing a qdisc_lookup(). That was not needed previously because builtin qdiscs were not added to the device's qdisc_list. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 22:50:15 -07:00
David S. Miller	72b25a913e	pkt_sched: Get rid of u32_list. The u32_list is just an indirect way of maintaining a reference to a U32 node on a per-qdisc basis. Just add an explicit node pointer for u32 to struct Qdisc an do away with this global list. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 20:54:17 -07:00
Patrick McHardy	8913336a7e	packet: add PACKET_RESERVE sockopt Add new sockopt to reserve some headroom in the mmaped ring frames in front of the packet payload. This can be used f.i. when the VLAN header needs to be (re)constructed to avoid moving the entire payload. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 18:05:19 -07:00
Mike Travis	65c0118453	cpumask: Replace cpumask_of_cpu with cpumask_of_cpu_ptr * This patch replaces the dangerous lvalue version of cpumask_of_cpu with new cpumask_of_cpu_ptr macros. These are patterned after the node_to_cpumask_ptr macros. In general terms, if there is a cpumask_of_cpu_map[] then a pointer to the cpumask_of_cpu_map[cpu] entry is used. The cpumask_of_cpu_map is provided when there is a large NR_CPUS count, reducing greatly the amount of code generated and stack space used for cpumask_of_cpu(). The pointer to the cpumask_t value is needed for calling set_cpus_allowed_ptr() to reduce the amount of stack space needed to pass the cpumask_t value. If there isn't a cpumask_of_cpu_map[], then a temporary variable is declared and filled in with value from cpumask_of_cpu(cpu) as well as a pointer variable pointing to this temporary variable. Afterwards, the pointer is used to reference the cpumask value. The compiler will optimize out the extra dereference through the pointer as well as the stack space used for the pointer, resulting in identical code. A good example of the orthogonal usages is in net/sunrpc/svc.c: case SVC_POOL_PERCPU: { unsigned int cpu = m->pool_to[pidx]; cpumask_of_cpu_ptr(cpumask, cpu); oldmask = current->cpus_allowed; set_cpus_allowed_ptr(current, cpumask); return 1; } case SVC_POOL_PERNODE: { unsigned int node = m->pool_to[pidx]; node_to_cpumask_ptr(nodecpumask, node); oldmask = current->cpus_allowed; set_cpus_allowed_ptr(current, nodecpumask); return 1; } Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 22:02:57 +02:00
Ingo Molnar	bb2c018b09	Merge branch 'linus' into cpus4096 Conflicts: drivers/acpi/processor_throttling.c Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-18 22:00:54 +02:00
Pavel Emelyanov	b6fcbdb4f2	proc: consolidate per-net single-release callers They are symmetrical to single_open ones :) Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 04:07:44 -07:00
Pavel Emelyanov	de05c557b2	proc: consolidate per-net single_open callers There are already 7 of them - time to kill some duplicate code. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 04:07:21 -07:00
Pavel Emelyanov	60bdde9580	proc: clean the ip_misc_proc_init and ip_proc_init_net error paths After all this stuff is moved outside, this function can look better. Besides, I tuned the error path in ip_proc_init_net to make it have only 2 exit points, not 3. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 04:06:50 -07:00
Pavel Emelyanov	8e3461d01b	proc: show per-net ip_devconf.forwarding in /proc/net/snmp This one has become per-net long ago, but the appropriate file is per-net only now. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 04:06:26 -07:00
Pavel Emelyanov	229bf0cbaa	proc: create /proc/net/snmp file in each net All the statistics shown in this file have been made per-net already. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 04:06:04 -07:00
Pavel Emelyanov	7b7a9dfdf6	proc: create /proc/net/netstat file in each net Now all the shown in it statistics is netnsizated, time to show it in appropriate net. The appropriate net init/exit ops already exist - they make the sockstat file per net - so just extend them. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 04:05:17 -07:00
Pavel Emelyanov	d89cbbb1e6	ipv4: clean the init_ipv4_mibs error paths After moving all the stuff outside this function it looks a bit ugly - make it look better. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 04:04:51 -07:00
Pavel Emelyanov	923c6586b0	mib: put icmpmsg statistics on struct net Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 04:04:22 -07:00
Pavel Emelyanov	b60538a0d7	mib: put icmp statistics on struct net Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 04:04:02 -07:00
Pavel Emelyanov	386019d351	mib: put udplite statistics on struct net Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 04:03:45 -07:00
Pavel Emelyanov	2f275f91a4	mib: put udp statistics on struct net Similar to... ouch, I repeat myself. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 04:03:27 -07:00
Pavel Emelyanov	61a7e26028	mib: put net statistics on struct net Similar to ip and tcp ones :) Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 04:03:08 -07:00
Pavel Emelyanov	a20f5799ca	mib: put ip statistics on struct net Similar to tcp one. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 04:02:42 -07:00
Pavel Emelyanov	57ef42d59d	mib: put tcp statistics on struct net Proc temporary uses stats from init_net. BTW, TCP_XXX_STATS are beautiful (w/o do { } while (0) facing) again :) Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 04:02:08 -07:00
Pavel Emelyanov	9b4661bd6e	ipv4: add pernet mib operations These ones are currently empty, but stuff from init_ipv4_mibs will sequentially migrate there. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 04:01:44 -07:00
David S. Miller	49997d7515	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: Documentation/powerpc/booting-without-of.txt drivers/atm/Makefile drivers/net/fs_enet/fs_enet-main.c drivers/pci/pci-acpi.c net/8021q/vlan.c net/iucv/iucv.c	2008-07-18 02:39:39 -07:00
David S. Miller	a0c80b80e0	pkt_sched: Make default qdisc nonshared-multiqueue safe. Instead of 'pfifo_fast' we have just plain 'fifo_fast'. No priority queues, just a straight FIFO. This is necessary in order to legally have a seperate qdisc per queue in multi-TX-queue setups, and thus get full parallelization. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:33 -07:00
David S. Miller	99194cff39	pkt_sched: Add multiqueue handling to qdisc_graft(). Move the destruction of the old queue into qdisc_graft(). When operating on a root qdisc (ie. "parent == NULL"), apply the operation to all queues. The caller has grabbed a single implicit reference for this graft, therefore when we apply the change to more than one queue we must grab additional qdisc references. Otherwise, we are operating on a class of a specific parent qdisc, and therefore no multiqueue handling is necessary. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:30 -07:00
David S. Miller	8387400092	pkt_sched: Kill netdev_queue lock. We can simply use the qdisc->q.lock for all of the qdisc tree synchronization. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:30 -07:00
David S. Miller	c7e4f3bbb4	pkt_sched: Kill qdisc_lock_tree and qdisc_unlock_tree. No longer used. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:29 -07:00
David S. Miller	53049978df	pkt_sched: Make qdisc grafting locking more specific. Lock the root of the qdisc being operated upon. All explicit references to qdisc_tree_lock() are now gone. The only remaining uses are via the sch_tree_{lock,unlock}() and tcf_tree_{lock,unlock}() macros. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:27 -07:00
David S. Miller	ead81cc5fc	netdevice: Move qdisc_list back into net_device proper. And give it it's own lock. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:26 -07:00
David S. Miller	15b458fa65	pkt_sched: Kill qdisc_lock_tree usage in cls_route.c It just wants the qdisc tree to be synchronized, so grabbing qdisc_root_lock() is sufficient. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:25 -07:00
David S. Miller	55dbc640c3	pkt_sched: Remove qdisc_lock_tree usage in cls_api.c It just wants the qdisc tree for the filter to be synchronized. So just BH lock qdisc_root_lock(q) instead. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:24 -07:00
David S. Miller	17715e62a5	pkt_sched: Use per-queue locking in shutdown_scheduler_queue. This eliminates another qdisc_lock_tree user. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:23 -07:00
David S. Miller	8a34c5dc3a	pkt_sched: Perform bulk of qdisc destruction in RCU. This allows less strict control of access to the qdisc attached to a netdev_queue. It is even allowed to enqueue into a qdisc which is in the process of being destroyed. The RCU handler will toss out those packets. We will need this to handle sharing of a qdisc amongst multiple TX queues. In such a setup the lock has to be shared, so will be inside of the qdisc itself. At which point the netdev_queue lock cannot be used to hard synchronize access to the ->qdisc pointer. One operation we have to keep inside of qdisc_destroy() is the list deletion. It is the only piece of state visible after the RCU quiesce period, so we have to undo it early and under the appropriate locking. The operations in the RCU handler do not need any looking because the qdisc tree is no longer visible to anything at that point. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:22 -07:00
David S. Miller	16361127eb	pkt_sched: dev_init_scheduler() does not need to lock qdisc tree. We are registering the device, there is no way anyone can get at this object's qdiscs yet in any meaningful way. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:21 -07:00
David S. Miller	37437bb2e1	pkt_sched: Schedule qdiscs instead of netdev_queue. When we have shared qdiscs, packets come out of the qdiscs for multiple transmit queues. Therefore it doesn't make any sense to schedule the transmit queue when logically we cannot know ahead of time the TX queue of the SKB that the qdisc->dequeue() will give us. Just for sanity I added a BUG check to make sure we never get into a state where the noop_qdisc is scheduled. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:20 -07:00
David S. Miller	7698b4fcab	pkt_sched: Add and use qdisc_root() and qdisc_root_lock(). When code wants to lock the qdisc tree state, the logic operation it's doing is locking the top-level qdisc that sits of the root of the netdev_queue. Add qdisc_root_lock() to represent this and convert the easiest cases. In order for this to work out in all cases, we have to hook up the noop_qdisc to a dummy netdev_queue. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:19 -07:00
David S. Miller	e2627c8c22	pkt_sched: Make QDISC_RUNNING a qdisc state. Currently it is associated with a netdev_queue, but when we have qdisc sharing that no longer makes any sense. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:18 -07:00
David S. Miller	d3b753db7c	pkt_sched: Move gso_skb into Qdisc. We liberate any dangling gso_skb during qdisc destruction. It really only matters for the root qdisc. But when qdiscs can be shared by multiple netdev_queue objects, we can't have the gso_skb in the netdev_queue any more. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:18 -07:00
David S. Miller	8f0f2223cc	net: Implement simple sw TX hashing. It just xor hashes over IPv4/IPv6 addresses and ports of transport. The only assumption it makes is that skb_network_header() is set correctly. With bug fixes from Eric Dumazet. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:13 -07:00
David S. Miller	51cb6db0f5	mac80211: Reimplement WME using ->select_queue(). The only behavior change is that we do not drop packets under any circumstances. If that is absolutely needed, we could easily add it back. With cleanups and help from Johannes Berg. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:12 -07:00
David S. Miller	eae792b722	netdev: Add netdev->select_queue() method. Devices or device layers can set this to control the queue selection performed by dev_pick_tx(). This function runs under RCU protection, which allows overriding functions to have some way of synchronizing with things like dynamic ->real_num_tx_queues adjustments. This makes the spinlock prefetch in dev_queue_xmit() a little bit less effective, but that's the price right now for correctness. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:10 -07:00
David S. Miller	fd2ea0a79f	net: Use queue aware tests throughout. This effectively "flips the switch" by making the core networking and multiqueue-aware drivers use the new TX multiqueue structures. Non-multiqueue drivers need no changes. The interfaces they use such as netif_stop_queue() degenerate into an operation on TX queue zero. So everything "just works" for them. Code that really wants to do "X" to all TX queues now invokes a routine that does so, such as netif_tx_wake_all_queues(), netif_tx_stop_all_queues(), etc. pktgen and netpoll required a little bit more surgery than the others. In particular the pktgen changes, whilst functional, could be largely improved. The initial check in pktgen_xmit() will sometimes check the wrong queue, which is mostly harmless. The thing to do is probably to invoke fill_packet() earlier. The bulk of the netpoll changes is to make the code operate solely on the TX queue indicated by by the SKB queue mapping. Setting of the SKB queue mapping is entirely confined inside of net/core/dev.c:dev_pick_tx(). If we end up needing any kind of special semantics (drops, for example) it will be implemented here. Finally, we now have a "real_num_tx_queues" which is where the driver indicates how many TX queues are actually active. With IGB changes from Jeff Kirsher. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:07 -07:00
David S. Miller	24344d2600	mac80211: Temporarily mark QoS support BROKEN. We will undo this after a few changsets. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:05 -07:00
David S. Miller	1d8ae3fdeb	pkt_sched: Remove RR scheduler. This actually fixes a bug added by the RR scheduler changes. The ->bands and ->prio2band parameters were being set outside of the sch_tree_lock() and thus could result in strange behavior and inconsistencies. It might be possible, in the new design (where there will be one qdisc per device TX queue) to allow similar functionality via a TX hash algorithm for RR but I really see no reason to export this aspect of how these multiqueue cards actually implement the scheduling of the the individual DMA TX rings and the single physical MAC/PHY port. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:04 -07:00
David S. Miller	09e83b5d7d	netdev: Kill NETIF_F_MULTI_QUEUE. There is no need for a feature bit for something that can be tested by simply checking the TX queue count. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:03 -07:00
David S. Miller	e8a0464cc9	netdev: Allocate multiple queues for TX. alloc_netdev_mq() now allocates an array of netdev_queue structures for TX, based upon the queue_count argument. Furthermore, all accesses to the TX queues are now vectored through the netdev_get_tx_queue() and netdev_for_each_tx_queue() interfaces. This makes it easy to grep the tree for all things that want to get to a TX queue of a net device. Problem spots which are not really multiqueue aware yet, and only work with one queue, can easily be spotted by grepping for all netdev_get_tx_queue() calls that pass in a zero index. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:00 -07:00
Patrick McHardy	51ce7ec921	garp: retry sending JoinIn messages after allocation failures Increase reliability by retrying to send JoinIn messages after memory allocation failures on each TRANSMIT_PDU event until it succeeds. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-16 20:51:47 -07:00
Neil Horman	9a6d276e85	core: add stat to track unresolved discards in neighbor cache in __neigh_event_send, if we have a neighbour entry which is in NUD_INCOMPLETE state, we enqueue any outbound frames to that neighbour to the neighbours arp_queue, which is default capped to a length of 3 skbs. If that queue exceeds its set length, it will drop an skb on the queue to enqueue the newly arrived skb. This results in a drop for which we have no statistics incremented. This patch adds an unresolved_discards stat to /proc/net/stat/ndisc_cache to track these lost frames. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-16 20:50:49 -07:00
Pavel Emelyanov	ed88098e25	mib: add net to NET_ADD_STATS_USER Done with NET_XXX_STATS macros :) To be continued... Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-16 20:32:45 -07:00
Pavel Emelyanov	f2bf415cfe	mib: add net to NET_ADD_STATS_BH This one is tricky. The thing is that this macro is only used when killing tw buckets, but since this killer is promiscuous wrt to which net each particular tw belongs to, I have to use it only when NET_NS is off. When the net namespaces are on, I use the INET_INC_STATS_BH for each bucket. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-16 20:32:25 -07:00
Pavel Emelyanov	6f67c817fc	mib: add net to NET_INC_STATS_USER Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-16 20:31:39 -07:00
Pavel Emelyanov	de0744af1f	mib: add net to NET_INC_STATS_BH Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-16 20:31:16 -07:00
Pavel Emelyanov	4e6734447d	mib: add net to NET_INC_STATS Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-16 20:30:14 -07:00
Pavel Emelyanov	1ed834655a	tcp: replace tcp_sock argument with sock in some places These places have a tcp_sock, but we'd prefer the sock itself to get net from it. Fortunately, tcp_sk macro is just a type cast, so this replace is really cheap. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-16 20:29:51 -07:00
Pavel Emelyanov	ca12a1a443	inet: prepare net on the stack for NET accounting macros Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-16 20:28:42 -07:00
Pavel Emelyanov	5c52ba170f	sock: add net to prot->enter_memory_pressure callback The tcp_enter_memory_pressure calls NET_INC_STATS, but doesn't have where to get the net from. I decided to add a sk argument, not the net itself, only to factor all the required sock_net(sk) calls inside the enter_memory_pressure callback itself. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-16 20:28:10 -07:00
Pavel Emelyanov	74688e487a	mib: add net to TCP_DEC_STATS Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-16 20:22:46 -07:00
Pavel Emelyanov	63231bddf6	mib: add net to TCP_INC_STATS_BH Same as before - the sock is always there to get the net from, but there are also some places with the net already saved on the stack. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-16 20:22:25 -07:00
Pavel Emelyanov	81cc8a75d9	mib: add net to TCP_INC_STATS Fortunately (almost) all the TCP code has a sock to get the net from :) Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-16 20:22:04 -07:00
Pavel Emelyanov	a9c19329ec	tcp: add net to tcp_mib_init This one sets TCP MIBs after zeroing them, and thus requires the net. The existing single caller can use init_net (temporarily). Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-16 20:21:42 -07:00
Pavel Emelyanov	a86b1e3019	inet: prepare struct net for TCP MIB accounting This is the same as the first patch in the set, but preparing the net for TCP_XXX_STATS - save the struct net on the stack where required and possible. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-16 20:20:58 -07:00
Pavel Emelyanov	c5346fe396	mib: add net to IP_ADD_STATS_BH Very simple - only ip_evictor (fragments) requires such. This patch ends up the IP_XXX_STATS patching. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-16 20:20:33 -07:00
Pavel Emelyanov	7c73a6faff	mib: add net to IP_INC_STATS_BH Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-16 20:20:11 -07:00
Pavel Emelyanov	5e38e27044	mib: add net to IP_INC_STATS All the callers already have either the net itself, or the place where to get it from. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-16 20:19:49 -07:00
Pavel Emelyanov	84a3aa000e	ipv4: prepare net initialization for IP accounting Some places, that deal with IP statistics already have where to get a struct net from, but use it directly, without declaring a separate variable on the stack. So, save this net on the stack for future IP_XXX_STATS macros. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-16 20:19:08 -07:00
Will Newton	70efce27fc	net/ipv4/tcp.c: Fix use of PULLHUP instead of POLLHUP in comments. Change PULLHUP to POLLHUP in tcp_poll comments and clean up another comment for grammar and coding style. Signed-off-by: Will Newton <will.newton@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-16 20:13:43 -07:00
Harvey Harrison	7b1c65faa2	net: make __skb_splice_bits static net/core/skbuff.c:1335:5: warning: symbol '__skb_splice_bits' was not declared. Should it be static? Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-16 20:12:30 -07:00
David S. Miller	885a4c966b	Merge branch 'stealer/ipvs/sync-daemon-cleanup-for-next' of git://git.stealer.net/linux-2.6	2008-07-16 20:07:06 -07:00
Rumen G. Bogdanovski	9d3a0de7dc	ipvs: More reliable synchronization on connection close This patch enhances the synchronization of the closing connections between the master and the backup director. It prevents the closed connections to expire with the 15 min timeout of the ESTABLISHED state on the backup and makes them expire as they would do on the master with much shorter timeouts. Signed-off-by: Rumen G. Bogdanovski <rumen@voicecho.com> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-16 20:04:23 -07:00
Sven Wegener	375c6bbabf	ipvs: Use schedule_timeout_interruptible() instead of msleep_interruptible() So that kthread_stop() can wake up the thread and we don't have to wait one second in the worst case for the daemon to actually stop. Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Acked-by: Simon Horman <horms@verge.net.au>	2008-07-16 22:33:20 +00:00
Sven Wegener	ba6fd85021	ipvs: Put backup thread on mcast socket wait queue Instead of doing an endless loop with sleeping for one second, we now put the backup thread onto the mcast socket wait queue and it gets woken up as soon as we have data to process. Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Acked-by: Simon Horman <horms@verge.net.au>	2008-07-16 22:33:20 +00:00
Sven Wegener	998e7a7680	ipvs: Use kthread_run() instead of doing a double-fork via kernel_thread() This also moves the setup code out of the daemons, so that we're able to return proper error codes to user space. The current code will return success to user space when the daemon is started with an invald mcast interface. With these changes we get an appropriate "No such device" error. We longer need our own completion to be sure the daemons are actually running, because they no longer contain code that can fail and kthread_run() takes care of the rest. Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Acked-by: Simon Horman <horms@verge.net.au>	2008-07-16 22:33:20 +00:00
Sven Wegener	e6dd731c75	ipvs: Use ERR_PTR for returning errors from make_receive_sock() and make_send_sock() The additional information we now return to the caller is currently not used, but will be used to return errors to user space. Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Acked-by: Simon Horman <horms@verge.net.au>	2008-07-16 22:33:19 +00:00
Sven Wegener	d56400504a	ipvs: Initialize mcast addr at compile time There's no need to do it at runtime, the values are constant. Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Acked-by: Simon Horman <horms@verge.net.au>	2008-07-16 22:33:19 +00:00
Trond Myklebust	cadc723cc1	Merge branch 'bkl-removal' into next	2008-07-15 18:34:58 -04:00
Trond Myklebust	e89e896d31	Merge branch 'devel' into next Conflicts: fs/nfs/file.c Fix up the conflict with Jon Corbet's bkl-removal tree	2008-07-15 18:34:16 -04:00
Ingo Molnar	82638844d9	Merge branch 'linus' into cpus4096 Conflicts: arch/x86/xen/smp.c kernel/sched_rt.c net/iucv/iucv.c Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 00:29:07 +02:00
Trond Myklebust	a86dc496b7	SUNRPC: Remove the BKL from the callback functions Push it into those callback functions that actually need it. Note that all the NFS operations use their own locking, so don't need the BKL. Ditto for the rpcbind client. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-07-15 18:10:57 -04:00
Chuck Lever	c2e1b09ff2	SUNRPC: Support registering IPv6 interfaces with local rpcbind daemon Introduce a new API to register RPC services on IPv6 interfaces to allow the NFS server and lockd to advertise on IPv6 networks. Unlike rpcb_register(), the new rpcb_v4_register() function uses rpcbind protocol version 4 to contact the local rpcbind daemon. The version 4 SET/UNSET procedures allow services to register address families besides AF_INET, register at specific network interfaces, and register transport protocols besides UDP and TCP. All of this functionality is exposed via the new rpcb_v4_register() kernel API. A user-space rpcbind daemon implementation that supports version 4 of the rpcbind protocol is required in order to make use of this new API. Note that rpcbind version 3 is sufficient to support the new rpcbind facilities listed above, but most extant implementations use version 4. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-07-15 18:08:55 -04:00
Chuck Lever	babe80eb49	SUNRPC: Refactor rpcb_register to make rpcbindv4 support easier rpcbind version 4 registration will reuse part of rpcb_register, so just split it out into a separate function now. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-07-15 18:08:51 -04:00
Chuck Lever	423d8b0647	SUNRPC: None of rpcb_create's callers wants a privileged source port Clean up: Callers that required a privileged source port now use rpcb_create_local(), so we can remove the @privileged argument from rpcb_create(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-07-15 18:08:47 -04:00
Chuck Lever	cc5598b78f	SUNRPC: Introduce a specific rpcb_create for contacting localhost Add rpcb_create_local() for use by rpcb_register() and upcoming IPv6 registration functions. Ensure any errors encountered by rpcb_create_local() are properly reported. We can also use a statically allocated constant loopback socket address instead of one allocated on the stack and initialized every time the function is called. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-07-15 18:08:44 -04:00
Chuck Lever	166b88d755	SUNRPC: Use correct XDR encoding procedure for rpcbind SET/UNSET The rpcbind versions 3 and 4 SET and UNSET procedures use the same arguments as the GETADDR procedure. While definitely a bug, this hasn't been a problem so far since the kernel hasn't used version 3 or 4 SET and UNSET. But this will change in just a moment. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-07-15 18:08:40 -04:00
Linus Torvalds	59190f4213	Merge branch 'generic-ipi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'generic-ipi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (22 commits) generic-ipi: more merge fallout generic-ipi: merge fix x86, visws: use mach-default/entry_arch.h x86, visws: fix generic-ipi build generic-ipi: fixlet generic-ipi: fix s390 build bug generic-ipi: fix linux-next tree build failure fix: "smp_call_function: get rid of the unused nonatomic/retry argument" fix: "smp_call_function: get rid of the unused nonatomic/retry argument" fix "smp_call_function: get rid of the unused nonatomic/retry argument" on_each_cpu(): kill unused 'retry' parameter smp_call_function: get rid of the unused nonatomic/retry argument sh: convert to generic helpers for IPI function calls parisc: convert to generic helpers for IPI function calls mips: convert to generic helpers for IPI function calls m32r: convert to generic helpers for IPI function calls arm: convert to generic helpers for IPI function calls alpha: convert to generic helpers for IPI function calls ia64: convert to generic helpers for IPI function calls powerpc: convert to generic helpers for IPI function calls ... Fix trivial conflicts due to rcu updates in kernel/rcupdate.c manually	2008-07-15 14:12:03 -07:00
Ingo Molnar	1a781a777b	Merge branch 'generic-ipi' into generic-ipi-for-linus Conflicts: arch/powerpc/Kconfig arch/s390/kernel/time.c arch/x86/kernel/apic_32.c arch/x86/kernel/cpu/perfctr-watchdog.c arch/x86/kernel/i8259_64.c arch/x86/kernel/ldt.c arch/x86/kernel/nmi_64.c arch/x86/kernel/smpboot.c arch/x86/xen/smp.c include/asm-x86/hw_irq_32.h include/asm-x86/hw_irq_64.h include/asm-x86/mach-default/irq_vectors.h include/asm-x86/mach-voyager/irq_vectors.h include/asm-x86/smp.h kernel/Makefile Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-15 21:55:59 +02:00
Ingo Molnar	6c9fcaf2ee	Merge branch 'core/rcu' into core/rcu-for-linus	2008-07-15 21:10:12 +02:00
Akinobu Mita	d0236f8f82	iucv: fix memory leak in cpu hotplug error path. Fix memory leak in error path in CPU_UP_PREPARE notifier. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-15 02:09:53 -07:00
Octavian Purdila	2870c43d17	net: refactor tcp splice receive path to improve readability - move all of the details on offsets, lengths and buffers into a single function instead of doing these operation from multiple places - use a bottom up approach: try to avoid details in the high level functions, introduce them gradually as we go deeper in the function call stack With helpful feedback from Jarek Poplawski. Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Acked-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-15 00:49:11 -07:00
David S. Miller	b9e4085768	netdev: Do not use TX lock to protect address lists. Now that we have a specific lock to protect the network device unicast and multicast lists, remove extraneous grabs of the TX lock in cases where the code only needs address list protection. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-15 00:15:08 -07:00
David S. Miller	e308a5d806	netdev: Add netdev->addr_list_lock protection. Add netif_addr_{lock,unlock}{,_bh}() helpers. Use them to protect operations that operate on or read the network device unicast and multicast address lists. Also use them in cases where the code simply wants to block calls into the driver's ->set_rx_mode() and ->set_multicast_list() methods. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-15 00:13:44 -07:00
David S. Miller	f1f28aa351	netdev: Add addr_list_lock to struct net_device. This will be used to protect the per-device unicast and multicast address lists, as well as the callbacks into the drivers which configure such state such as ->set_rx_mode() and ->set_multicast_list(). Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-15 00:08:33 -07:00
Pavel Emelyanov	f66ac03d49	mib: add struct net to ICMPMSGIN_INC_STATS_BH Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-14 23:05:31 -07:00
Pavel Emelyanov	903fc1964e	mib: add struct net to ICMPMSGOUT_INC_STATS Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-14 23:05:30 -07:00
Pavel Emelyanov	dcfc23cac1	mib: add struct net to ICMP_INC_STATS_BH Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-14 23:05:29 -07:00
Pavel Emelyanov	75c939bb4d	mib: add struct net to ICMP_INC_STATS Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-14 23:05:28 -07:00
Pavel Emelyanov	fd54d716b1	inet: toss struct net initialization around Some places, that deal with ICMP statistics already have where to get a struct net from, but use it directly, without declaring a separate variable on the stack. Since I will need this net soon, I declare a struct net on the stack and use it in the existing places in a separate patch not to spoil the future ones. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-14 23:05:26 -07:00
Pavel Emelyanov	0388b00426	icmp: add struct net argument to icmp_out_count This routine deals with ICMP statistics, but doesn't have a struct net at hands, so add one. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-14 23:05:13 -07:00
Patrick McHardy	61362766d7	vlan: remove unnecessary include statements Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-14 22:51:55 -07:00
Patrick McHardy	d80aa31bbf	vlan: clean up hard_start_xmit functions Remove excessive comments and debugging, use NETDEV_TX codes, remove some empty lines. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-14 22:51:39 -07:00
Patrick McHardy	1349fe9a6b	vlan: clean up vlan_dev_hard_header() Remove some debugging and excessive comments, merge the two dev_hard_header calls into one. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-14 22:51:19 -07:00
Patrick McHardy	19b9a4e256	vlan: ethtool ->get_flags support Allow to query LRO settings of underlying device when VLAN RX acceleration is used. Suggested by Ben Hutchings <bhutchings@solarflare.com>. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-14 22:51:01 -07:00
Patrick McHardy	393e52e33c	packet: deliver VLAN TCI to userspace Store the VLAN tag in the auxillary data/tpacket2_hdr so userspace can properly deal with hardware VLAN tagging/stripping. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-14 22:50:39 -07:00
Patrick McHardy	bbd6ef87c5	packet: support extensible, 64 bit clean mmaped ring structure The tpacket_hdr is not 64 bit clean due to use of an unsigned long and can't be extended because the following struct sockaddr_ll needs to be at a fixed offset. Add support for a version 2 tpacket protocol that removes these limitations. Userspace can query the header size through a new getsockopt option and change the protocol version through a setsockopt option. The changes needed to switch to the new protocol version are: 1. replace struct tpacket_hdr by struct tpacket2_hdr 2. query header len and save 3. set protocol version to 2 - set up ring as usual 4. for getting the sockaddr_ll, use (void )hdr + TPACKET_ALIGN(hdrlen) instead of (void )hdr + TPACKET_ALIGN(sizeof(struct tpacket_hdr)) Steps 2 and 4 can be omitted if the struct sockaddr_ll isn't needed. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-14 22:50:15 -07:00
Patrick McHardy	bc1d0411b8	vlan: deliver packets received with VLAN acceleration to network taps When VLAN header stripping is used, packets currently bypass packet sockets (and other network taps) completely. For locally existing VLANs, they appear directly on the VLAN device, for unknown VLANs they are silently dropped. Add a new function netif_nit_deliver() to deliver incoming packets to all network interface taps and use it in __vlan_hwaccel_rx() to make VLAN packets visible on the underlying device. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-14 22:49:30 -07:00
Patrick McHardy	6aa895b047	vlan: Don't store VLAN tag in cb Use a real skb member to store the skb to avoid clashes with qdiscs, which are allowed to use the cb area themselves. As currently only real devices that consume the skb set the NETIF_F_HW_VLAN_TX flag, no explicit invalidation is neccessary. The new member fills a hole on 64 bit, the skb layout changes from: __u32 mark; /* 172 4 / sk_buff_data_t transport_header; / 176 4 / sk_buff_data_t network_header; / 180 4 / sk_buff_data_t mac_header; / 184 4 / sk_buff_data_t tail; / 188 4 / / --- cacheline 3 boundary (192 bytes) --- / sk_buff_data_t end; / 192 4 / / XXX 4 bytes hole, try to pack / to __u32 mark; / 172 4 / __u16 vlan_tci; / 176 2 / / XXX 2 bytes hole, try to pack / sk_buff_data_t transport_header; / 180 4 / sk_buff_data_t network_header; / 184 4 */ Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-14 22:49:06 -07:00
Allan Stephens	968edbe1c8	tipc: Optimization to multicast name lookup algorithm This patch simplifies and speeds up TIPC's algorithm for identifying on-node and off-node destinations that overlap a multicast name sequence range. Rather than traversing the list of all known name publications within the cluster, it now traverses the (potentially much shorter) list of name publications made by the node itself, and determines if any off-node destinations exist by comparing the sizes of the two lists. (Since the node list must be a subset of the cluster list, a difference in sizes means that at least one off-node destination must exist.) Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-14 22:45:33 -07:00
Allan Stephens	1aad72d6cd	tipc: Add missing locks when inspecting node list & link list This patch ensures that TIPC configuration commands that display info about neighboring nodes and their links take the spinlocks that protect the node list and link lists from changing while the lists are being traversed. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-14 22:44:58 -07:00
Allan Stephens	08d2cf0f74	tipc: Fix bug in scope checking for multicast messages This patch ensures that TIPC's multicast message name lookup algorithm does individualized scope checking for each published name it examines. Previously, scope checking was only done for the first name in a name table publication list, which could result in incoming multicast messages being delivered to ports publishing names with "node" scope, or not being delivered to ports publishing names with "cluster" or "zone" scope. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-14 22:44:32 -07:00
Allan Stephens	0e35fd5e52	tipc: Eliminate improper use of TIPC_OK error code This patch corrects many places where TIPC routines indicated successful completion by returning TIPC_OK instead of 0. (The TIPC_OK symbol has the value 0, but it should only be used in contexts that deal with the error code field of a TIPC message header.) Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-14 22:44:01 -07:00
Allan Stephens	2da59918e2	tipc: Fix race condition that could cause accept() to fail This patch ensurs that accept() returns successfully even when the newly created socket is immediately disconnected by its peer. Previously, accept() would fail if it was unable to pass back the optional address info for the socket's peer before the socket became disconnected; TIPC now allows accept() to gather peer address information after disconnection. As a bonus, the revised code accesses the socket's port more efficiently, without the overhead incurred by a reference table lookup. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-14 22:43:32 -07:00
Allan Stephens	8642bd9e04	tipc: Optimize pointer dereferencing when receiving stream data This patch eliminates an unnecessary pointer dereference when accessing a stream-based socket's receive queue. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-14 22:42:51 -07:00
Allan Stephens	0ea522416b	tipc: Remove unneeded parameter to tipc_createport_raw() This patch eliminates an unneeded parameter when creating a low-level TIPC port object. Instead of returning both the pointer to the port structure and the port's reference ID, it now returns only the pointer since the port structure contains the reference ID as one of its fields. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-14 22:42:19 -07:00
Denis V. Lunev	83aa2e964b	netlabel: return msg overflow error from netlbl_cipsov4_list faster Currently, we are trying to place the information from the kernel to 1, 2, 3 and 4 pages sequentially. These pages are allocated via slab. Though, from the slab point of view steps 3 and 4 are equivalent on most architectures. So, lets skip 3 pages attempt. By the way, should we switch from .doit to .dumpit interface here? The amount of data seems quite big for me. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-14 22:28:25 -07:00
Johann Felix Soden	7197914c35	net: Remove references to wan-router.txt in Kconfigs This patch removes references in drivers/net/wan/Kconfig and net/wanrouter/Kconfig to Documentation/networking/wan-router.txt which was removed in commit `99971e70fd` ("[WANPIPE]: Forgotten bits of Sangoma drivers removal."). Signed-off-by: Johann Felix Soden <johfel@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-14 22:22:29 -07:00
Wang Chen	89146504cb	8021q: Check return of dev_set_promiscuity/allmulti dev_set_promiscuity/allmulti might overflow. Commit: "netdevice: Fix promiscuity and allmulti overflow" in net-next makes dev_set_promiscuity/allmulti return error number if overflow happened. Here, we check all positive increment for promiscuity and allmulti to get error return. Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-14 20:59:03 -07:00
Wang Chen	7dc00c82cb	ipv4: Fix ipmr unregister device oops An oops happens during device unregister. The following oops happened when I add two tunnels, which use a same device, and then delete one tunnel. Obviously deleting tunnel "A" causes device unregister, which send a notification, and after receiving notification, ipmr do unregister again for tunnel "B" which also use same device. That is wrong. After receiving notification, ipmr only needs to decrease reference count and don't do duplicated unregister. Fortunately, IPv6 side doesn't add tunnel in ip6mr, so it's clean. This patch fixs: - unregister device oops - using after dev_put() Here is the oops: === Jul 11 15:39:29 wangchen kernel: ------------[ cut here ]------------ Jul 11 15:39:29 wangchen kernel: kernel BUG at net/core/dev.c:3651! Jul 11 15:39:29 wangchen kernel: invalid opcode: 0000 [#1] Jul 11 15:39:29 wangchen kernel: Modules linked in: ipip tunnel4 nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs ipv6 snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device af_packet binfmt_misc button battery ac loop dm_mod usbhid ff_memless pcmcia firmware_class ohci1394 8139too mii ieee1394 yenta_socket rsrc_nonstatic pcmcia_core ide_cd_mod cdrom snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm i2c_i801 snd_timer snd i2c_core soundcore snd_page_alloc rng_core shpchp ehci_hcd uhci_hcd pci_hotplug intel_agp agpgart usbcore ext3 jbd ata_piix ahci libata dock edd fan thermal processor thermal_sys piix sd_mod scsi_mod ide_disk ide_core [last unloaded: freq_table] Jul 11 15:39:29 wangchen kernel: Jul 11 15:39:29 wangchen kernel: Pid: 4102, comm: mroute Not tainted (2.6.26-rc9-default #69) Jul 11 15:39:29 wangchen kernel: EIP: 0060:[<c024636b>] EFLAGS: 00010202 CPU: 0 Jul 11 15:39:29 wangchen kernel: EIP is at rollback_registered+0x61/0xe3 Jul 11 15:39:29 wangchen kernel: EAX: 00000001 EBX: ecba6000 ECX: 00000000 EDX: ffffffff Jul 11 15:39:29 wangchen kernel: ESI: 00000001 EDI: ecba6000 EBP: c03de2e8 ESP: ed8e7c3c Jul 11 15:39:29 wangchen kernel: DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068 Jul 11 15:39:29 wangchen kernel: Process mroute (pid: 4102, ti=ed8e6000 task=ed41e830 task.ti=ed8e6000) Jul 11 15:39:29 wangchen kernel: Stack: ecba6000 c024641c 00000028 c0284e1a 00000001 c03de2e8 ecba6000 eecff360 Jul 11 15:39:29 wangchen kernel: c0284e4c c03536f4 fffffff8 00000000 c029a819 ecba6000 00000006 ecba6000 Jul 11 15:39:29 wangchen kernel: 00000000 ecba6000 c03de2c0 c012841b ffffffff 00000000 c024639f ecba6000 Jul 11 15:39:29 wangchen kernel: Call Trace: Jul 11 15:39:29 wangchen kernel: [<c024641c>] unregister_netdevice+0x2f/0x51 Jul 11 15:39:29 wangchen kernel: [<c0284e1a>] vif_delete+0xaf/0xc3 Jul 11 15:39:29 wangchen kernel: [<c0284e4c>] ipmr_device_event+0x1e/0x30 Jul 11 15:39:29 wangchen kernel: [<c029a819>] notifier_call_chain+0x2a/0x47 Jul 11 15:39:29 wangchen kernel: [<c012841b>] raw_notifier_call_chain+0x9/0xc Jul 11 15:39:29 wangchen kernel: [<c024639f>] rollback_registered+0x95/0xe3 Jul 11 15:39:29 wangchen kernel: [<c024641c>] unregister_netdevice+0x2f/0x51 Jul 11 15:39:29 wangchen kernel: [<c0284e1a>] vif_delete+0xaf/0xc3 Jul 11 15:39:29 wangchen kernel: [<c0285eee>] ip_mroute_setsockopt+0x47a/0x801 Jul 11 15:39:29 wangchen kernel: [<eea5a70c>] do_get_write_access+0x2df/0x313 [jbd] Jul 11 15:39:29 wangchen kernel: [<c01727c4>] __find_get_block_slow+0xda/0xe4 Jul 11 15:39:29 wangchen kernel: [<c0172a7f>] __find_get_block+0xf8/0x122 Jul 11 15:39:29 wangchen kernel: [<c0172a7f>] __find_get_block+0xf8/0x122 Jul 11 15:39:29 wangchen kernel: [<eea5d563>] journal_cancel_revoke+0xda/0x110 [jbd] Jul 11 15:39:29 wangchen kernel: [<c0263501>] ip_setsockopt+0xa9/0x9ee Jul 11 15:39:29 wangchen kernel: [<eea5d563>] journal_cancel_revoke+0xda/0x110 [jbd] Jul 11 15:39:29 wangchen kernel: [<eea5a70c>] do_get_write_access+0x2df/0x313 [jbd] Jul 11 15:39:29 wangchen kernel: [<eea69287>] __ext3_get_inode_loc+0xcf/0x271 [ext3] Jul 11 15:39:29 wangchen kernel: [<eea743c7>] __ext3_journal_dirty_metadata+0x13/0x32 [ext3] Jul 11 15:39:29 wangchen kernel: [<c0116434>] __wake_up+0xf/0x15 Jul 11 15:39:29 wangchen kernel: [<eea5a424>] journal_stop+0x1bd/0x1c6 [jbd] Jul 11 15:39:29 wangchen kernel: [<eea703a7>] __ext3_journal_stop+0x19/0x34 [ext3] Jul 11 15:39:29 wangchen kernel: [<c014291e>] get_page_from_freelist+0x94/0x369 Jul 11 15:39:29 wangchen kernel: [<c01408f2>] filemap_fault+0x1ac/0x2fe Jul 11 15:39:29 wangchen kernel: [<c01a605e>] security_sk_alloc+0xd/0xf Jul 11 15:39:29 wangchen kernel: [<c023edea>] sk_prot_alloc+0x36/0x78 Jul 11 15:39:29 wangchen kernel: [<c0240037>] sk_alloc+0x3a/0x40 Jul 11 15:39:29 wangchen kernel: [<c0276062>] raw_hash_sk+0x46/0x4e Jul 11 15:39:29 wangchen kernel: [<c0166aff>] d_alloc+0x1b/0x157 Jul 11 15:39:29 wangchen kernel: [<c023e4d1>] sock_common_setsockopt+0x12/0x16 Jul 11 15:39:29 wangchen kernel: [<c023cb1e>] sys_setsockopt+0x6f/0x8e Jul 11 15:39:29 wangchen kernel: [<c023e105>] sys_socketcall+0x15c/0x19e Jul 11 15:39:29 wangchen kernel: [<c0103611>] sysenter_past_esp+0x6a/0x99 Jul 11 15:39:29 wangchen kernel: [<c0290000>] unix_poll+0x69/0x78 Jul 11 15:39:29 wangchen kernel: ======================= Jul 11 15:39:29 wangchen kernel: Code: 83 e0 01 00 00 85 c0 75 1f 53 53 68 12 81 31 c0 e8 3c 30 ed ff ba 3f 0e 00 00 b8 b9 7f 31 c0 83 c4 0c 5b e9 f5 26 ed ff 48 74 04 <0f> 0b eb fe 89 d8 e8 21 ff ff ff 89 d8 e8 62 ea ff ff c7 83 e0 Jul 11 15:39:29 wangchen kernel: EIP: [<c024636b>] rollback_registered+0x61/0xe3 SS:ESP 0068:ed8e7c3c Jul 11 15:39:29 wangchen kernel: ---[ end trace c311acf85d169786 ]--- === Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-14 20:56:34 -07:00
Wang Chen	d607032db0	ipv4: Check return of dev_set_allmulti allmulti might overflow. Commit: "netdevice: Fix promiscuity and allmulti overflow" in net-next makes dev_set_promiscuity/allmulti return error number if overflow happened. Here, we check the positive increment for allmulti to get error return. PS: For unwinding tunnel creating, we let ipip->ioctl() to handle it. Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-14 20:55:26 -07:00
Wang Chen	7af3db78a9	ipv6: Fix using after dev_put() Patrick McHardy pointed it out. Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-14 20:54:54 -07:00
Wang Chen	5ae7b44413	ipv6: Check return of dev_set_allmulti allmulti might overflow. Commit: "netdevice: Fix promiscuity and allmulti overflow" in net-next makes dev_set_promiscuity/allmulti return error number if overflow happened. Here, we check the positive increment for allmulti to get error return. Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-14 20:54:23 -07:00
Wang Chen	bc3f9076f6	bridge: Check return of dev_set_promiscuity dev_set_promiscuity/allmulti might overflow. Commit: "netdevice: Fix promiscuity and allmulti overflow" in net-next makes dev_set_promiscuity/allmulti return error number if overflow happened. Here, we check the positive increment for promiscuity to get error return. Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-14 20:53:13 -07:00
Wang Chen	2aeb0b88b3	af_packet: Check return of dev_set_promiscuity/allmulti dev_set_promiscuity/allmulti might overflow. Commit: "netdevice: Fix promiscuity and allmulti overflow" in net-next makes dev_set_promiscuity/allmulti return error number if overflow happened. In af_packet, we check all positive increment for promiscuity and allmulti to get error return. Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-14 20:49:46 -07:00
David S. Miller	fc943b12e4	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6	2008-07-14 20:40:34 -07:00
Patrick McHardy	72d9794f44	net-sched: cls_flow: add perturbation support Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-14 20:36:32 -07:00
David S. Miller	0c4c8cae44	Merge branch 'master' of git://eden-feed.erg.abdn.ac.uk/net-next-2.6	2008-07-14 20:32:07 -07:00
David S. Miller	2aec609fb4	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/netfilter/nf_conntrack_proto_tcp.c	2008-07-14 20:23:54 -07:00
David S. Miller	4c88949800	netfilter: Let nf_ct_kill() callers know if del_timer() returned true. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-14 20:22:38 -07:00
Linus Torvalds	666484f025	Merge branch 'core/softirq' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core/softirq' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: softirq: remove irqs_disabled warning from local_bh_enable softirq: remove initialization of static per-cpu variable Remove argument from open_softirq which is always NULL	2008-07-14 15:28:42 -07:00
Linus Torvalds	d1794f2c5b	Merge branch 'bkl-removal' of git://git.lwn.net/linux-2.6 * 'bkl-removal' of git://git.lwn.net/linux-2.6: (146 commits) IB/umad: BKL is not needed for ib_umad_open() IB/uverbs: BKL is not needed for ib_uverbs_open() bf561-coreb: BKL unneeded for open() Call fasync() functions without the BKL snd/PCM: fasync BKL pushdown ipmi: fasync BKL pushdown ecryptfs: fasync BKL pushdown Bluetooth VHCI: fasync BKL pushdown tty_io: fasync BKL pushdown tun: fasync BKL pushdown i2o: fasync BKL pushdown mpt: fasync BKL pushdown Remove BKL from remote_llseek v2 Make FAT users happier by not deadlocking x86-mce: BKL pushdown vmwatchdog: BKL pushdown vmcp: BKL pushdown via-pmu: BKL pushdown uml-random: BKL pushdown uml-mmapper: BKL pushdown ...	2008-07-14 14:48:31 -07:00
Jonathan Corbet	2fceef397f	Merge commit 'v2.6.26' into bkl-removal	2008-07-14 15:29:34 -06:00
Emmanuel Grumbach	1e18863790	mac80211: dont add a STA which is not in the same IBSS This patch avoids adding STAs that don't belong to our IBSS ieee80211_bssid_match matches also bcast address so also APs were added Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-07-14 14:52:57 -04:00
Johannes Berg	f434b2d111	mac80211: fix struct ieee80211_tx_queue_params Multiple issues: - there are no "default" values needed - cw_min/cw_max can be larger than documented - restructure to decrease size - use get_unaligned_le16 Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-07-14 14:52:57 -04:00
Johannes Berg	f591fa5dbb	mac80211: fix TX sequence numbers This patch makes mac80211 assign proper sequence numbers to QoS-data frames. It also removes the old sequence number code because we noticed that only the driver or hardware can assign sequence numbers to non-QoS-data and especially management frames in a race-free manner because beacons aren't passed through mac80211's TX path. This patch also adds temporary code to the rt2x00 drivers to not break them completely, that code will have to be reworked for proper sequence numbers on beacons. It also moves sequence number assignment down in the TX path so no sequence numbers are assigned to frames that are dropped. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-07-14 14:52:57 -04:00
Johannes Berg	22bb1be4d2	wext: make sysfs bits optional and deprecate them The /sys/class/net/*/wireless/ direcory is, as far as I know, not used by anyone. Additionally, the same data is available via wext ioctls. Hence the sysfs files are pretty much useless. This patch makes them optional and schedules them for removal. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Cc: Jean Tourrilhes <jt@hpl.hp.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-07-14 14:52:57 -04:00
Johannes Berg	1411f9b531	mac80211: fix RX sequence number check According to 802.11-2007, we are doing the wrong thing in the sequence number checks when receiving frames. This fixes it. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-07-14 14:52:57 -04:00
Emmanuel Grumbach	2560b6e2e4	mac80211: Fix ieee80211_rx_reorder_ampdu: ignore QoS null packets This patch fixes the check at the entrance to ieee80211_rx_reorder_ampdu. This check has been broken by 'mac80211: rx.c use new helpers'. Letting QoS NULL packet in ieee80211_rx_reorder_ampdu led to packet loss in RX. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-07-14 14:52:56 -04:00
Johannes Berg	9d139c810a	mac80211: revamp beacon configuration This patch changes mac80211's beacon configuration handling to never pass skbs to the driver directly but rather always require the driver to use ieee80211_beacon_get(). Additionally, it introduces "change flags" on the config_interface() call to enable drivers to figure out what is changing. Finally, it removes the beacon_update() driver callback in favour of having IBSS beacon delivered by ieee80211_beacon_get() as well. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-07-14 14:30:07 -04:00
Johannes Berg	f3947e2dfa	mac80211: push interface checks down This patch pushes the "netif_running()" and "same type as before" checks down into ieee80211_if_change_type() to centralise the logic instead of duplicating it for cfg80211 and wext. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-07-14 14:30:07 -04:00
Johannes Berg	75636525fb	mac80211: revamp virtual interface handling This patch revamps the virtual interface handling and makes the code much easier to follow. Fewer functions, better names, less spaghetti code. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-07-14 14:30:07 -04:00
Johannes Berg	3e122be089	mac80211: make master netdev handling sane Currently, almost every interface type has a 'bss' pointer pointing to BSS information. This BSS information, however, is for a _local_ BSS, not for the BSS we joined, so having it on a STA mode interface makes little sense, but now they have it pointing to the master device, which is an AP mode virtual interface. However, except for some bitrate control data, this pointer is only used in AP/VLAN modes (for power saving stations.) Overall, it is not necessary to even have the master netdev be a valid virtual interface, and it doesn't have to be on the list of interfaces either. This patch changes the master netdev to be special, it now - no longer is on the list of virtual interfaces, which lets me remove a lot of tests for that - no longer has sub_if_data attached, since that isn't used Additionally, this patch changes some vlan/ap mode handling that is related to these 'bss' pointers described above (but in the VLAN case they actually make sense because there they point to the AP they belong to); it also adds some debugging code to IEEE80211_DEV_TO_SUB_IF to validate it is not called on the master netdev any more. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-07-14 14:30:06 -04:00
Samuel Ortiz	49292d5635	mac80211: power management wext hooks This patch implements the power management routines wireless extensions for mac80211. For now we only support switching PS mode between on and off. Signed-off-by: Samuel Ortiz <sameo@openedhand.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-07-14 14:30:06 -04:00
Marcel Holtmann	b1235d7961	[Bluetooth] Allow security for outgoing L2CAP connections When requested the L2CAP layer will now enforce authentication and encryption on outgoing connections. The usefulness of this feature is kinda limited since it will not allow proper connection ownership tracking until the authentication procedure has been finished. This is a limitation of Bluetooth 2.0 and before and can only be fixed by using Simple Pairing. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2008-07-14 20:13:54 +02:00
Marcel Holtmann	7cb127d5b0	[Bluetooth] Add option to disable eSCO connection creation It has been reported that some eSCO capable headsets are not able to connect properly. The real reason for this is unclear at the moment. So for easier testing add a module parameter to disable eSCO connection creation. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2008-07-14 20:13:53 +02:00
Marcel Holtmann	ec8dab36e0	[Bluetooth] Signal user-space for HIDP and BNEP socket errors When using the HIDP or BNEP kernel support, the user-space needs to know if the connection has been terminated for some reasons. Wake up the application if that happens. Otherwise kernel and user-space are no longer on the same page and weird behaviors can happen. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2008-07-14 20:13:53 +02:00
Marcel Holtmann	a0c22f2265	[Bluetooth] Move pending packets from RFCOMM socket to TTY When an incoming RFCOMM socket connection gets converted into a TTY, it can happen that packets are lost. This mainly happens with the Handsfree profile where the remote side starts sending data right away. The problem is that these packets are in the socket receive queue. So when creating the TTY make sure to copy all pending packets from the socket receive queue to a private queue inside the TTY. To make this actually work, the flow control on the newly created TTY will be disabled and only enabled again when the TTY is opened by an application. And right before that, the pending packets will be put into the TTY flip buffer. Signed-off-by: Denis Kenzior <denis.kenzior@trolltech.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2008-07-14 20:13:52 +02:00
Marcel Holtmann	8b6b3da765	[Bluetooth] Store remote modem status for RFCOMM TTY When switching a RFCOMM socket to a TTY, the remote modem status might be needed later. Currently it is lost since the original configuration is done via the socket interface. So store the modem status and reply it when the socket has been converted to a TTY. Signed-off-by: Denis Kenzior <denis.kenzior@trolltech.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2008-07-14 20:13:52 +02:00
Marcel Holtmann	ca37bdd53b	[Bluetooth] Use non-canonical TTY by default for RFCOMM While the RFCOMM TTY emulation can act like a real serial port, in reality it is not used like this. So to not mess up stupid applications, use the non-canonical mode by default. Signed-off-by: Denis Kenzior <denis.kenzior@trolltech.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2008-07-14 20:13:52 +02:00
Marcel Holtmann	78c6a1744f	[Bluetooth] Update Bluetooth core version number With all the Bluetooth 2.1 changes and the support for Simple Pairing, it is important to update the Bluetooth core version number. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2008-07-14 20:13:51 +02:00
Marcel Holtmann	7d0db0a373	[Bluetooth] Use a more unique bus name for connections When attaching Bluetooth low-level connections to the bus, the bus name is constructed from the remote address since at that time the connection handle is not assigned yet. This has worked so far, but also caused a lot of troubles. It is better to postpone the creation of the sysfs entry to the time when the connection actually has been established and then use its connection handle as unique identifier. This also fixes the case where two different adapters try to connect to the same remote device. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2008-07-14 20:13:51 +02:00
Marcel Holtmann	43cbeee9f9	[Bluetooth] Add support for TIOCOUTQ and TIOCINQ ioctls Almost every protocol family supports the TIOCOUTQ and TIOCINQ ioctls and even Bluetooth could make use of them. When implementing audio streaming and integration with GStreamer or PulseAudio they will allow a better timing and synchronization. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2008-07-14 20:13:51 +02:00
Marcel Holtmann	3241ad820d	[Bluetooth] Add timestamp support to L2CAP, RFCOMM and SCO Enable the common timestamp functionality that the network subsystem provides for L2CAP, RFCOMM and SCO sockets. It is possible to either use SO_TIMESTAMP or the IOCTLs to retrieve the timestamp of the current packet. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2008-07-14 20:13:50 +02:00
Marcel Holtmann	40be492fe4	[Bluetooth] Export details about authentication requirements With the Simple Pairing support, the authentication requirements are an explicit setting during the bonding process. Track and enforce the requirements and allow higher layers like L2CAP and RFCOMM to increase them if needed. This patch introduces a new IOCTL that allows to query the current authentication requirements. It is also possible to detect Simple Pairing support in the kernel this way. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2008-07-14 20:13:50 +02:00
Marcel Holtmann	f8558555f3	[Bluetooth] Initiate authentication during connection establishment With Bluetooth 2.1 and Simple Pairing the requirement is that any new connection needs to be authenticated and that encryption has been switched on before allowing L2CAP to use it. So make sure that all the requirements are fulfilled and otherwise drop the connection with a minimal disconnect timeout of 10 milliseconds. This change only affects Bluetooth 2.1 devices and Simple Pairing needs to be enabled locally and in the remote host stack. The previous changes made sure that these information are discovered before any kind of authentication and encryption is triggered. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2008-07-14 20:13:49 +02:00
Marcel Holtmann	769be974d0	[Bluetooth] Use ACL config stage to retrieve remote features The Bluetooth technology introduces new features on a regular basis and for some of them it is important that the hardware on both sides support them. For features like Simple Pairing it is important that the host stacks on both sides have switched this feature on. To make valid decisions, a config stage during ACL link establishment has been introduced that retrieves remote features and if needed also the remote extended features (known as remote host features) before signalling this link as connected. This change introduces full reference counting of incoming and outgoing ACL links and the Bluetooth core will disconnect both if no owner of it is present. To better handle interoperability during the pairing phase the disconnect timeout for incoming connections has been increased to 10 seconds. This is five times more than for outgoing connections. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2008-07-14 20:13:49 +02:00
Marcel Holtmann	a8bd28baf2	[Bluetooth] Export remote Simple Pairing mode via sysfs Since the remote Simple Pairing mode is stored together with the inquiry cache, it makes sense to show it together with the other information. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2008-07-14 20:13:49 +02:00
Marcel Holtmann	41a96212b3	[Bluetooth] Track status of remote Simple Pairing mode The Simple Pairing process can only be used if both sides have the support enabled in the host stack. The current Bluetooth specification has three ways to detect this support. If an Extended Inquiry Result has been sent during inquiry then it is safe to assume that Simple Pairing is enabled. It is not allowed to enable Extended Inquiry without Simple Pairing. During the remote name request phase a notification with the remote host supported features will be sent to indicate Simple Pairing support. Also the second page of the remote extended features can indicate support for Simple Pairing. For all three cases the value of remote Simple Pairing mode is stored in the inquiry cache for later use. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2008-07-14 20:13:48 +02:00
Marcel Holtmann	333140b57f	[Bluetooth] Track status of Simple Pairing mode The Simple Pairing feature is optional and needs to be enabled by the host stack first. The Linux kernel relies on the Bluetooth daemon to either enable or disable it, but at any time it needs to know the current state of the Simple Pairing mode. So track any changes made by external entities and store the current mode in the HCI device structure. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2008-07-14 20:13:48 +02:00
Marcel Holtmann	0493684ed2	[Bluetooth] Disable disconnect timer during Simple Pairing During the Simple Pairing process the HCI disconnect timer must be disabled. The way to do this is by holding a reference count of the HCI connection. The Simple Pairing process on both sides starts with an IO Capabilities Request and ends with Simple Pairing Complete. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2008-07-14 20:13:48 +02:00
Marcel Holtmann	c7bdd5026d	[Bluetooth] Update class of device value whenever possible The class of device value can only be retrieved via inquiry or during an incoming connection request. Outgoing connections can't ask for the class of device. To compensate for this the value is stored and copied via the inquiry cache, but currently only updated via inquiry. This update should also happen during an incoming connection request. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2008-07-14 20:13:47 +02:00
Marcel Holtmann	f383f2750a	[Bluetooth] Some cleanups for HCI event handling Some minor cosmetic cleanups to the HCI event handling to make the code easier to read and understand. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2008-07-14 20:13:47 +02:00
Marcel Holtmann	e4e8e37c42	[Bluetooth] Make use of the default link policy settings The Bluetooth specification supports the default link policy settings on a per host controller basis. For every new connection the link manager would then use these settings. It is better to use this instead of bothering the controller on every connection setup to overwrite the default settings. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2008-07-14 20:13:47 +02:00
Marcel Holtmann	a8746417e8	[Bluetooth] Track connection packet type changes The connection packet type can be changed after the connection has been established and thus needs to be properly tracked to ensure that the host stack has always correct and valid information about it. On incoming connections the Bluetooth core switches the supported packet types to the configured list for this controller. However the usefulness of this feature has been questioned a lot. The general consent is that every Bluetooth host stack should enable as many packet types as the hardware actually supports and leave the decision to the link manager software running on the Bluetooth chip. When running on Bluetooth 2.0 or later hardware, don't change the packet type for incoming connections anymore. This hardware likely supports Enhanced Data Rate and thus leave it completely up to the link manager to pick the best packet type. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2008-07-14 20:13:46 +02:00
Marcel Holtmann	9dc0a3afc0	[Bluetooth] Support the case when headset falls back to SCO link When trying to establish an eSCO link between two devices then it can happen that the remote device falls back to a SCO link. Currently this case is not handled correctly and the message dispatching will break since it is looking for eSCO packets. So in case the configured link falls back to SCO overwrite the link type with the correct value. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2008-07-14 20:13:46 +02:00
Marcel Holtmann	ae29319649	[Bluetooth] Update authentication status after successful encryption The authentication status is not communicated to both parties. This is actually a flaw in the Bluetooth specification. Only the requesting side really knows if the authentication was successful or not. This piece of information is however needed on the other side to know if it has to trigger the authentication procedure or not. Worst case is that both sides will request authentication at different times, but this should be avoided since it costs extra time when setting up a new connection. For Bluetooth encryption it is required to authenticate the link first and the encryption status is communicated to both sides. So when a link is switched to encryption it is possible to update the authentication status since it implies an authenticated link. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2008-07-14 20:13:45 +02:00
Marcel Holtmann	9719f8afce	[Bluetooth] Disconnect when encryption gets disabled The Bluetooth specification allows to enable or disable the encryption of an ACL link at any time by either the peer or the remote device. If a L2CAP or RFCOMM connection requested an encrypted link, they will now disconnect that link if the encryption gets disabled. Higher protocols that don't care about encryption (like SDP) are not affected. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2008-07-14 20:13:45 +02:00
Marcel Holtmann	77db198056	[Bluetooth] Enforce security for outgoing RFCOMM connections Recent tests with various Bluetooth headsets have shown that some of them don't enforce authentication and encryption when connecting. All of them leave it up to the host stack to enforce it. Non of them should allow unencrypted connections, but that is how it is. So in case the link mode settings require authentication and/or encryption it will now also be enforced on outgoing RFCOMM connections. Previously this was only done for incoming connections. This support has a small drawback from a protocol level point of view since the host stack can't really tell with 100% certainty if a remote side is already authenticated or not. So if both sides are configured to enforce authentication it will be requested twice. Most Bluetooth chips are caching this information and thus no extra authentication procedure has to be triggered over-the-air, but it can happen. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2008-07-14 20:13:45 +02:00
Marcel Holtmann	79d554a697	[Bluetooth] Change retrieval of L2CAP features mask Getting the remote L2CAP features mask is really important, but doing this as less intrusive as possible is tricky. To play nice with older systems and Bluetooth qualification testing, the features mask is now only retrieved in two specific cases and only once per lifetime of an ACL link. When trying to establish a L2CAP connection and the remote features mask is unknown, the L2CAP information request is sent when the ACL link goes into connected state. This applies only to outgoing connections and also only for the connection oriented channels. The second case is when a connection request has been received. In this case a connection response with the result pending and the information request will be send. After receiving an information response or if the timeout gets triggered, the normal connection setup process with security setup will be initiated. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2008-07-14 20:13:44 +02:00
Ursula Braun	c2b4afd2f9	[S390] Cleanup iucv printk messages. Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Ursula Braun <braunu@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2008-07-14 10:02:20 +02:00
Gerrit Renker	2eeea7ba6b	dccp ccid-3: Length of loss intervals This corrects an error in the computation of the open loss interval I_0: * the interval length is (highest_seqno - start_seqno) + 1 * and not (highest_seqno - start_seqno). This condition was not fully clear in RFC 3448, but reflects the current revision state of rfc3448bis and is also consistent with RFC 4340, 6.1.1. Further changes: ---------------- * variable renamed due to line length constraints; * explicit typecast to `s64' to avoid implicit signed/unsigned casting. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-07-13 11:51:40 +01:00
Gerrit Renker	b552c6231f	dccp ccid-3: Fix a loss detection bug This fixes a bug in the logic of the TFRC loss detection: * new_loss_indicated() should not be called while a loss is pending; * but the code allows this; * thus, for two subsequent gaps in the sequence space, when loss_count has not yet reached NDUPACK=3, the loss_count is falsely reduced to 1. To avoid further and similar problems, all loss handling and loss detection is now done inside tfrc_rx_hist_handle_loss(), using an appropriate routine to track new losses. Further changes: ---------------- * added a reminder that no RX history operations should be performed when rx_handle_loss() has identified a (new) loss, since the function takes care of packet reordering during loss detection; * made tfrc_rx_hist_loss_pending() bool (thanks to an earlier suggestion by Arnaldo); * removed unused functions. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>	2008-07-13 11:51:40 +01:00

... 9 10 11 12 13 ...

10389 Commits