* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (55 commits)
netxen: fix tx ring accounting
netxen: fix detection of cut-thru firmware mode
forcedeth: fix dma api mismatches
atm: sk_wmem_alloc initial value is one
net: correct off-by-one write allocations reports
via-velocity : fix no link detection on boot
Net / e100: Fix suspend of devices that cannot be power managed
TI DaVinci EMAC : Fix rmmod error
net: group address list and its count
ipv4: Fix fib_trie rebalancing, part 2
pkt_sched: Update drops stats in act_police
sky2: version 1.23
sky2: add GRO support
sky2: skb recycling
sky2: reduce default transmit ring
sky2: receive counter update
sky2: fix shutdown synchronization
sky2: PCI irq issues
sky2: more receive shutdown
sky2: turn off pause during shutdown
...
Manually fix trivial conflict in net/core/skbuff.c due to kmemcheck
commit 2b85a34e91
(net: No more expensive sock_hold()/sock_put() on each tx)
changed initial sk_wmem_alloc value.
This broke net/atm since this protocol assumed a null
initial value. This patch makes necessary changes.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 2b85a34e91
(net: No more expensive sock_hold()/sock_put() on each tx)
changed initial sk_wmem_alloc value.
We need to take into account this offset when reporting
sk_wmem_alloc to user, in PROC_FS files or various
ioctls (SIOCOUTQ/TIOCOUTQ)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is inspired by patch recently posted by Johannes Berg. Basically what
my patch does is to group list and a count of addresses into newly introduced
structure netdev_hw_addr_list. This brings us two benefits:
1) struct net_device becames a bit nicer.
2) in the future there will be a possibility to operate with lists independently
on netdevices (with exporting right functions).
I wanted to introduce this patch before I'll post a multicast lists conversion.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
drivers/net/bnx2.c | 4 +-
drivers/net/e1000/e1000_main.c | 4 +-
drivers/net/ixgbe/ixgbe_main.c | 6 +-
drivers/net/mv643xx_eth.c | 2 +-
drivers/net/niu.c | 4 +-
drivers/net/virtio_net.c | 10 ++--
drivers/s390/net/qeth_l2_main.c | 2 +-
include/linux/netdevice.h | 17 +++--
net/core/dev.c | 130 ++++++++++++++++++--------------------
9 files changed, 89 insertions(+), 90 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
My previous patch, which explicitly delays freeing of tnodes by adding
them to the list to flush them after the update is finished, isn't
strict enough. It treats exceptionally tnodes without parent, assuming
they are newly created, so "invisible" for the read side yet.
But the top tnode doesn't have parent as well, so we have to exclude
all exceptions (at least until a better way is found). Additionally we
need to move rcu assignment of this node before flushing, so the
return type of the trie_rebalance() function is changed.
Reported-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Action police statistics could be misleading because drops are not
shown when expected.
With feedback from: Jamal Hadi Salim <hadi@cyberus.ca>
Reported-by: Pawel Staszewski <pstaszewski@itcare.pl>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
The skb mac_header field is sometimes NULL (or ~0u) as a sentinel
value. The places where skb is expanded add an offset which would
change this flag into an invalid pointer (or offset).
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Looking at the crash in log_martians(), one suspect is that the check for
mac header being set is not correct. The value of mac_header defaults to
0 on allocation, therefore skb_mac_header_was_set will always be true on
platforms using NET_SKBUFF_USES_OFFSET.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 'rq_received' member of 'struct rpc_rqst' is used to track when we
have received a reply to our request. With v4.1, the backchannel
can now accept callback requests over the existing connection. Rename
this field to make it clear that it is only used for tracking reply bytes
and not all bytes received on the connection.
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Obtain the rpc_xprt from the rpc_rqst so that calls and callback replies
can both use the same code path. A client needs the rpc_xprt in order
to reply to a callback.
Signed-off-by: Rahul Iyer <iyer@netapp.com>
Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
This svc_xprt is passed on to the callback service thread to be later used
to processes incoming svc_rqst's
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
For nfs41 callbacks we need an svc_xprt to process requests coming up the
backchannel socket as rpc_rqst's that are transformed into svc_rqst's that
need a rq_xprt to be processed.
The svc_{udp,tcp}_create methods are too heavy for this job as svc_create_socket
creates an actual socket to listen on while for nfs41 we're "reusing" the
fore channel's socket.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Implement the NFSv4.1 backchannel service. Invokes the common callback
processing logic svc_process_common() to authenticate the call and
dispatch the appropriate NFSv4.1 XDR decoder and operation procedure.
It then invokes bc_send() to send the reply over the same connection.
bc_send() is implemented in a separate patch.
At this time there is no slot validation or reply cache handling.
[nfs41: Preallocate rpc_rqst receive buffer for handling callbacks]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[Move bc_svc_process() declaration to correct patch]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
net/sunrpc/svc.c:svc_process() is used by the NFSv4 callback service
to process RPC requests arriving over connections initiated by the
server. NFSv4.1 supports callbacks over the backchannel on connections
initiated by the client. This patch refactors svc_process() so that
common code can also be used by the backchannel.
Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Executes the backchannel task on the RPC state machine using
the existing open connection previously established by the client.
Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>
nfs41: Add bc_svc.o to sunrpc Makefile.
[nfs41: bc_send() does not need to be exported outside RPC module]
[nfs41: xprt_free_bc_request() need not be exported outside RPC module]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[Update copyright]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Adds rpc_run_bc_task() which is called by the NFS callback service to
process backchannel requests. It performs similar work to rpc_run_task()
though "schedules" the backchannel task to be executed starting at the
call_trasmit state in the RPC state machine.
It also introduces some miscellaneous updates to the argument validation,
call_transmit, and transport cleanup functions to take into account
that there are now forechannel and backchannel tasks.
Backchannel requests do not carry an RPC message structure, since the
payload has already been XDR encoded using the existing NFSv4 callback
mechanism.
Introduce a new transmit state for the client to reply on to backchannel
requests. This new state simply reserves the transport and issues the
reply. In case of a connection related error, disconnects the transport and
drops the reply. It requires the forechannel to re-establish the connection
and the server to retransmit the request, as stated in NFSv4.1 section
2.9.2 "Client and Server Transport Behavior".
Note: There is no need to loop attempting to reserve the transport. If EAGAIN
is returned by xprt_prepare_transmit(), return with tk_status == 0,
setting tk_action to call_bc_transmit. rpc_execute() will invoke it again
after the task is taken off the sleep queue.
[nfs41: rpc_run_bc_task() need not be exported outside RPC module]
[nfs41: New call_bc_transmit RPC state]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: Backchannel: No need to loop in call_bc_transmit()]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[rpc_count_iostats incorrectly exits early]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[Convert rpc_reply_expected() to inline function]
[Remove unnecessary BUG_ON()]
[Rename variable]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
In the case of -EADDRNOTAVAIL and/or unhandled connection errors, we want
to get rid of the existing socket and retry immediately, just as the
comment says. Currently we end up sleeping for a minute, due to the missing
"break" statement.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Handles RPC replies and backchannel callbacks. Traditionally the NFS
client has expected only RPC replies on its open connections. With
NFSv4.1, callbacks can arrive over an existing open connection.
This patch refactors the old xs_tcp_read_request() into an RPC reply handler:
xs_tcp_read_reply(), a new backchannel callback handler: xs_tcp_read_callback(),
and a common routine to read the data off the transport: xs_tcp_read_common().
The new xs_tcp_read_callback() queues callback requests onto a queue where
the callback service (a separate thread) is listening for the processing.
This patch incorporates work and suggestions from Rahul Iyer (iyer@netapp.com)
and Benny Halevy (bhalevy@panasas.com).
xs_tcp_read_callback() drops the connection when the number of expected
callbacks is exceeded. Use xprt_force_disconnect(), ensuring tasks on
the pending queue are awaken on disconnect.
[nfs41: Keep track of RPC call/reply direction with a flag]
[nfs41: Preallocate rpc_rqst receive buffer for handling callbacks]
Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: sunrpc: xs_tcp_read_callback() should use xprt_force_disconnect()]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[Moves embedded #ifdefs into #ifdef function blocks]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
This patch introduces support to setup the callback xprt on the client side.
It allocates/ destroys the preallocated memory structures used to process
backchannel requests.
At setup time, xprt_setup_backchannel() is invoked to allocate one or
more rpc_rqst structures and substructures. This ensures that they
are available when an RPC callback arrives. The rpc_rqst structures
are maintained in a linked list attached to the rpc_xprt structure.
We keep track of the number of allocations so that they can be correctly
removed when the channel is destroyed.
When an RPC callback arrives, xprt_alloc_bc_request() is invoked to
obtain a preallocated rpc_rqst structure. An rpc_xprt structure is
returned, and its RPC_BC_PREALLOC_IN_USE bit is set in
rpc_xprt->bc_flags. The structure is removed from the the list
since it is now in use, and it will be later added back when its
user is done with it.
After the RPC callback replies, the rpc_rqst structure is returned
by invoking xprt_free_bc_request(). This clears the
RPC_BC_PREALLOC_IN_USE bit and adds it back to the list, allowing it
to be reused by a subsequent RPC callback request.
To be consistent with the reception of RPC messages, the backchannel requests
should be placed into the 'struct rpc_rqst' rq_rcv_buf, which is then in turn
copied to the 'struct rpc_rqst' rq_private_buf.
[nfs41: Preallocate rpc_rqst receive buffer for handling callbacks]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[Update copyright notice and explain page allocation]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Reading and storing the RPC direction is a three step process.
1. xs_tcp_read_calldir() reads the RPC direction, but it will not store it
in the XDR buffer since the 'struct rpc_rqst' is not yet available.
2. The 'struct rpc_rqst' is obtained during the TCP_RCV_COPY_DATA state.
This state need not necessarily be preceeded by the TCP_RCV_READ_CALLDIR.
For example, we may be reading a continuation packet to a large reply.
Therefore, we can't simply obtain the 'struct rpc_rqst' during the
TCP_RCV_READ_CALLDIR state and assume it's available during TCP_RCV_COPY_DATA.
This patch adds a new TCP_RCV_READ_CALLDIR flag to indicate the need to
read the RPC direction. It then uses TCP_RCV_COPY_CALLDIR to indicate the
RPC direction needs to be saved after the 'struct rpc_rqst' has been allocated.
3. The 'struct rpc_rqst' is obtained by the xs_tcp_read_data() helper
functions. xs_tcp_read_common() then saves the RPC direction in the XDR
buffer if TCP_RCV_COPY_CALLDIR is set. This will happen when we're reading
the data immediately after the direction was read. xs_tcp_read_common()
then clears this flag.
[was nfs41: Skip past the RPC call direction]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: sunrpc: Add RPC direction back into the XDR buffer]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: sunrpc: Don't skip past the RPC call direction]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
NFSv4.1 callbacks can arrive over an existing connection. This patch adds
the logic to read the RPC call direction (call or reply). It does this by
updating the state machine to look for the call direction invoking
xs_tcp_read_calldir(...) after reading the XID.
[nfs41: Keep track of RPC call/reply direction with a flag]
As per 11/14/08 review of RFC 53/85.
Add a new flag to track whether the incoming message is an RPC call or an
RPC reply. TCP_RPC_REPLY is set in the 'struct sock_xprt' tcp_flags in
xs_tcp_read_calldir() if the message is an RPC reply sent on the forechannel.
It is cleared if the message is an RPC request sent on the back channel.
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
commit 2b85a34e91
(net: No more expensive sock_hold()/sock_put() on each tx)
changed initial sk_wmem_alloc value.
Some protocols check sk_wmem_alloc value to determine if a timer
must delay socket deallocation. We must take care of the sk_wmem_alloc
value being one instead of zero when no write allocations are pending.
Reported by Ingo Molnar, and full diagnostic from David Miller.
This patch introduces three helpers to get read/write allocations
and a followup patch will use these helpers to report correct
write allocations to user.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Connections that have seen a connection-level abort should not be reused
as the far end will just abort them again; instead a new connection
should be made.
Connection-level aborts occur due to such things as authentication
failures.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* akpm: (182 commits)
fbdev: bf54x-lq043fb: use kzalloc over kmalloc/memset
fbdev: *bfin*: fix __dev{init,exit} markings
fbdev: *bfin*: drop unnecessary calls to memset
fbdev: bfin-t350mcqb-fb: drop unused local variables
fbdev: blackfin has __raw I/O accessors, so use them in fb.h
fbdev: s1d13xxxfb: add accelerated bitblt functions
tcx: use standard fields for framebuffer physical address and length
fbdev: add support for handoff from firmware to hw framebuffers
intelfb: fix a bug when changing video timing
fbdev: use framebuffer_release() for freeing fb_info structures
radeon: P2G2CLK_ALWAYS_ONb tested twice, should 2nd be P2G2CLK_DAC_ALWAYS_ONb?
s3c-fb: CPUFREQ frequency scaling support
s3c-fb: fix resource releasing on error during probing
carminefb: fix possible access beyond end of carmine_modedb[]
acornfb: remove fb_mmap function
mb862xxfb: use CONFIG_OF instead of CONFIG_PPC_OF
mb862xxfb: restrict compliation of platform driver to PPC
Samsung SoC Framebuffer driver: add Alpha Channel support
atmel-lcdc: fix pixclock upper bound detection
offb: use framebuffer_alloc() to allocate fb_info struct
...
Manually fix up conflicts due to kmemcheck in mm/slab.c
num_online_nodes() is called in a number of places but most often by the
page allocator when deciding whether the zonelist needs to be filtered
based on cpusets or the zonelist cache. This is actually a heavy function
and touches a number of cache lines.
This patch stores the number of online nodes at boot time and updates the
value when nodes get onlined and offlined. The value is then used in a
number of important paths in place of num_online_nodes().
[rientjes@google.com: do not override definition of node_set_online() with macro]
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If socket destuction gets delayed to a timer, we try to
lock_sock() from that timer which won't work.
Use bh_lock_sock() in that case.
Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Ingo Molnar <mingo@elte.hu>
Patch establishes a dummy afiucv-device to make sure af_iucv is
notified as iucv-bus device about suspend/resume.
The PM freeze callback severs all iucv pathes of connected af_iucv sockets.
The PM thaw/restore callback switches the state of all previously connected
sockets to IUCV_DISCONN.
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Patch calls the PM callback functions of iucv-bus devices, which are
responsible for removal of their established iucv pathes.
The PM freeze callback for the first iucv-bus device disables all iucv
interrupts except the connection severed interrupt.
The PM freeze callback for the last iucv-bus device shuts down iucv.
The PM thaw callback for the first iucv-bus device re-enables iucv
if it has been shut down during freeze. If freezing has been interrupted,
it re-enables iucv interrupts according to the needs of iucv-exploiters.
The PM restore callback for the first iucv-bus device re-enables iucv.
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
To guarantee a proper cleanup, patch adds a reboot notifier to
the iucv base code, which disables iucv interrupts, shuts down
established iucv pathes, and removes iucv declarations for z/VM.
Checks have to be added to the iucv-API functions, whether
iucv-buffers removed at reboot time are still declared.
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
In case the check on ch_count fails the cleanup path is skipped and the
previously allocated memory 'rpl_map', 'chl_map' is not freed.
Reported by Coverity.
Signed-off-by: Christian Engelmayer <christian.engelmayer@frequentis.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Align cache_clean work.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
When changing to a new BSSID or SSID, the code in
ieee80211_set_disassoc() needs to have the old data
still valid to be able to disconnect and clean up
properly. Currently, however, the old data is thrown
away before ieee80211_set_disassoc() is ever called,
so fix that by calling the function _before_ the old
data is overwritten.
This is (one of) the issue(s) causing mac80211 to hold
cfg80211's BSS structs forever, and them thus being
returned in scan results after they're long gone.
http://www.intellinuxwireless.org/bugzilla/show_bug.cgi?id=2015
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
If we do not disconnect when a channel switch is requested,
we end up eventually detection beacon loss from the AP and
then disconnecting, without ever really telling the AP, so
we might just as well disconnect right away.
Additionally, this fixes a problem with iwlwifi where the
driver will clear some internal state on channel changes
like this and then get confused when we actually go clear
that state from mac80211.
It may look like this patch drops the no-IBSS check, but
that is already handled by cfg80211 in the wext handler it
provides for IBSS (cfg80211_ibss_wext_siwfreq).
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
I suspect that some driver bugs can cause queues to be
stopped while they shouldn't be, but it's hard to find
out whether that is the case or not without having any
visible information about the queues. This adds a file
to debugfs that allows us to see the queues' statuses.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
It looks like some programs (e.g., NM) are setting an empty SSID with
SIOCSIWESSID in some cases. This seems to trigger mac80211 to try to
associate with an invalid configuration (wildcard SSID) which will
result in failing associations (or odd issues, potentially including
kernel panic with some drivers) if the AP were to actually accept this
anyway).
Only start association process if the SSID is actually set. This
speeds up connection with NM in number of cases and avoids sending out
broken association request frames.
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The use of bitfields here would lead to false positive warnings with
kmemcheck. Silence them.
(Additionally, one erroneous comment related to the bitfield was also
fixed.)
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Let's use TICKS instead of US, so PSCHED_TICKS2NS and PSCHED_NS2TICKS
(like in PSCHED_TICKS_PER_SEC already) to avoid misleading.
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While doing trie_rebalance(): resize(), inflate(), halve() RCU free
tnodes before updating their parents. It depends on RCU delaying the
real destruction, but if RCU readers start after call_rcu() and before
parent update they could access freed memory.
It is currently prevented with preempt_disable() on the update side,
but it's not safe, except maybe classic RCU, plus it conflicts with
memory allocations with GFP_KERNEL flag used from these functions.
This patch explicitly delays freeing of tnodes by adding them to the
list, which is flushed after the update is finished.
Reported-by: Yan Zheng <zheng.yan@oracle.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the re-write of the RFKILL subsystem it is no longer good to just
select RFKILL, but it is important to add a proper depends on rule.
Based on a report by Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
IPv4:
- make PIM register vifs netns local
- set the netns when a PIM register vif is created
- make PIM available in all network namespaces (if CONFIG_IP_PIMSM_V2)
by adding the protocol handler when multicast routing is initialized
IPv6:
- make PIM register vifs netns local
- make PIM available in all network namespaces (if CONFIG_IPV6_PIMSM_V2)
by adding the protocol handler when multicast routing is initialized
Signed-off-by: Tom Goff <thomas.goff@boeing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Removed the statements about ARP cache size as this config option does
not affect it. The cache size is controlled by neigh_table gc thresholds.
Remove also expiremental and obsolete markings as the API originally
intended for arp caching is useful for implementing ARP-like protocols
(e.g. NHRP) in user space and has been there for a long enough time.
Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the sake of power saver lovers, use a deferrable timer to fire
rt_check_expire()
As some big routers cache equilibrium depends on garbage collection
done in time, we take into account elapsed time between two
rt_check_expire() invocations to adjust the amount of slots we have to
check.
Based on an initial idea and patch from Tero Kristo
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Tero Kristo <tero.kristo@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch improves ctnetlink event reliability if one broadcast
listener has set the NETLINK_BROADCAST_ERROR socket option.
The logic is the following: if an event delivery fails, we keep
the undelivered events in the missed event cache. Once the next
packet arrives, we add the new events (if any) to the missed
events in the cache and we try a new delivery, and so on. Thus,
if ctnetlink fails to deliver an event, we try to deliver them
once we see a new packet. Therefore, we may lose state
transitions but the userspace process gets in sync at some point.
At worst case, if no events were delivered to userspace, we make
sure that destroy events are successfully delivered. Basically,
if ctnetlink fails to deliver the destroy event, we remove the
conntrack entry from the hashes and we insert them in the dying
list, which contains inactive entries. Then, the conntrack timer
is added with an extra grace timeout of random32() % 15 seconds
to trigger the event again (this grace timeout is tunable via
/proc). The use of a limited random timeout value allows
distributing the "destroy" resends, thus, avoiding accumulating
lots "destroy" events at the same time. Event delivery may
re-order but we can identify them by means of the tuple plus
the conntrack ID.
The maximum number of conntrack entries (active or inactive) is
still handled by nf_conntrack_max. Thus, we may start dropping
packets at some point if we accumulate a lot of inactive conntrack
entries that did not successfully report the destroy event to
userspace.
During my stress tests consisting of setting a very small buffer
of 2048 bytes for conntrackd and the NETLINK_BROADCAST_ERROR socket
flag, and generating lots of very small connections, I noticed
very few destroy entries on the fly waiting to be resend.
A simple way to test this patch consist of creating a lot of
entries, set a very small Netlink buffer in conntrackd (+ a patch
which is not in the git tree to set the BROADCAST_ERROR flag)
and invoke `conntrack -F'.
For expectations, no changes are introduced in this patch.
Currently, event delivery is only done for new expectations (no
events from expectation expiration, removal and confirmation).
In that case, they need a per-expectation event cache to implement
the same idea that is exposed in this patch.
This patch can be useful to provide reliable flow-accouting. We
still have to add a new conntrack extension to store the creation
and destroy time.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch moves the helper destruction to a function that lives
in nf_conntrack_helper.c. This new function is used in the patch
to add ctnetlink reliable event delivery.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch reworks the per-cpu event caching to use the conntrack
extension infrastructure.
The main drawback is that we consume more memory per conntrack
if event delivery is enabled. This patch is required by the
reliable event delivery that follows to this patch.
BTW, this patch allows you to enable/disable event delivery via
/proc/sys/net/netfilter/nf_conntrack_events in runtime, although
you can still disable event caching as compilation option.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Use mod_timer_pending() instead of atomic sequence of del_timer()/
add_timer(). mod_timer_pending() does not rearm an inactive timer,
so we don't need the conntrack lock anymore to make sure we don't
accidentally rearm a timer of a conntrack which is in the process
of being destroyed.
With this change, we don't need to take the global lock anymore at all,
counter updates can be performed under the per-conntrack lock.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Convert magic values 1 and -1 to NETDEV_TX_BUSY and NETDEV_TX_LOCKED respectively.
0 (NETDEV_TX_OK) is not changed to keep the noise down, except in very few cases
where its in direct proximity to one of the other values.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix up ATM drivers that return an errno value to qdisc_restart(), causing
qdisc_restart() to print a warning an requeue/retransmit the skb.
- lec: condition can only be remedied by userspace, until that retransmissions
Compile tested only.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
.ko is normally not included in Kconfig help, make it consistent.
Signed-off-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
This replaces find_vq/del_vq with find_vqs/del_vqs virtio operations,
and updates all drivers. This is needed for MSI support, because MSI
needs to know the total number of vectors upfront.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (+ lguest/9p compile fixes)
Add a linked list of all virtqueues for a virtio device: this helps for
debugging and is also needed for upcoming interface change.
Also, add a "name" field for clearer debug messages.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This patch changes FDB entry check for ATM LANE bridge integration.
There's no point in holding a FDB entry around SKB building.
br_fdb_get()/br_fdb_put() pair are changed into single br_fdb_test_addr()
hook that checks if the addr has FDB entry pointing to other port
to the one the request arrived on.
FDB entry refcounting is removed as it's not used anywhere else.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit b00055aacd " [NET] core: add
RFC2863 operstate" defined new interface flag values. Its
documentation specified that these flags could be accessed from user
space via SIOCGIFFLAGS. However, this does not work because the new
flags do not fit in that ioctl's argument width.
Change the documentation to match the code's behavior. Also change
the source to explicitly show the truncation. This _should_ have no
effect on executable code, and did not with gcc 4.2.4 generating x86
code.
A new ioctl could be defined to return all interface flags to user
space. However, since this has been broken for three years with no
one complaining, there doesn't seem much need. They are still
accessible via netlink.
Reported-by: "Fredrik Arnerup" <fredrik.arnerup@edgeware.tv>
Signed-off-by: John Dykstra <john.dykstra1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix build error introduced by commit bb70dfa5 (netfilter: xtables:
consolidate comefrom debug cast access):
net/ipv4/netfilter/ip_tables.c: In function 'ipt_do_table':
net/ipv4/netfilter/ip_tables.c:421: error: 'comefrom' undeclared (first use in this function)
net/ipv4/netfilter/ip_tables.c:421: error: (Each undeclared identifier is reported only once
net/ipv4/netfilter/ip_tables.c:421: error: for each function it appears in.)
Signed-off-by: Patrick McHardy <kaber@trash.net>
Caused by an API update. The return value can be safely ignored, as
there is notthing we can do with it.
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
The current code errors out the INCOMPLETE neigh entry skb queue only from
the timer if maximum probes have been attempted and there has been no reply.
This also causes the transtion to FAILED state.
However, the neigh entry can be also updated via Netlink to inform that the
address is unavailable. Currently, neigh_update() just stops the timers and
leaves the pending skb's unreleased. This results that the clean up code in
the timer callback is never called, preventing also proper garbage collection.
This fixes neigh_update() to process the pending skb queue immediately if
INCOMPLETE -> FAILED state transtion occurs due to a Netlink request.
Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
One of the problem with sock memory accounting is it uses
a pair of sock_hold()/sock_put() for each transmitted packet.
This slows down bidirectional flows because the receive path
also needs to take a refcount on socket and might use a different
cpu than transmit path or transmit completion path. So these
two atomic operations also trigger cache line bounces.
We can see this in tx or tx/rx workloads (media gateways for example),
where sock_wfree() can be in top five functions in profiles.
We use this sock_hold()/sock_put() so that sock freeing
is delayed until all tx packets are completed.
As we also update sk_wmem_alloc, we could offset sk_wmem_alloc
by one unit at init time, until sk_free() is called.
Once sk_free() is called, we atomic_dec_and_test(sk_wmem_alloc)
to decrement initial offset and atomicaly check if any packets
are in flight.
skb_set_owner_w() doesnt call sock_hold() anymore
sock_wfree() doesnt call sock_put() anymore, but check if sk_wmem_alloc
reached 0 to perform the final freeing.
Drawback is that a skb->truesize error could lead to unfreeable sockets, or
even worse, prematurely calling __sk_free() on a live socket.
Nice speedups on SMP. tbench for example, going from 2691 MB/s to 2711 MB/s
on my 8 cpu dev machine, even if tbench was not really hitting sk_refcnt
contention point. 5 % speedup on a UDP transmit workload (depends
on number of flows), lowering TX completion cpu usage.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (244 commits)
Revert "x86, bts: reenable ptrace branch trace support"
tracing: do not translate event helper macros in print format
ftrace/documentation: fix typo in function grapher name
tracing/events: convert block trace points to TRACE_EVENT(), fix !CONFIG_BLOCK
tracing: add protection around module events unload
tracing: add trace_seq_vprint interface
tracing: fix the block trace points print size
tracing/events: convert block trace points to TRACE_EVENT()
ring-buffer: fix ret in rb_add_time_stamp
ring-buffer: pass in lockdep class key for reader_lock
tracing: add annotation to what type of stack trace is recorded
tracing: fix multiple use of __print_flags and __print_symbolic
tracing/events: fix output format of user stack
tracing/events: fix output format of kernel stack
tracing/trace_stack: fix the number of entries in the header
ring-buffer: discard timestamps that are at the start of the buffer
ring-buffer: try to discard unneeded timestamps
ring-buffer: fix bug in ring_buffer_discard_commit
ftrace: do not profile functions when disabled
tracing: make trace pipe recognize latency format flag
...
rfkill currently requires a global lock within the
rfkill_register() function, and holds that lock over
calls to the set_block() methods. This means that we
cannot hold a lock around rfkill_register() that we
also require in set_block(), directly or indirectly.
Fix cfg80211 to register rfkill outside the block
locked by its global lock. Much of what cfg80211 does
in the locked block doesn't need to be locked anyway.
Reported-by: Vasanthakumar Thiagarajan <vasanth@atheros.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When associated, but probing the AP because we detected
beacon loss, we need to disable powersave to be able to
receive the probe response. Change the code to do that by
checking whether we're trying to probe when determining
the possibility of going into PS, and recalculate the PS
ability at the necessary spots.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
We don't want to trigger moving between PS mode during scan,
because then we will sometimes end up sending nullfunc frames
during scan. We're supposed to only send one prior to scan
and after scan.
This fixes an oops which occured due to an assert in ath9k:
http://marc.info/?l=linux-wireless&m=124277331319024
The assert was happening because the rate control algorithm
figures it should find at least one valid dual stream or
single stream rate. Since we allow mac80211 to send nullfunc
frames during scan and dynamic PS was enabled at times we ended
up trying to send nullfunc frames for the target sta on the
wrong band for which we have no valid rate to communicate with
it. This breaks the assumptions in rate control.
We determine we also need to disable moving between PS modes
when not associated so lets just add that now as well, and we
should not have a ps_sdata when that interface cannot actually
go into PS because it's not associated.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The return type has more than two values, but it can validly
only ever return TX_DROP and TX_CONTINUE, so use a bool
instead of ieee80211_tx_result.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
addba_req_num[tid] is supposed to have the count of consecutive
addba request attempts on 'tid' which failed. This count is checked
against a retry threshold (3 times) before starting the addba negotiation.
This patch fixes the way this addba count is incremented/reset and thereby
avoids indefinite addba attempts.
Signed-off-by: Vasanthakumar Thiagarajan <vasanth@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Once rfkill-input is disabled, the "global" states will only be used as
default initial states.
Since the states will always be the same after resume, we shouldn't
generate events on resume.
Signed-off-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
rfkill_set_global_sw_state() (previously rfkill_set_default()) will no
longer be exported by the rewritten rfkill core.
Instead, platform drivers which can provide persistent soft-rfkill state
across power-down/reboot should indicate their initial state by calling
rfkill_set_sw_state() before registration. Otherwise, they will be
initialized to a default value during registration by a set_block call.
We remove existing calls to rfkill_set_sw_state() which happen before
registration, since these had no effect in the old model. If these
drivers do have persistent state, the calls can be put back (subject
to testing :-). This affects hp-wmi and acer-wmi.
Drivers with persistent state will affect the global state only if
rfkill-input is enabled. This is required, otherwise booting with
wireless soft-blocked and pressing the wireless-toggle key once would
have no apparent effect. This special case will be removed in future
along with rfkill-input, in favour of a more flexible userspace daemon
(see Documentation/feature-removal-schedule.txt).
Now rfkill_global_states[n].def is only used to preserve global states
over EPO, it is renamed to ".sav".
Signed-off-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
In order to handle powersave frames properly we had needed
to pass these out to the device queues again, and introduce
the skb->requeue bit. This, however, also has unnecessary
overhead by needing to 'clean up' already tried frames, and
this clean-up code is also buggy when software encryption
is used.
Instead of sending the frames via the master netdev queue
again, simply put them into the pending queue. This also
fixes a problem where frames for that particular station
could be reordered when some were still on the software
queues and older ones are re-injected into the software
queue after them.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Now that we added the ioctl, there's no need to ask
the user to configure this. We will keep it enabled
for now, and eventually swap the default to n. Also
let embedded users select it only if they need it.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
It is useful for debugging when we know if something disabled
the in-kernel rfkill input handler.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
mac80211 is checking is the skb is aligned on 32 bit boundary.
But it is checking against ethernet header, whereas Linux expect IP
header aligned. And ethernet ether size is 6*2+2=14, so aligning
ethernet header make IP header unaligned.
Signed-off-by: Matthieu CASTET <castet.matthieu@free.fr>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The minstrel rate controller periodically looks up rate indexes in
a sampling table. When accessing a specific row and column, minstrel
correctly does a bounds check which, on the surface, appears to handle
the case where mi->n_rates < 2. However, mi->sample_idx is actually
defined as an unsigned, so the right hand side is taken to be a huge
positive number when negative, and the check will always fail.
Consequently, the RC will overrun the array and cause random memory
corruption when communicating with a peer that has only a single rate.
The max value of mi->sample_idx is around 25 so casting to int should
have no ill effects.
Without the change, uptime is a few minutes under load with an AP
that has a single hard-coded rate, and both the AP and STA could
potentially crash. With the change, both lasted 12 hours with a
steady load.
Thanks to Ognjen Maric for providing the single-rate clue so I could
reproduce this.
This fixes http://bugzilla.kernel.org/show_bug.cgi?id=12490 on the
regression list (also http://bugzilla.kernel.org/show_bug.cgi?id=13000).
Cc: stable@kernel.org
Reported-by: Sergey S. Kostyliov <rathamahata@gmail.com>
Reported-by: Ognjen Maric <ognjen.maric@gmail.com>
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Introduce per-conntrack locks and use them instead of the global protocol
locks to avoid contention. Especially tcp_lock shows up very high in
profiles on larger machines.
This will also allow to simplify the upcoming reliable event delivery patches.
Signed-off-by: Patrick McHardy <kaber@trash.net>
As the module uses rcu_call() we should make sure that all
rcu callback has been completed before removing the code.
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On module unload call rcu_barrier(), this is needed as synchronize_rcu()
is not strong enough. The kmem_cache_destroy() does invoke
synchronize_rcu() but it does not provide same protection.
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This module uses rcu_call() thus it should use rcu_barrier()
on module unload.
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Acked-by: Oliver Hartkopp <oliver@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This module uses rcu_call() thus it should use rcu_barrier() on module unload.
Also fixed a trivial typo 'nfetlink' -> 'nfnetlink' in comment.
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The VLAN 8021q driver needs to call rcu_barrier() when unloading the module,
instead of syncronize_net(). This is needed to make sure that outstanding
call_rcu() callbacks have completed, before the callback function code is
removed on module unload.
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
cls_cgroup: Fix oops when user send improperly 'tc filter add' request
r8169: fix crash when large packets are received
If the XT_SOCKET_TRANSPARENT flag is set, enabled 'transparent'
socket option is required for the socket to be matched.
Signed-off-by: Laszlo Attila Toth <panther@balabit.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Add a netlink interface for configuration of IEEE 802.15.4 device. Also this
interface specifies events notification sent by devices towards higher layers.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: Sergey Lapin <slapin@ossfans.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for communication over IEEE 802.15.4 networks. This implementation
is neither certified nor complete, but aims to that goal. This commit contains
only the socket interface for communication over IEEE 802.15.4 networks.
One can either send RAW datagrams or use SOCK_DGRAM to encapsulate data
inside normal IEEE 802.15.4 packets.
Configuration interface, drivers and software MAC 802.15.4 implementation will
follow.
Initial implementation was done by Maxim Gorbachyov, Maxim Osipov and Pavel
Smolensky as a research project at Siemens AG. Later the stack was heavily
reworked to better suit the linux networking model, and is now maitained
as an open project partially sponsored by Siemens.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: Sergey Lapin <slapin@ossfans.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
IEEE 802.15.4 stack requires several constants to be defined/adjusted.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: Sergey Lapin <slapin@ossfans.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use PSCHED_SHIFT constant instead of '10' in PSCHED_US2NS() and
PSCHED_NS2US() macros to enable changing this value later.
Additionally use PSCHED_SHIFT in sch_hfsc SM_SHIFT and ISM_SHIFT
definitions. This part of the patch is based on feedback from
Patrick McHardy <kaber@trash.net>.
Reported-by: Antonio Almeida <vexwek@gmail.com>
Tested-by: Antonio Almeida <vexwek@gmail.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit f001fde5ea
(net: introduce a list of device addresses dev_addr_list (v6))
added one regression Vegard Nossum found in its testings.
With kmemcheck help, Vegard found some uninitialized memory
was read and reported to user, potentialy leaking kernel data.
( thread can be found on http://lkml.org/lkml/2009/5/30/177 )
dev_addr_init() incorrectly uses sizeof() operator. We were
initializing one byte instead of MAX_ADDR_LEN bytes.
Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I found a bug in cls_cgroup_change() in cls_cgroup.c.
cls_cgroup_change() expected tca[TCA_OPTIONS] was set from user space properly,
but tc in iproute2-2.6.29-1 (which I used) didn't set it.
In the current source code of tc in git, it set tca[TCA_OPTIONS].
git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git
If we always use a newest iproute2 in git when we use cls_cgroup,
we don't face this oops probably.
But I think, kernel shouldn't panic regardless of use program's behaviour.
Signed-off-by: Minoru Usui <usui@mxm.nes.nec.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Passive OS fingerprinting netfilter module allows to passively detect
remote OS and perform various netfilter actions based on that knowledge.
This module compares some data (WS, MSS, options and it's order, ttl, df
and others) from packets with SYN bit set with dynamically loaded OS
fingerprints.
Fingerprint matching rules can be downloaded from OpenBSD source tree
or found in archive and loaded via netfilter netlink subsystem into
the kernel via special util found in archive.
Archive contains library file (also attached), which was shipped
with iptables extensions some time ago (at least when ipt_osf existed
in patch-o-matic).
Following changes were made in this release:
* added NLM_F_CREATE/NLM_F_EXCL checks
* dropped _rcu list traversing helpers in the protected add/remove calls
* dropped unneded structures, debug prints, obscure comment and check
Fingerprints can be downloaded from
http://www.openbsd.org/cgi-bin/cvsweb/src/etc/pf.os
or can be found in archive
Example usage:
-d switch removes fingerprints
Please consider for inclusion.
Thank you.
Passive OS fingerprint homepage (archives, examples):
http://www.ioremap.net/projects/osf
Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Current conntrack code kills the ICMP conntrack entry as soon as
the first reply is received. This is incorrect, as we then see only
the first ICMP echo reply out of several possible duplicates as
ESTABLISHED, while the rest will be INVALID. Also this unnecessarily
increases the conntrackd traffic on H-A firewalls.
Make all the ICMP conntrack entries (including the replied ones)
last for the default of nf_conntrack_icmp{,v6}_timeout seconds.
Signed-off-by: Jan "Yenya" Kasprzak <kas@fi.muni.cz>
Signed-off-by: Patrick McHardy <kaber@trash.net>
With the re-write of the RFKILL subsystem it is now possible to easily
integrate RFKILL soft-switch support into the Bluetooth subsystem. All
Bluetooth devices will now get automatically RFKILL support.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The Bluetooth source uses some endian conversion helpers, that in the end
translate to kernel standard routines. So remove this obfuscation since it
is fully pointless.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This adds the basic constants required to add support for L2CAP Enhanced
Retransmission feature.
Based on a patch from Nathan Holstein <nathan@lampreynetworks.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch fixes the errors without changing the l2cap.o binary:
text data bss dec hex filename
18059 568 0 18627 48c3 l2cap.o.after
18059 568 0 18627 48c3 l2cap.o.before
Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The initial value of err is not used until it is set to -ENOMEM. So just
remove the initialization completely.
Based on a patch from Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Using the L2CAP_CONF_HINT macro is easier to understand than using a
hardcoded 0x80 value.
Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Use macros instead of hardcoded numbers to make the L2CAP source code
more readable.
Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Change the name of the Kernel CAPI exported function capi_ctr_reseted()
to something representing its purpose better.
Impact: renaming, no functional change
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
vfree() does its own 'NULL' check, so no need for check before
calling it.
Signed-off-by: Figo.zhang <figo1802@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Increment the iovec base by the offset passed in for the initial
copy_to_user() in memcpy_to_iovecend().
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb_dma_unmap() is quite expensive for small packets,
because we use two different cache lines from skb_shared_info.
One to access nr_frags, one to access dma_maps[0]
Instead of dma_maps being an array of MAX_SKB_FRAGS + 1 elements,
let dma_head alone in a new dma_head field, close to nr_frags,
to reduce cache lines misses.
Tested on my dev machine (bnx2 & tg3 adapters), nice speedup !
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Get rid of num_dma_maps in struct skb_shared_info, as it seems unused.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On Thu, Jun 04, 2009 at 09:06:00PM +1000, Herbert Xu wrote:
>
> tun: Optimise handling of bogus gso->hdr_len
>
> As all current versions of virtio_net generate a value for the
> header length that's too small, we should optimise this so that
> we don't copy it twice. This can be done by ensuring that it is
> at least as large as the place where we'll write the checksum.
>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
With this applied we can strengthen the partial checksum check:
In skb_partial_csum_set we check to see if the checksum offset
is within the packet. However, we really should check that it
is within the skb head as that's the only bit we can modify
without copying.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The lock "protects" an assignment and a comparision of an integer.
When the caller of device_cmp() evaluates the result, nat->masq_index
may already have been changed (regardless if the lock is there or not).
So, the lock either has to be held during nf_ct_iterate_cleanup(),
or can be removed.
This does the latter.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Adds support for specifying a range of queues instead of a single queue
id. Flows will be distributed across the given range.
This is useful for multicore systems: Instead of having a single
application read packets from a queue, start multiple
instances on queues x, x+1, .. x+n. Each instance can process
flows independently.
Packets for the same connection are put into the same queue.
Signed-off-by: Holger Eitzenberger <heitzenberger@astaro.com>
Signed-off-by: Florian Westphal <fwestphal@astaro.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
We can use wildcard matching here, just like
ab4f21e6fb ("xtables: use NFPROTO_UNSPEC
in more extensions").
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
My mistake, I should have added that when cleaning up
rfkill and changing wimax.
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
ip_mc_drop_socket() method is declared in linux/igmp.h, which
is included anyhow in af_inet.c. So there is no need for this declaration.
This patch removes it from af_inet.c.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* iwm doesn't depend on cfg80211 or wireless extensions
* rndis wlan selects cfg80211 - needs to depend
* mac80211 selects cfg80211 - needs to depend
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The rfkill core didn't initialise the poll delayed work
because it assumed that polling was always done by specifying
the poll function. cfg80211, however, would like to start
polling only later, which is a valid use case and easy to
support, so change rfkill to always initialise the poll
delayed work and thus allow starting polling by calling the
rfkill_resume_polling() function after registration.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Fixes spares warning:
net/wireless/util.c:261:5: warning:
symbol 'ieee80211_get_mesh_hdrlen' was not declared. Should it be static?
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
To be easier on drivers and users, have cfg80211 register an
rfkill structure that drivers can access. When soft-killed,
simply take down all interfaces; when hard-killed the driver
needs to notify us and we will take down the interfaces
after the fact. While rfkilled, interfaces cannot be set UP.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Sometimes it is necessary to know how the state is,
and it is easier to query rfkill than keep track of
it somewhere else, so add a function for that. This
could later be expanded to return hard/soft block,
but so far that isn't necessary.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch introduces new cfg80211 API to set the TX power
via cfg80211, puts the wext code into cfg80211 and updates
mac80211 to use all that. The -ENETDOWN bits are a hack but
will go away soon.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The new code added by this patch will make rfkill create
a misc character device /dev/rfkill that userspace can use
to control rfkill soft blocks and get status of devices as
well as events when the status changes.
Using it is very simple -- when you open it you can read
a number of times to get the initial state, and every
further read blocks (you can poll) on getting the next
event from the kernel. The same structure you read is
also used when writing to it to change the soft block of
a given device, all devices of a given type, or all
devices.
This also makes CONFIG_RFKILL_INPUT selectable again in
order to be able to test without it present since its
functionality can now be replaced by userspace entirely
and distros and users may not want the input part of
rfkill interfering with their userspace code. We will
also write a userspace daemon to handle all that and
consequently add the input code to the feature removal
schedule.
In order to have rfkilld support both kernels with and
without CONFIG_RFKILL_INPUT (or new kernels after its
eventual removal) we also add an ioctl (that only exists
if rfkill-input is present) to disable rfkill-input.
It is not very efficient, but at least gives the correct
behaviour in all cases.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch completely rewrites the rfkill core to address
the following deficiencies:
* all rfkill drivers need to implement polling where necessary
rather than having one central implementation
* updating the rfkill state cannot be done from arbitrary
contexts, forcing drivers to use schedule_work and requiring
lots of code
* rfkill drivers need to keep track of soft/hard blocked
internally -- the core should do this
* the rfkill API has many unexpected quirks, for example being
asymmetric wrt. alloc/free and register/unregister
* rfkill can call back into a driver from within a function the
driver called -- this is prone to deadlocks and generally
should be avoided
* rfkill-input pointlessly is a separate module
* drivers need to #ifdef rfkill functions (unless they want to
depend on or select RFKILL) -- rfkill should provide inlines
that do nothing if it isn't compiled in
* the rfkill structure is not opaque -- drivers need to initialise
it correctly (lots of sanity checking code required) -- instead
force drivers to pass the right variables to rfkill_alloc()
* the documentation is hard to read because it always assumes the
reader is completely clueless and contains way TOO MANY CAPS
* the rfkill code needlessly uses a lot of locks and atomic
operations in locked sections
* fix LED trigger to actually change the LED when the radio state
changes -- this wasn't done before
Tested-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> [thinkpad]
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This fixes an incorrect assumption (BUG_ON) made in
cfg80211 when handling country IE regulatory requests.
The assumption was that we won't try to call_crda()
twice for the same event and therefore we will not
recieve two replies through nl80211 for the regulatory
request. As it turns out it is true we don't call_crda()
twice for the same event, however, kobject_uevent_env()
*might* send the udev event twice and/or userspace can
simply process the udev event twice. We remove the BUG_ON()
and simply ignore the duplicate request.
For details refer to this thread:
http://marc.info/?l=linux-wireless&m=124149987921337&w=2
Cc: stable@kernel.org
Reported-by: Maxim Levitsky <maximlevitsky@gmail.com>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
NETDEV_UP is called after the device is set UP, but sometimes
it is useful to be able to veto the device UP. Introduce a
new NETDEV_PRE_UP notifier that can be used for exactly this.
The first use case will be cfg80211 denying interfaces to be
set UP if the device is known to be rfkill'ed.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When the SME requests to associate to an open AP
ieee80211_sta_set_extra_ie() can be called with zero IE
length. When this happens or when the extra IE has already
been set -EALREADY is passed down and the supplicant will
complain that the operation is already in progress and it will
not let us associate. We correct this by treating -EALREADY
from ieee80211_sta_set_extra_ie() as a success just as we do
for wext.
Cc: Shan.Palanisamy@Atheros.com
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
On non-AP interfaces userspace has no business interfering with
the station management, this can confuse mac80211 (and other
drivers probably wouldn't support it anyway). Allow adding and
removing stations only on AP interfaces.
(Reconcile this w/ previous version of patch posted with same
subject... -- JWL)
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
I accidentally transposed these in the patch that "fixed" the defaults,
leading to extremely low throughput because of the huge min CW.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Instead of hardcoding the key length for validation, use the
constants Zhu Yi recently added and add one for AES_CMAC too.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When a scan finishes only the program that asked for it
knows what kind of scan it was; let's tell everybody else
about the scan parameters as well so they can evaluate
the result of the scan better. Also helps with debugging.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
We have some validation code in mac80211 but said code will
force an invalid AID to 0 which isn't a valid AID either;
instead require a valid AID (1-2007) to be passed in from
userspace in cfg80211 already. Also move the code before
the race comment since it can only be executed during STA
addition and thus is not racy.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Ivo has updated the driver to no longer use the change flag,
so we can remove that, but rt2x00 and ath5k still use the
actual value so let's mark it as deprecated too.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Prior implementation of the new sctp_connectx() call that returns
an association ID did not work correctly on non-blocking socket.
This is because we could not return both a EINPROGRESS error and
an association id. This is a new implementation that supports this.
Originally from Ivan Skytte Jørgensen <isj-sctp@i1.dk
Signed-off-by: Ivan Skytte Jørgensen <isj-sctp@i1.dk
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
RFC 5061 Section 5.1 ASCONF Chunk Procedures said:
B4) Re-transmit the ASCONF Chunk last sent and if possible choose an
alternate destination address (please refer to [RFC4960],
Section 6.4.1). An endpoint MUST NOT add new parameters to this
chunk; it MUST be the same (including its Sequence Number) as
the last ASCONF sent. An endpoint MAY, however, bundle an
additional ASCONF with new ASCONF parameters with the next
Sequence Number. For details, see Section 5.5.
This patch fix to choose an alternate destination address when
re-transmit the ASCONF chunk, with some dup codes cleanup.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
sctp_sack_timeout is defined as int, but the sysctl's maxsize is set
to sizeof(long) and the min/max are defined as long.
Signed-off-by: jean-mickael.guerin@6wind.com
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
If T4-rto timer is expired on a removed transport, kernel panic
will occur when we do failure management on that transport.
You can reproduce this use the following sequence:
Endpoint A Endpoint B
(ESTABLISHED) (ESTABLISHED)
<----------------- ASCONF
(SRC=X)
ASCONF ----------------->
(Delete IP Address = X)
<----------------- ASCONF-ACK
(Success Indication)
<----------------- ASCONF
(T4-rto timer expire)
This patch fixed the problem.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
If T2-shutdown timer is expired on a removed transport, kernel
panic will occur when we do failure management on that transport.
You can reproduce this use the following sequence:
Endpoint A Endpoint B
(ESTABLISHED) (ESTABLISHED)
<----------------- SHUTDOWN
(SRC=X)
ASCONF ----------------->
(Delete IP Address = X)
<----------------- ASCONF-ACK
(Success Indication)
<----------------- SHUTDOWN
(T2-shutdown timer expire)
This patch fixed the problem.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
If socket is create by PF_INET type, it can not used IPv6 address
to send/recv DATA. So only enable IPv6 address support on PF_INET6
socket.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Just fix a typo in net/sctp/sm_statetable.c.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Use Unresolvable Address error cause instead of Invalid Mandatory
Parameter error cause when process ASCONF chunk with invalid address
since address parameters are not mandatory in the ASCONF chunk.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
RFC5061 Section 5.2. Upon Reception of an ASCONF Chunk
V2) In processing the chunk, the receiver should build a
response message with the appropriate error TLVs, as
specified in the Parameter type bits, for any ASCONF
Parameter it does not understand. To indicate an
unrecognized parameter, Cause Type 8 should be used as
defined in the ERROR in Section 3.3.10.8, [RFC4960]. The
endpoint may also use the response to carry rejections for
other reasons, such as resource shortages, etc., using the
Error Cause TLV and an appropriate error condition.
So we should indicate an unrecognized parameter with error
SCTP_ERROR_UNKNOWN_PARAM in ACSONF-ACK chunk, not
SCTP_ERROR_INV_PARAM.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Define three accessors to get/set dst attached to a skb
struct dst_entry *skb_dst(const struct sk_buff *skb)
void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)
void skb_dst_drop(struct sk_buff *skb)
This one should replace occurrences of :
dst_release(skb->dst)
skb->dst = NULL;
Delete skb->dst field
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Define skb_rtable(const struct sk_buff *skb) accessor to get rtable from skb
Delete skb->rtable field
Setting rtable is not allowed, just set dst instead as rtable is an alias.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes the notify chain infrastructure and replace it
by a simple function pointer. This issue has been mentioned in the
mailing list several times: the use of the notify chain adds
too much overhead for something that is only used by ctnetlink.
This patch also changes nfnetlink_send(). It seems that gfp_any()
returns GFP_KERNEL for user-context request, like those via
ctnetlink, inside the RCU read-side section which is not valid.
Using GFP_KERNEL is also evil since netlink may schedule(),
this leads to "scheduling while atomic" bug reports.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch simplifies the conntrack event caching system by removing
several events:
* IPCT_[*]_VOLATILE, IPCT_HELPINFO and IPCT_NATINFO has been deleted
since the have no clients.
* IPCT_COUNTER_FILLING which is a leftover of the 32-bits counter
days.
* IPCT_REFRESH which is not of any use since we always include the
timeout in the messages.
After this patch, the existing events are:
* IPCT_NEW, IPCT_RELATED and IPCT_DESTROY, that are used to identify
addition and deletion of entries.
* IPCT_STATUS, that notes that the status bits have changes,
eg. IPS_SEEN_REPLY and IPS_ASSURED.
* IPCT_PROTOINFO, that reports that internal protocol information has
changed, eg. the TCP, DCCP and SCTP protocol state.
* IPCT_HELPER, that a helper has been assigned or unassigned to this
entry.
* IPCT_MARK and IPCT_SECMARK, that reports that the mark has changed, this
covers the case when a mark is set to zero.
* IPCT_NATSEQADJ, to report that there's updates in the NAT sequence
adjustment.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
During the module removal there are no possible event listeners
since ctnetlink must be removed before to allow removing
nf_conntrack. This patch removes the event reporting for the
module removal case which is not of any use in the existing code.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch cleans up the message calculation to make it similar
to rtnetlink, moreover, it removes unneeded verbose information.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch is a cleanup, it removes the `nowait' parameter
from all *fill_info() function since it is always set to one.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch cleans up the message handling path in two aspects:
* it uses NLMSG_LENGTH() instead of NLMSG_SPACE() like rtnetlink
does in this case to check if there is enough room for the
Netlink/nfnetlink headers. No need to check for the padding room.
* it removes a redundant header size checking that has been
already do at the beginning of the function.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The patch below adds supporting TCP simultaneous open to conntrack. The
unused LISTEN state is replaced by a new state (SYN_SENT2) denoting the
second SYN sent from the reply direction in the new case. The state table
is updated and the function tcp_in_window is modified to handle
simultaneous open.
The functionality can fairly easily be tested by socat. A sample tcpdump
recording
23:21:34.244733 IP (tos 0x0, ttl 64, id 49224, offset 0, flags [DF], proto TCP (6), length 60) 192.168.0.254.2020 > 192.168.0.1.2020: S, cksum 0xe75f (correct), 3383710133:3383710133(0) win 5840 <mss 1460,sackOK,timestamp 173445629 0,nop,wscale 7>
23:21:34.244783 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40) 192.168.0.1.2020 > 192.168.0.254.2020: R, cksum 0x0253 (correct), 0:0(0) ack 3383710134 win 0
23:21:36.038680 IP (tos 0x0, ttl 64, id 28092, offset 0, flags [DF], proto TCP (6), length 60) 192.168.0.1.2020 > 192.168.0.254.2020: S, cksum 0x704b (correct), 2634546729:2634546729(0) win 5840 <mss 1460,sackOK,timestamp 824213 0,nop,wscale 1>
23:21:36.038777 IP (tos 0x0, ttl 64, id 49225, offset 0, flags [DF], proto TCP (6), length 60) 192.168.0.254.2020 > 192.168.0.1.2020: S, cksum 0xb179 (correct), 3383710133:3383710133(0) ack 2634546730 win 5840 <mss 1460,sackOK,timestamp 173447423 824213,nop,wscale 7>
23:21:36.038847 IP (tos 0x0, ttl 64, id 28093, offset 0, flags [DF], proto TCP (6), length 52) 192.168.0.1.2020 > 192.168.0.254.2020: ., cksum 0xebad (correct), ack 3383710134 win 2920 <nop,nop,timestamp 824213 173447423>
and the corresponding netlink events:
[NEW] tcp 6 120 SYN_SENT src=192.168.0.254 dst=192.168.0.1 sport=2020 dport=2020 [UNREPLIED] src=192.168.0.1 dst=192.168.0.254 sport=2020 dport=2020
[UPDATE] tcp 6 120 LISTEN src=192.168.0.254 dst=192.168.0.1 sport=2020 dport=2020 src=192.168.0.1 dst=192.168.0.254 sport=2020 dport=2020
[UPDATE] tcp 6 60 SYN_RECV src=192.168.0.254 dst=192.168.0.1 sport=2020 dport=2020 src=192.168.0.1 dst=192.168.0.254 sport=2020 dport=2020
[UPDATE] tcp 6 432000 ESTABLISHED src=192.168.0.254 dst=192.168.0.1 sport=2020 dport=2020 src=192.168.0.1 dst=192.168.0.254 sport=2020 dport=2020 [ASSURED]
The RST packet was dropped in the raw table, thus it did not reach
conntrack. nfnetlink_conntrack is unpatched so it shows the new SYN_SENT2
state as the old unused LISTEN.
With TCP simultaneous open support we satisfy REQ-2 in RFC 5382 ;-) .
Additional minor correction in this patch is that in order to catch
uninitialized reply directions, "td_maxwin == 0" is used instead of
"td_end == 0" because the former can't be true except in uninitialized
state while td_end may accidentally be equal to zero in the mid of a
connection.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch fixes a bug which unconfigured struct tcf_proto keeps
chaining in tc_ctl_tfilter(), and avoids kernel panic in
cls_cgroup_classify() when we use cls_cgroup.
When we execute 'tc filter add', tcf_proto is allocated, initialized
by classifier's init(), and chained. After it's chained,
tc_ctl_tfilter() calls classifier's change(). When classifier's
change() fails, tc_ctl_tfilter() does not free and keeps tcf_proto.
In addition, cls_cgroup is initialized in change() not in init(). It
accesses unconfigured struct tcf_proto which is chained before
change(), then hits Oops.
Signed-off-by: Minoru Usui <usui@mxm.nes.nec.co.jp>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Tested-by: Minoru Usui <usui@mxm.nes.nec.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
After some discussion offline with Christoph Lameter and David Stevens
regarding multicast behaviour in Linux, I'm submitting a slightly
modified patch from the one Christoph submitted earlier.
This patch provides a new socket option IP_MULTICAST_ALL.
In this case, default behaviour is _unchanged_ from the current
Linux standard. The socket option is set by default to provide
original behaviour. Sockets wishing to receive data only from
multicast groups they join explicitly will need to clear this
socket option.
Signed-off-by: Nivedita Singhvi <niv@us.ibm.com>
Signed-off-by: Christoph Lameter<cl@linux.com>
Acked-by: David Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Print-out the error value when sock_alloc_send_skb() fails in
the IPv6 neighbor discovery code - can be useful for debugging.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the unlikely event that gprs_writeable() and gprs_xmit() check for
writeability at the same, we could stop the device queue forever.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add 'autoconf' and 'disable_ipv6' parameters to the IPv6 module.
The first controls if IPv6 addresses are autoconfigured from
prefixes received in Router Advertisements. The IPv6 loopback
(::1) and link-local addresses are still configured.
The second controls if IPv6 addresses are desired at all. No
IPv6 addresses will be added to any interfaces.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch converts unicast address list to standard list_head using
previously introduced struct netdev_hw_addr. It also relaxes the
locking. Original spinlock (still used for multicast addresses) is not
needed and is no longer used for a protection of this list. All
reading and writing takes place under rtnl (with no changes).
I also removed a possibility to specify the length of the address
while adding or deleting unicast address. It's always dev->addr_len.
The convertion touched especially e1000 and ixgbe codes when the
change is not so trivial.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
drivers/net/bnx2.c | 13 +--
drivers/net/e1000/e1000_main.c | 24 +++--
drivers/net/ixgbe/ixgbe_common.c | 14 ++--
drivers/net/ixgbe/ixgbe_common.h | 4 +-
drivers/net/ixgbe/ixgbe_main.c | 6 +-
drivers/net/ixgbe/ixgbe_type.h | 4 +-
drivers/net/macvlan.c | 11 +-
drivers/net/mv643xx_eth.c | 11 +-
drivers/net/niu.c | 7 +-
drivers/net/virtio_net.c | 7 +-
drivers/s390/net/qeth_l2_main.c | 6 +-
drivers/scsi/fcoe/fcoe.c | 16 ++--
include/linux/netdevice.h | 18 ++--
net/8021q/vlan.c | 4 +-
net/8021q/vlan_dev.c | 10 +-
net/core/dev.c | 195 +++++++++++++++++++++++++++-----------
net/dsa/slave.c | 10 +-
net/packet/af_packet.c | 4 +-
18 files changed, 227 insertions(+), 137 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
Somewhat luckily, I was looking into these parts with very fine
comb because I've made somewhat similar changes on the same
area (conflicts that arose weren't that lucky though). The loop
was very much overengineered recently in commit 915219441d
(tcp: Use SKB queue and list helpers instead of doing it
by-hand), while it basically just wants to know if there are
skbs after 'skb'.
Also it got broken because skb1 = skb->next got translated into
skb1 = skb1->next (though abstracted) improperly. Note that
'skb1' is pointing to previous sk_buff than skb or NULL if at
head. Two things went wrong:
- We'll kfree 'skb' on the first iteration instead of the
skbuff following 'skb' (it would require required SACK reneging
to recover I think).
- The list head case where 'skb1' is NULL is checked too early
and the loop won't execute whereas it previously did.
Conclusion, mostly revert the recent changes which makes the
cset very messy looking but using proper accessor in the
previous-like version.
The effective changes against the original can be viewed with:
git-diff 915219441d566f1da0caa0e262be49b666159e17^ \
net/ipv4/tcp_input.c | sed -n -e '57,70 p'
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-2.6.30' of git://linux-nfs.org/~bfields/linux:
svcrdma: dma unmap the correct length for the RPCRDMA header page.
nfsd: Revert "svcrpc: take advantage of tcp autotuning"
nfsd: fix hung up of nfs client while sync write data to nfs server
ipgre_tunnel_xmit() might need skb->dst, so tell dev_hard_start_xmit()
to no release it.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
clip_start_xmit() needs skb->dst so tell dev_hard_start_xmit()
to no release it.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ipip_tunnel_xmit() might need skb->dst, so tell dev_hard_start_xmit()
to no release it.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This comment suggested storing two pieces of state in the
LLC skb control block, and in fact we do. Someone did
the implementation but never killed this todo comment :-)
Signed-off-by: David S. Miller <davem@davemloft.net>
wimax connection manager / daemon has to know what is current
state of the device. Previously it was only possible to get
notification whet state has changed.
Note:
By mistake, the new generic netlink's number for
WIMAX_GNL_OP_STATE_GET was declared inserting into the existing list
of API calls, not appending; thus, it'd break existing API.
Fixed by Inaky Perez-Gonzalez <inaky@linux.intel.com> by moving to
the tail, where we add to the interface, not modify the interface.
Thanks to Stephen Hemminger <shemminger@vyatta.com> for catching this.
Signed-off-by: Paulius Zaleckas <paulius.zaleckas@teltonika.lt>
Funcion documentation for wimax_msg_alloc() and wimax_msg_send() needs
to clarify that they can be used in the very early stages of a
wimax_dev lifecycle.
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
The svcrdma module was incorrectly unmapping the RPCRDMA header page.
On IBM pserver systems this causes a resource leak that results in
running out of bus address space (10 cthon iterations will reproduce it).
The code was mapping the full page but only unmapping the actual header
length. The fix is to only map the header length.
I also cleaned up the use of ib_dma_map_page() calls since the unmap
logic always uses ib_dma_unmap_single(). I made these symmetrical.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This reverts commit 47a14ef1af "svcrpc:
take advantage of tcp autotuning", which uncovered some further problems
in the server rpc code, causing significant performance regressions in
common cases.
We will likely reinstate this patch after releasing 2.6.30 and applying
some work on the underlying fixes to the problem (developed by Trond).
Reported-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Olga Kornievskaia <aglo@citi.umich.edu>
Cc: Jim Rees <rees@umich.edu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Use ALIGN() and PTR_ALIGN() macros instead of handcoding them.
Get rid of NETDEV_ALIGN_CONST ugly define
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds CTA_PROTOINFO_DCCP_HANDSHAKE_SEQ that exposes
the u64 handshake sequence number to user-space.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This problem was introduced in 72961ecf84
since no space was reserved for the new attributes NFULA_HWTYPE,
NFULA_HWLEN and NFULA_HWHEADER.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
The function dl_seq_show() returns 1 (equal to SEQ_SKIP) in case
a seq_printf() call return -1. It should return -1.
This SEQ_SKIP behavior brakes processing the proc file e.g. via a
pipe or just through less.
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: Patrick McHardy <kaber@trash.net>
There is no need to repeatedly check flush when comparing TCP
options for GRO as it will be false 99% of the time where it
matters.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch stores the two shinfo pointers in local variables
because they're used over and over again in skb_gro_receive.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch reverses the direction of the frags array copy in
skb_gro_receive in order simplify the loop conditional. It
also avoids touching the first element of the original frags
array.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
As we know the only packets which need the final pskb_may_pull
are completely non-linear, and have all the required bits in
frag0, we can perform a straight memcpy instead of going through
pskb_may_pull and doing skb_copy_bits.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch optimises the IPv4 GRO code by using 32-bit loads
(instead of 16-bit ones) on the ID and length checks in the receive
function.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the overwhelming majority of cases, skb_gro_header's return
value cannot be NULL. Yet we must check it because of its current
form. This patch splits it up into multiple functions in order
to avoid this.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
By caching frag0_len, we can avoid checking both frag0 and the
length separately in skb_gro_header. This helps as skb_gro_header
is called four times per packet which amounts to a few million
times at 10Gb/s.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of checking len > mss || len == 0, we can accomplish
both by checking (len - 1) > mss using the unsigned wraparound.
At nearly a million times a second, this might just help.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The window has already been checked as part of the flag word
so there is no need to check it explicitly.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of doing two 16-bit operations for the source/destination
ports, we can do one 32-bit operation to take care both.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently skb_gro_header is used for packets which put the hardware
header in skb->data with the rest in frags. Since the drivers that
need this optimisation all provide completely non-linear packets,
we can gain extra optimisations by only performing the frag0
optimisation for completely non-linear packets.
In particular, we can simply test frag0 (instead of skb_headlen)
to see whether the optimisation is in force.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch stores the offset/headlen in local variables as they're
used repeatedly in skb_gro_offset.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function skb_gro_header is called four times per packet which
quickly adds up at 10Gb/s. This patch inlines it to allow better
optimisations.
Some architectures perform multiplication for page_address, which
is done by each skb_gro_header invocation. This patch caches that
value in skb->cb to avoid the unnecessary multiplications.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
gcc does a poor job at generating code for the memcpy of the frags
array in skb_gro_receive, which is the primary purpose of that
function when merging frags. In particular, it can't utilise the
alignment information of the source and destination. This patch
open-codes the copy so we process words instead of bytes.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The calls to flush_work() are pointless in a single thread workqueue
and they are actually causing a lockdep warning.
=============================================
[ INFO: possible recursive locking detected ]
2.6.30-rc6-02911-gbb803cf #16
---------------------------------------------
bluetooth/2518 is trying to acquire lock:
(bluetooth){+.+.+.}, at: [<c0130c14>] flush_work+0x28/0xb0
but task is already holding lock:
(bluetooth){+.+.+.}, at: [<c0130424>] worker_thread+0x149/0x25e
other info that might help us debug this:
2 locks held by bluetooth/2518:
#0: (bluetooth){+.+.+.}, at: [<c0130424>] worker_thread+0x149/0x25e
#1: (&conn->work_del){+.+...}, at: [<c0130424>] worker_thread+0x149/0x25e
stack backtrace:
Pid: 2518, comm: bluetooth Not tainted 2.6.30-rc6-02911-gbb803cf #16
Call Trace:
[<c03d64d9>] ? printk+0xf/0x11
[<c0140d96>] __lock_acquire+0x7ce/0xb1b
[<c0141173>] lock_acquire+0x90/0xad
[<c0130c14>] ? flush_work+0x28/0xb0
[<c0130c2e>] flush_work+0x42/0xb0
[<c0130c14>] ? flush_work+0x28/0xb0
[<f8b84966>] del_conn+0x1c/0x84 [bluetooth]
[<c0130469>] worker_thread+0x18e/0x25e
[<c0130424>] ? worker_thread+0x149/0x25e
[<f8b8494a>] ? del_conn+0x0/0x84 [bluetooth]
[<c0133843>] ? autoremove_wake_function+0x0/0x33
[<c01302db>] ? worker_thread+0x0/0x25e
[<c013355a>] kthread+0x45/0x6b
[<c0133515>] ? kthread+0x0/0x6b
[<c01034a7>] kernel_thread_helper+0x7/0x10
Based on a report by Oliver Hartkopp <oliver@hartkopp.net>
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Tested-by: Oliver Hartkopp <oliver@hartkopp.net>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
BUS_ID_SIZE is really no more, and device names are dynamically
allocated and thus can be any necessary size.
So remove the BUG check here making sure BUS_ID_SIZE is at least
as large as IFNAMSIZ.
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid reading the unsynchronized value cs->classid multiple times,
since it could change concurrently from non-zero to zero; this would
result in the classifier returning a positive result with a bogus
(zero) classid.
Signed-off-by: Paul Menage <menage@google.com>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
NFSv4: Fix the case where NFSv4 renewal fails
nfs: fix build error in nfsroot with initconst
XPRTRDMA: fix client rpcrdma FRMR registration on mlx4 devices
mlx4/connectX FRMR requires local write enable together with remote
rdma write enable. This fixes NFS/RDMA operation over the ConnectX
Infiniband HCA in the default memreg mode.
Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Tom Talpey <tmtalpey@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We would like to get rid of netdev->trans_start = jiffies; that about all net
drivers have to use in their start_xmit() function, and use txq->trans_start
instead.
This can be done generically in core network, as suggested by David.
Some devices, (particularly loopback) dont need trans_start update, because
they dont have transmit watchdog. We could add a new device flag, or rely
on fact that txq->tran_start can be updated is txq->xmit_lock_owner is
different than -1. Use a helper function to hide our choice.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Right-shifts of signed integers are implementation-defined so unportable.
With feedback from: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes ssthresh accounting issues in tcp_vegas when cwnd decreases
Signed-off-by: Doug Leith <doug.leith@nuim.ie>
Signed-off-by: David S. Miller <davem@davemloft.net>
Robert L Mathews discovered that some clients send evil TCP RST segments,
which are accepted by netfilter conntrack but discarded by the
destination. Thus the conntrack entry is destroyed but the destination
retransmits data until timeout.
The same technique, i.e. sending properly crafted RST segments, can easily
be used to bypass connlimit/connbytes based restrictions (the sample
script written by Robert can be found in the netfilter mailing list
archives).
The patch below adds a new flag and new field to struct ip_ct_tcp_state so
that checking RST segments can be made more strict and thus TCP conntrack
can catch the invalid ones: the RST segment is accepted only if its
sequence number higher than or equal to the highest ack we seen from the
other direction. (The last_ack field cannot be reused because it is used
to catch resent packets.)
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
All drivers are already converted to new net_device_ops API
and nobody uses old API anymore.
Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hi:
skbuff: Copy csum instead of csum_start/csum_offset
It's easier to copy the u32 csum instead of its two u16
constituents.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cheers,
Signed-off-by: David S. Miller <davem@davemloft.net>
Hi:
skbuff: Move new __skb_clone code into __copy_skb_header
It seems that people just keep on adding stuff to __skb_clone
instead __copy_skb_header. This is wrong as it means your brand-new
attributes won't always get copied as you intended.
This patch moves them to the right place, and adds a comment to
prevent this from happening again.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Thanks,
Signed-off-by: David S. Miller <davem@davemloft.net>