Commit Graph

12167 Commits

Author SHA1 Message Date
Greg Banks
59a252ff8c knfsd: avoid overloading the CPU scheduler with enormous load averages
Avoid overloading the CPU scheduler with enormous load averages
when handling high call-rate NFS loads.  When the knfsd bottom half
is made aware of an incoming call by the socket layer, it tries to
choose an nfsd thread and wake it up.  As long as there are idle
threads, one will be woken up.

If there are lot of nfsd threads (a sensible configuration when
the server is disk-bound or is running an HSM), there will be many
more nfsd threads than CPUs to run them.  Under a high call-rate
low service-time workload, the result is that almost every nfsd is
runnable, but only a handful are actually able to run.  This situation
causes two significant problems:

1. The CPU scheduler takes over 10% of each CPU, which is robbing
   the nfsd threads of valuable CPU time.

2. At a high enough load, the nfsd threads starve userspace threads
   of CPU time, to the point where daemons like portmap and rpc.mountd
   do not schedule for tens of seconds at a time.  Clients attempting
   to mount an NFS filesystem timeout at the very first step (opening
   a TCP connection to portmap) because portmap cannot wake up from
   select() and call accept() in time.

Disclaimer: these effects were observed on a SLES9 kernel, modern
kernels' schedulers may behave more gracefully.

The solution is simple: keep in each svc_pool a counter of the number
of threads which have been woken but have not yet run, and do not wake
any more if that count reaches an arbitrary small threshold.

Testing was on a 4 CPU 4 NIC Altix using 4 IRIX clients, each with 16
synthetic client threads simulating an rsync (i.e. recursive directory
listing) workload reading from an i386 RH9 install image (161480
regular files in 10841 directories) on the server.  That tree is small
enough to fill in the server's RAM so no disk traffic was involved.
This setup gives a sustained call rate in excess of 60000 calls/sec
before being CPU-bound on the server.  The server was running 128 nfsds.

Profiling showed schedule() taking 6.7% of every CPU, and __wake_up()
taking 5.2%.  This patch drops those contributions to 3.0% and 2.2%.
Load average was over 120 before the patch, and 20.9 after.

This patch is a forward-ported version of knfsd-avoid-nfsd-overload
which has been shipping in the SGI "Enhanced NFS" product since 2006.
It has been posted before:

http://article.gmane.org/gmane.linux.nfs/10374

Signed-off-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-03-18 17:38:41 -04:00
Patrick McHardy
0f5b3e85a3 netfilter: ctnetlink: fix rcu context imbalance
Introduced by 7ec47496 (netfilter: ctnetlink: cleanup master conntrack assignation):

net/netfilter/nf_conntrack_netlink.c:1275:2: warning: context imbalance in 'ctnetlink_create_conntrack' - different lock contexts for basic block

Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-18 17:36:40 +01:00
Florian Westphal
711d60a9e7 netfilter: remove nf_ct_l4proto_find_get/nf_ct_l4proto_put
users have been moved to __nf_ct_l4proto_find.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-18 17:30:50 +01:00
Florian Westphal
cd91566e4b netfilter: ctnetlink: remove remaining module refcounting
Convert the remaining refcount users.

As pointed out by Patrick McHardy, the protocols can be accessed safely using RCU.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-18 17:28:37 +01:00
David S. Miller
af4330631c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2009-03-17 15:04:31 -07:00
David S. Miller
2d6a5e9500 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/igb/igb_main.c
	drivers/net/qlge/qlge_main.c
	drivers/net/wireless/ath9k/ath9k.h
	drivers/net/wireless/ath9k/core.h
	drivers/net/wireless/ath9k/hw.c
2009-03-17 15:01:30 -07:00
David S. Miller
f10023a4ef Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2009-03-17 14:29:22 -07:00
David S. Miller
4ada8107f4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6 2009-03-17 13:12:47 -07:00
Herbert Xu
303c6a0251 gro: Fix legacy path napi_complete crash
On the legacy netif_rx path, I incorrectly tried to optimise
the napi_complete call by using __napi_complete before we reenable
IRQs.  This simply doesn't work since we need to flush the held
GRO packets first.

This patch fixes it by doing the obvious thing of reenabling
IRQs first and then calling napi_complete.

Reported-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-17 13:11:29 -07:00
Herbert Xu
2ffb455819 gro: Fix vlan/netpoll check again
Jarek Poplawski pointed out that my previous fix is broken for
VLAN+netpoll as if netpoll is enabled we'd end up in the normal
receive path instead of the VLAN receive path.

This patch fixes it by calling the VLAN receive hook.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-17 13:10:52 -07:00
Luis R. Rodriguez
73d54c9e74 cfg80211: add regulatory netlink multicast group
This allows us to send to userspace "regulatory" events.
For now we just send an event when we change regulatory domains.
We also notify userspace when devices are using their own custom
world roaming regulatory domains.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-16 18:09:40 -04:00
Luis R. Rodriguez
7db90f4a25 cfg80211: move enum reg_set_by to nl80211.h
We do this so we can later inform userspace who set the
regulatory domain and provide details of the request.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-16 18:09:40 -04:00
Luis R. Rodriguez
0fee54cab7 cfg80211: remove REGDOM_SET_BY_INIT
This is not used as we can always just assume the first
regulatory domain set will _always_ be a static regulatory
domain. REGDOM_SET_BY_CORE will be the first request from
cfg80211 for a regdomain and that then populates the first
regulatory request.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-16 18:09:39 -04:00
Herton Ronaldo Krzesinski
1a28c78b46 mac80211: deauth before flushing STA information
Even after commit "mac80211: deauth when interface is marked down"
(e327b847 on Linus tree), userspace still isn't notified when interface
goes down. There isn't a problem with this commit, but because of other
code changes it doesn't work on kernels >= 2.6.28 (works if same/similar
change applied on 2.6.27 for example).

The issue is as follows: after commit "mac80211: restructure disassoc/deauth
flows" in 2.6.28, the call to ieee80211_sta_deauthenticate added by
commit e327b847 will not work: because we do sta_info_flush(local, sdata)
inside ieee80211_stop (iface.c), all stations in interface are cleared, so
when calling ieee80211_sta_deauthenticate->ieee80211_set_disassoc (mlme.c),
inside ieee80211_set_disassoc we have this in the beginning:

         sta = sta_info_get(local, ifsta->bssid);
         if (!sta) {

The !sta check triggers, thus the function returns early and
ieee80211_sta_send_apinfo(sdata, ifsta) later isn't called, so
wpa_supplicant/userspace isn't notified with SIOCGIWAP.

This commit moves deauthentication to before flushing STA info
(sta_info_flush), thus the above can't happen and userspace is really
notified when interface goes down.

Signed-off-by: Herton Ronaldo Krzesinski <herton@mandriva.com.br>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-16 18:09:39 -04:00
Helmut Schaa
af88b9078d mac80211: handle failed scan requests in STA mode
If cfg80211 requests a scan it awaits either a return code != 0 from
the scan function or the cfg80211_scan_done to be called. In case of
a STA mac80211's scan function ever returns 0 and queues the scan request.
If ieee80211_sta_work is executed and ieee80211_start_scan fails for
some reason cfg80211_scan_done will never be called but cfg80211 still
thinks the scan was triggered successfully and will refuse any future
scan requests due to drv->scan_req not being cleaned up.

If a scan is triggered from within the MLME a similar problem appears. If
ieee80211_start_scan returns an error, local->scan_req will not be reset
and mac80211 will refuse any future scan requests.

Hence, in both cases call ieee80211_scan_failed (which notifies cfg80211
and resets local->scan_req) if ieee80211_start_scan returns an error.

Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-16 18:09:38 -04:00
Luis R. Rodriguez
ec329acef9 cfg80211: fix max tx power for world regdom on 5 GHz to 20dBm
This is the lowest value amongst countries which do enable 5 GHz operation.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-16 18:09:29 -04:00
Luis R. Rodriguez
611b6a82aa cfg80211: Enable passive scan on channels 12-14 for world roaming
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-16 18:09:29 -04:00
Jouni Malinen
0eeb59fe2c mac80211: Fix WMM ACM parsing and AC downgrade operation
Incorrect local->wmm_acm bits were set for AC_BK and AC_BE. Fix this
and add some comments to make it easier to understand the AC-to-UP(pair)
mapping. Set the wmm_acm bits (and show WMM debug) even if the driver
does not implement conf_tx() handler.

In addition, fix the ACM-based AC downgrade code to not use the
highest priority in error cases. We need to break the loop to get the
correct AC_BK value (3) instead of returning 0 (which would indicate
AC_VO). The comment here was not really very useful either, so let's
provide somewhat more helpful description of the situation.

Since it is very unlikely that the ACM flag would be set for AC_BK and
AC_BE, these bugs are not likely to be seen in real life networks.
Anyway, better do these things correctly should someone really use
silly AP configuration (and to pass some functionality tests, too).

Remove the TODO comment about handling ACM. Downgrading AC is
perfectly valid mechanism for ACM. Eventually, we may add support for
WMM-AC and send a request for a TS, but anyway, that functionality
won't be here at the location of this TODO comment.

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-16 18:09:27 -04:00
Jouni Malinen
055249d20d mac80211: Fix panic on fragmentation with power saving
It was possible to hit a kernel panic on NULL pointer dereference in
dev_queue_xmit() when sending power save buffered frames to a STA that
woke up from sleep. This happened when the buffered frame was requeued
for transmission in ap_sta_ps_end(). In order to avoid the panic, copy
the skb->dev and skb->iif values from the first fragment to all other
fragments.

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-16 18:01:59 -04:00
John W. Linville
6f16bf3bdb lib80211: silence excessive crypto debugging messages
When they were part of the now defunct ieee80211 component, these
messages were only visible when special debugging settings were enabled.
Let's mirror that with a new lib80211 debugging Kconfig option.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-16 18:01:58 -04:00
Herbert Xu
d1c76af9e2 GRO: Move netpoll checks to correct location
As my netpoll fix for net doesn't really work for net-next, we
need this update to move the checks into the right place.  As it
stands we may pass freed skbs to netpoll_receive_skb.

This patch also introduces a netpoll_rx_on function to avoid GRO
completely if we're invoked through netpoll.  This might seem
paranoid but as netpoll may have an external receive hook it's
better to be safe than sorry.  I don't think we need this for
2.6.29 though since there's nothing immediately broken by it.

This patch also moves the GRO_* return values to netdevice.h since
VLAN needs them too (I tried to avoid this originally but alas
this seems to be the easiest way out).  This fixes a bug in VLAN
where it continued to use the old return value 2 instead of the
correct GRO_DROP.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-16 10:50:02 -07:00
Pablo Neira Ayuso
0269ea4937 netfilter: xtables: add cluster match
This patch adds the iptables cluster match. This match can be used
to deploy gateway and back-end load-sharing clusters. The cluster
can be composed of 32 nodes maximum (although I have only tested
this with two nodes, so I cannot tell what is the real scalability
limit of this solution in terms of cluster nodes).

Assuming that all the nodes see all packets (see below for an
example on how to do that if your switch does not allow this), the
cluster match decides if this node has to handle a packet given:

	(jhash(source IP) % total_nodes) & node_mask

For related connections, the master conntrack is used. The following
is an example of its use to deploy a gateway cluster composed of two
nodes (where this is the node 1):

iptables -I PREROUTING -t mangle -i eth1 -m cluster \
	--cluster-total-nodes 2 --cluster-local-node 1 \
	--cluster-proc-name eth1 -j MARK --set-mark 0xffff
iptables -A PREROUTING -t mangle -i eth1 \
	-m mark ! --mark 0xffff -j DROP
iptables -A PREROUTING -t mangle -i eth2 -m cluster \
	--cluster-total-nodes 2 --cluster-local-node 1 \
	--cluster-proc-name eth2 -j MARK --set-mark 0xffff
iptables -A PREROUTING -t mangle -i eth2 \
	-m mark ! --mark 0xffff -j DROP

And the following commands to make all nodes see the same packets:

ip maddr add 01:00:5e:00:01:01 dev eth1
ip maddr add 01:00:5e:00:01:02 dev eth2
arptables -I OUTPUT -o eth1 --h-length 6 \
	-j mangle --mangle-mac-s 01:00:5e:00:01:01
arptables -I INPUT -i eth1 --h-length 6 \
	--destination-mac 01:00:5e:00:01:01 \
	-j mangle --mangle-mac-d 00:zz:yy:xx:5a:27
arptables -I OUTPUT -o eth2 --h-length 6 \
	-j mangle --mangle-mac-s 01:00:5e:00:01:02
arptables -I INPUT -i eth2 --h-length 6 \
	--destination-mac 01:00:5e:00:01:02 \
	-j mangle --mangle-mac-d 00:zz:yy:xx:5a:27

In the case of TCP connections, pickup facility has to be disabled
to avoid marking TCP ACK packets coming in the reply direction as
valid.

echo 0 > /proc/sys/net/netfilter/nf_conntrack_tcp_loose

BTW, some final notes:

 * This match mangles the skbuff pkt_type in case that it detects
PACKET_MULTICAST for a non-multicast address. This may be done in
a PKTTYPE target for this sole purpose.
 * This match supersedes the CLUSTERIP target.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 17:10:36 +01:00
Cyrill Gorcunov
1546000fe8 net: netfilter conntrack - add per-net functionality for DCCP protocol
Module specific data moved into per-net site and being allocated/freed
during net namespace creation/deletion.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 16:30:49 +01:00
Cyrill Gorcunov
81a1d3c31e net: sysctl_net - use net_eq to compare nets
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 16:23:30 +01:00
Linus Torvalds
8e91f178a2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (21 commits)
  r8169: revert "r8169: read MAC address from EEPROM on init (2nd attempt)"
  r8169: use hardware auto-padding.
  igb: remove ASPM L0s workaround
  netxen: remove old flash check.
  mv643xx_eth: fix unicast address filter corruption on mtu change
  xfrm: Fix xfrm_state_find() wrt. wildcard source address.
  emac: Fix clock control for 405EX and 405EXr chips
  ixgbe: fix multiple unicast address support
  via-velocity: Fix DMA mapping length errors on transmit.
  qlge: bugfix: Pad outbound frames smaller than 60 bytes.
  qlge: bugfix: Move netif_napi_del() to common call point.
  qlge: bugfix: Tell hw to strip vlan header.
  qlge: bugfix: Increase filter on inbound csum.
  dnet: replace obsolete *netif_rx_* functions with *napi_*
  net: Add be2net driver.
  dnet: Fix warnings on 64-bit.
  dnet: Dave DNET ethernet controller driver (updated)
  ipv6:  Fix BUG when disabled ipv6 module is unloaded
  bnx2x: Using DMAE to initialize the chip
  bnx2x: Casting page alignment
  ...
2009-03-16 07:56:58 -07:00
Christoph Paasch
d1238d5337 netfilter: conntrack: check for NEXTHDR_NONE before header sanity checking
NEXTHDR_NONE doesn't has an IPv6 option header, so the first check
for the length will always fail and results in a confusing message
"too short" if debugging enabled. With this patch, we check for
NEXTHDR_NONE before length sanity checkings are done.

Signed-off-by: Christoph Paasch <christoph.paasch@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 15:52:11 +01:00
Christoph Paasch
ec8d540969 netfilter: conntrack: fix dropping packet after l4proto->packet()
We currently use the negative value in the conntrack code to encode
the packet verdict in the error. As NF_DROP is equal to 0, inverting
NF_DROP makes no sense and, as a result, no packets are ever dropped.

Signed-off-by: Christoph Paasch <christoph.paasch@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 15:51:29 +01:00
Pablo Neira Ayuso
626ba8fbac netfilter: ctnetlink: fix crash during expectation creation
This patch fixes a possible crash due to the missing initialization
of the expectation class when nf_ct_expect_related() is called.

Reported-by: BORBELY Zoltan <bozo@andrews.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 15:50:51 +01:00
Jan Engelhardt
acc738fec0 netfilter: xtables: avoid pointer to self
Commit 784544739a (netfilter: iptables:
lock free counters) broke a number of modules whose rule data referenced
itself. A reallocation would not reestablish the correct references, so
it is best to use a separate struct that does not fall under RCU.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 15:35:29 +01:00
Jonathan Corbet
76398425bb Move FASYNC bit handling to f_op->fasync()
Removing the BKL from FASYNC handling ran into the challenge of keeping the
setting of the FASYNC bit in filp->f_flags atomic with regard to calls to
the underlying fasync() function.  Andi Kleen suggested moving the handling
of that bit into fasync(); this patch does exactly that.  As a result, we
have a couple of internal API changes: fasync() must now manage the FASYNC
bit, and it will be called without the BKL held.

As it happens, every fasync() implementation in the kernel with one
exception calls fasync_helper().  So, if we make fasync_helper() set the
FASYNC bit, we can avoid making any changes to the other fasync()
functions - as long as those functions, themselves, have proper locking.
Most fasync() implementations do nothing but call fasync_helper() - which
has its own lock - so they are easily verified as correct.  The BKL had
already been pushed down into the rest.

The networking code has its own version of fasync_helper(), so that code
has been augmented with explicit FASYNC bit handling.

Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: David Miller <davem@davemloft.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2009-03-16 08:32:27 -06:00
Scott James Remnant
95ba434f89 netfilter: auto-load ip_queue module when socket opened
The ip_queue module is missing the net-pf-16-proto-3 alias that would
causae it to be auto-loaded when a socket of that type is opened.  This
patch adds the alias.

Signed-off-by: Scott James Remnant <scott@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 15:31:10 +01:00
Scott James Remnant
26c3b67806 netfilter: auto-load ip6_queue module when socket opened
The ip6_queue module is missing the net-pf-16-proto-13 alias that would
cause it to be auto-loaded when a socket of that type is opened.  This
patch adds the alias.

Signed-off-by: Scott James Remnant <scott@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 15:30:14 +01:00
Pablo Neira Ayuso
f0a3c0869f netfilter: ctnetlink: move event reporting for new entries outside the lock
This patch moves the event reporting outside the lock section. With
this patch, the creation and update of entries is homogeneous from
the event reporting perspective. Moreover, as the event reporting is
done outside the lock section, the netlink broadcast delivery can
benefit of the yield() call under congestion.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 15:28:09 +01:00
Pablo Neira Ayuso
e098360f15 netfilter: ctnetlink: cleanup conntrack update preliminary checkings
This patch moves the preliminary checkings that must be fulfilled
to update a conntrack, which are the following:

 * NAT manglings cannot be updated
 * Changing the master conntrack is not allowed.

This patch is a cleanup.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 15:27:22 +01:00
Pablo Neira Ayuso
7ec4749675 netfilter: ctnetlink: cleanup master conntrack assignation
This patch moves the assignation of the master conntrack to
ctnetlink_create_conntrack(), which is where it really belongs.
This patch is a cleanup.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 15:25:46 +01:00
Pablo Neira Ayuso
1db7a748df netfilter: conntrack: increase drop stats if sequence adjustment fails
This patch increases the statistics of packets drop if the sequence
adjustment fails in ipv4_confirm().

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 15:18:50 +01:00
Stephen Hemminger
67c0d57930 netfilter: Kconfig spelling fixes (trivial)
Signed-off-by: Stephen Hemminger <sheminger@vyatta.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 15:17:23 +01:00
Christoph Paasch
9d2493f88f netfilter: remove IPvX specific parts from nf_conntrack_l4proto.h
Moving the structure definitions to the corresponding IPvX specific header files.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 15:15:35 +01:00
Eric Leblond
c7a913cd55 netfilter: print the list of register loggers
This patch modifies the proc output to add display of registered
loggers. The content of /proc/net/netfilter/nf_log is modified. Instead
of displaying a protocol per line with format:
	proto:logger
it now displays:
	proto:logger (comma_separated_list_of_loggers)
NONE is used as keyword if no logger is used.

Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 14:55:27 +01:00
Eric Leblond
ca735b3aaa netfilter: use a linked list of loggers
This patch modifies nf_log to use a linked list of loggers for each
protocol. This list of loggers is read and write protected with a
mutex.

This patch separates registration and binding. To be used as
logging module, a module has to register calling nf_log_register()
and to bind to a protocol it has to call nf_log_bind_pf().
This patch also converts the logging modules to the new API. For nfnetlink_log,
it simply switchs call to register functions to call to bind function and
adds a call to nf_log_register() during init. For other modules, it just
remove a const flag from the logger structure and replace it with a
__read_mostly.

Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 14:54:21 +01:00
Ilpo Järvinen
afece1c658 tcp: make sure xmit goal size never becomes zero
It's not too likely to happen, would basically require crafted
packets (must hit the max guard in tcp_bound_to_half_wnd()).
It seems that nothing that bad would happen as there's tcp_mems
and congestion window that prevent runaway at some point from
hurting all too much (I'm not that sure what all those zero
sized segments we would generate do though in write queue).
Preventing it regardless is certainly the best way to go.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Evgeniy Polyakov <zbr@ioremap.net>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-15 20:09:55 -07:00
Ilpo Järvinen
2a3a041c4e tcp: cache result of earlier divides when mss-aligning things
The results is very unlikely change every so often so we
hardly need to divide again after doing that once for a
connection. Yet, if divide still becomes necessary we
detect that and do the right thing and again settle for
non-divide state. Takes the u16 space which was previously
taken by the plain xmit_size_goal.

This should take care part of the tso vs non-tso difference
we found earlier.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-15 20:09:55 -07:00
Ilpo Järvinen
0c54b85f28 tcp: simplify tcp_current_mss
There's very little need for most of the callsites to get
tp->xmit_goal_size updated. That will cost us divide as is,
so slice the function in two. Also, the only users of the
tp->xmit_goal_size are directly behind tcp_current_mss(),
so there's no need to store that variable into tcp_sock
at all! The drop of xmit_goal_size currently leaves 16-bit
hole and some reorganization would again be necessary to
change that (but I'm aiming to fill that hole with u16
xmit_goal_size_segs to cache the results of the remaining
divide to get that tso on regression).

Bring xmit_goal_size parts into tcp.c

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Evgeniy Polyakov <zbr@ioremap.net>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-15 20:09:54 -07:00
Ilpo Järvinen
72211e9050 tcp: don't check mtu probe completion in the loop
It seems that no variables clash such that we couldn't do
the check just once later on. Therefore move it.

Also kill dead obvious comment, dead argument and add
unlikely since this mtu probe does not happen too often.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-15 20:09:53 -07:00
Ilpo Järvinen
c887e6d2d9 tcp: consolidate paws check
Wow, it was quite tricky to merge that stream of negations
but I think I finally got it right:

check & replace_ts_recent:
(s32)(rcv_tsval - ts_recent) >= 0                  => 0
(s32)(ts_recent - rcv_tsval) <= 0                  => 0

discard:
(s32)(ts_recent - rcv_tsval)  > TCP_PAWS_WINDOW    => 1
(s32)(ts_recent - rcv_tsval) <= TCP_PAWS_WINDOW    => 0

I toggled the return values of tcp_paws_check around since
the old encoding added yet-another negation making tracking
of truth-values really complicated.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-15 20:09:52 -07:00
Ilpo Järvinen
c43d558a51 tcp: kill dead end_seq variable in clean_rtx_queue
I've already forgotten what for this was necessary, anyway
it's no longer used (if it ever was).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-15 20:09:51 -07:00
Ilpo Järvinen
5861f8e58d tcp: remove pointless .dsack/.num_sacks code
In the pure assignment case, the earlier zeroing is
still in effect.

David S. Miller raised concerns if the ifs are there to avoid
dirtying cachelines. I came to these conclusions:

> We'll be dirty it anyway (now that I check), the first "real" statement
> in tcp_rcv_established is:
>
>       tp->rx_opt.saw_tstamp = 0;
>
> ...that'll land on the same dword. :-/
>
> I suppose the blocks are there just because they had more complexity
> inside when they had to calculate the eff_sacks too (maybe it would
> have been better to just remove them in that drop-patch so you would
> have had less head-ache :-)).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-15 20:09:51 -07:00
Jarek Poplawski
7cd0a63872 pkt_sched: Change misleading code in class delete.
While looking for a possible reason of bugzilla report on HTB oops:
http://bugzilla.kernel.org/show_bug.cgi?id=12858
I found the code in htb_delete calling htb_destroy_class on zero
refcount is very misleading: it can suggest this is a common path, and
destroy is called under sch_tree_lock. Actually, this can never happen
like this because before deletion cops->get() is done, and after
delete a class is still used by tclass_notify. The class destroy is
always called from cops->put(), so without sch_tree_lock.

This doesn't mean much now (since 2.6.27) because all vulnerable calls
were moved from htb_destroy_class to htb_delete, but there was a bug
in older kernels. The same change is done for other classful scheds,
which, it seems, didn't have similar locking problems here.

Reported-by: m0sia <m0sia@m0sia.ru>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-15 20:00:19 -07:00
Roel Kluin
a2025b8b10 tcp: '< 0' test on unsigned
promote 'cnt' to size_t, to match 'len'.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-13 16:05:14 -07:00
Roel Kluin
8db09f26f9 x25: '< 0' and '>= 0' test on unsigned
skb->len is an unsigned int, so the test in x25_rx_call_request() always
evaluates to true.

len in x25_sendmsg() is unsigned as well. so -ERRORS returned by x25_output()
are not noticed.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-13 16:04:12 -07:00
Denys Fedoryshchenko
73ce7b01b4 ipv4: arp announce, arp_proxy and windows ip conflict verification
Windows (XP at least) hosts on boot, with configured static ip, performing 
address conflict detection, which is defined in RFC3927.
Here is quote of important information:

"
An ARP announcement is identical to the ARP Probe described above, 
except    that now the sender and target IP addresses are both set 
to the host's newly selected IPv4 address. 
"

But it same time this goes wrong with RFC5227.
"
The 'sender IP address' field MUST be set to all zeroes; this is to avoid
polluting ARP caches in other hosts on the same link in the case
where the address turns out to be already in use by another host.
"

When ARP proxy configured, it must not answer to both cases, because 
it is address conflict verification in any case. For Windows it is just 
causing to detect false "ip conflict". Already there is code for RFC5227, so 
just trivially we just check also if source ip == target ip.

Signed-off-by: Denys Fedoryshchenko <denys@visp.net.lb>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-13 16:02:07 -07:00
David S. Miller
08ec9af1c0 xfrm: Fix xfrm_state_find() wrt. wildcard source address.
The change to make xfrm_state objects hash on source address
broke the case where such source addresses are wildcarded.

Fix this by doing a two phase lookup, first with fully specified
source address, next using saddr wildcarded.

Reported-by: Nicolas Dichtel <nicolas.dichtel@dev.6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-13 14:22:40 -07:00
Yi Zou
1c8dbcf649 [SCSI] net: add NETIF_F_FCOE_CRC to can_checksum_protocol
Add FC CRC offload check for ETH_P_FCOE.

Signed-off-by: Yi Zou <yi.zou@intel.com>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-03-13 15:11:38 -05:00
Neil Horman
273ae44b9c Network Drop Monitor: Adding Build changes to enable drop monitor
Network Drop Monitor: Adding Build changes to enable drop monitor

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

 include/linux/Kbuild |    1 +
 net/Kconfig          |   11 +++++++++++
 net/core/Makefile    |    1 +
 3 files changed, 13 insertions(+)
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-13 12:09:29 -07:00
Neil Horman
9a8afc8d39 Network Drop Monitor: Adding drop monitor implementation & Netlink protocol
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

 include/linux/net_dropmon.h |   56 +++++++++
 net/core/drop_monitor.c     |  263 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 319 insertions(+)
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-13 12:09:29 -07:00
Neil Horman
ead2ceb0ec Network Drop Monitor: Adding kfree_skb_clean for non-drops and modifying end-of-line points for skbs
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

 include/linux/skbuff.h |    4 +++-
 net/core/datagram.c    |    2 +-
 net/core/skbuff.c      |   22 ++++++++++++++++++++++
 net/ipv4/arp.c         |    2 +-
 net/ipv4/udp.c         |    2 +-
 net/packet/af_packet.c |    2 +-
 6 files changed, 29 insertions(+), 5 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-13 12:09:28 -07:00
Neil Horman
4893d39e86 Network Drop Monitor: Add trace declaration for skb frees
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

 include/trace/skb.h   |    8 ++++++++
 net/core/Makefile     |    2 ++
 net/core/net-traces.c |   29 +++++++++++++++++++++++++++++
 3 files changed, 39 insertions(+)
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-13 12:09:27 -07:00
malc
6fc791ee63 sctp: add Adaptation Layer Indication parameter only when it's set
RFC5061 states:

        Each adaptation layer that is defined that wishes
        to use this parameter MUST specify an adaptation code point in an
        appropriate RFC defining its use and meaning.

If the user has not set one - assume they don't want to sent the param
with a zero Adaptation Code Point.

Rationale - Currently the IANA defines zero as reserved - and
1 as the only valid value - so we consider zero to be unset - to save
adding a boolean to the socket structure.

Including this parameter unconditionally causes endpoints that do not
understand it to report errors unnecessarily.

Signed-off-by: Malcolm Lashley <mlashley@gmail.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-13 11:37:58 -07:00
Wei Yongjun
76595024ff sctp: fix to send FORWARD-TSN chunk only if peer has such capable
RFC3758 Section 3.3.1.  Sending Forward-TSN-Supported param in INIT

   Note that if the endpoint chooses NOT to include the parameter, then
   at no time during the life of the association can it send or process
   a FORWARD TSN.

If peer does not support PR-SCTP capable, don't send FORWARD-TSN chunk
to peer.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-13 11:37:58 -07:00
Wei Yongjun
5ffad5aceb sctp: fix to indicate ASCONF support in INIT-ACK only if peer has such capable
This patch fix to indicate ASCONF support in INIT-ACK only if peer has
such capable.

This patch also fix to calc the chunk size if peer has no FWD-TSN
capable.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-13 11:37:56 -07:00
Vlad Yasevich
5e8f3f703a sctp: simplify sctp listening code
sctp_inet_listen() call is split between UDP and TCP style.  Looking
at the code, the two functions are almost the same and can be
merged into a single helper.  This also fixes a bug that was
fixed in the UDP function, but missed in the TCP function.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-13 11:37:56 -07:00
Rusty Russell
a70f730282 cpumask: replace node_to_cpumask with cpumask_of_node.
Impact: cleanup

node_to_cpumask (and the blecherous node_to_cpumask_ptr which
contained a declaration) are replaced now everyone implements
cpumask_of_node.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-03-13 14:49:46 +10:30
Trond Myklebust
5e3771ce2d SUNRPC: Ensure that xs_nospace return values are propagated
If xs_nospace() finds that the socket has disconnected, it attempts to
return ENOTCONN, however that value is then squashed by the callers.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11 14:38:01 -04:00
Trond Myklebust
8a2cec295f SUNRPC: Delay, then retry on connection errors.
Enforce the comment in xs_tcp_connect_worker4/xs_tcp_connect_worker6 that
we should delay, then retry on certain connection errors.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11 14:38:01 -04:00
Trond Myklebust
2a4919919a SUNRPC: Return EAGAIN instead of ENOTCONN when waking up xprt->pending
While we should definitely return socket errors to the task that is
currently trying to send data, there is no need to propagate the same error
to all the other tasks on xprt->pending. Doing so actually slows down
recovery, since it causes more than one tasks to attempt socket recovery.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11 14:38:00 -04:00
Trond Myklebust
482f32e65d SUNRPC: Handle socket errors correctly
Ensure that we pick up and handle socket errors as they occur.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11 14:38:00 -04:00
Trond Myklebust
c8485e4d63 SUNRPC: Handle ECONNREFUSED correctly in xprt_transmit()
If we get an ECONNREFUSED error, we currently go to sleep on the
'xprt->sending' wait queue. The problem is that no timeout is set there,
and there is nothing else that will wake the task up later.

We should deal with ECONNREFUSED in call_status, given that is where we
also deal with -EHOSTDOWN, and friends.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11 14:37:59 -04:00
Trond Myklebust
40d2549db5 SUNRPC: Don't disconnect if a connection is still in progress.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11 14:37:58 -04:00
Trond Myklebust
670f945731 SUNRPC: Ensure we set XPRT_CLOSING only after we've sent a tcp FIN...
...so that we can distinguish between when we need to shutdown and when we
don't. Also remove the call to xs_tcp_shutdown() from xs_tcp_connect(),
since xprt_connect() makes the same test.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11 14:37:58 -04:00
Trond Myklebust
15f081ca8d SUNRPC: Avoid an unnecessary task reschedule on ENOTCONN
If the socket is unconnected, and xprt_transmit() returns ENOTCONN, we
currently give up the lock on the transport channel. Doing so means that
the lock automatically gets assigned to the next task in the xprt->sending
queue, and so that task needs to be woken up to do the actual connect.

The following patch aims to avoid that unnecessary task switch.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11 14:37:57 -04:00
Tom Talpey
441e3e2429 SUNRPC: dynamically load RPC transport modules on-demand
Provide an api to attempt to load any necessary kernel RPC
client transport module automatically. By convention, the
desired module name is "xprt"+"transport name". For example,
when NFS mounting with "-o proto=rdma", attempt to load the
"xprtrdma" module.

Signed-off-by: Tom Talpey <tmtalpey@gmail.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11 14:37:56 -04:00
Tom Talpey
b38ab40ad5 XPRTRDMA: correct an rpc/rdma inline send marshaling error
Certain client rpc's which contain both lengthy page-contained
metadata and a non-empty xdr_tail buffer require careful handling
to avoid overlapped memory copying. Rearranging of existing rpcrdma
marshaling code avoids it; this fixes an NFSv4 symlink creation error
detected with connectathon basic/test8 to multiple servers.

Signed-off-by: Tom Talpey <tmtalpey@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11 14:37:55 -04:00
Tom Talpey
b1e1e15877 SVCRDMA: remove faulty assertions in rpc/rdma chunk validation.
Certain client-provided RPCRDMA chunk alignments result in an
additional scatter/gather entry, which triggered nfs/rdma server
assertions incorrectly. OpenSolaris nfs/rdma client connectathon
testing was blocked by these in the special/locking section.

Signed-off-by: Tom Talpey <tmtalpey@gmail.com>
Cc: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11 14:37:55 -04:00
Chuck Lever
fe315e76fc SUNRPC: Avoid spurious wake-up during UDP connect processing
To clear out old state, the UDP connect workers unconditionally invoke
xs_close() before proceeding with a new connect.  Nowadays this causes
a spurious wake-up of the task waiting for the connect to complete.

This is a little racey, but usually harmless.  The waiting task
immediately retries the connect via a call_bind/call_connect sequence,
which usually finds the transport already in the connected state
because the connect worker has finished in the background.

To avoid a spurious wake-up, factor the xs_close() logic that resets
the underlying socket into a helper, and have the UDP connect workers
call that helper instead of xs_close().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11 14:10:21 -04:00
Trond Myklebust
01d37c428a SUNRPC: xprt_connect() don't abort the task if the transport isn't bound
If the transport isn't bound, then we should just return ENOTCONN, letting
call_connect_status() and/or call_status() deal with retrying. Currently,
we appear to abort all pending tasks with an EIO error.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11 14:09:39 -04:00
Trond Myklebust
fba91afbec SUNRPC: Fix an Oops due to socket not set up yet...
We can Oops in both xs_udp_send_request() and xs_tcp_send_request() if the
call to xs_sendpages() returns an error due to the socket not yet being
set up.
Deal with that situation by returning a new error: ENOTSOCK, so that we
know to avoid dereferencing transport->sock.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11 14:06:41 -04:00
Eric Dumazet
fc1ad92dfc tcp: allow timestamps even if SYN packet has tsval=0
Some systems send SYN packets with apparently wrong RFC1323 timestamp
option values [timestamp tsval=0 tsecr=0].
It might be for security reasons (http://www.secuobs.com/plugs/25220.shtml )

Linux TCP stack ignores this option and sends back a SYN+ACK packet
without timestamp option, thus many TCP flows cannot use timestamps
and lose some benefit of RFC1323.

Other operating systems seem to not care about initial tsval value, and let
tcp flows to negotiate timestamp option.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-11 09:23:57 -07:00
John Dykstra
ff8cf9a938 ipv6: Fix BUG when disabled ipv6 module is unloaded
Do not try to "uninitialize" ipv6 if its initialization had been skipped
because module parameter disable=1 had been specified.

Reported-by:  Thomas Backlund <tmb@mandriva.org>
Signed-off-by: John Dykstra <john.dykstra1@gmail.com>
Acked-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-11 09:22:51 -07:00
Ingo Molnar
78b020d035 Merge branches 'x86/cleanups', 'x86/kexec', 'x86/mce2' and 'linus' into x86/core 2009-03-11 10:49:15 +01:00
Trond Myklebust
eb9b55ab4d SUNRPC: Tighten up the task locking rules in __rpc_execute()
We should probably not be testing any flags after we've cleared the
RPC_TASK_RUNNING flag, since rpc_make_runnable() is then free to assign the
rpc_task to another workqueue, which may then destroy it.

We can fix any races with rpc_make_runnable() by ensuring that we only
clear the RPC_TASK_RUNNING flag while holding the rpc_wait_queue->lock that
the task is supposed to be sleeping on (and then checking whether or not
the task really is sleeping).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-10 20:33:16 -04:00
Stephen Hemminger
a2205472c3 net: fix warning about non-const string
Since dev_set_name takes a printf style string, new gcc complains
if arg is not const.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-10 05:22:43 -07:00
Stephen Hemminger
7546dd97d2 net: convert usage of packet_type to read_mostly
Protocols that use packet_type can be __read_mostly section for better
locality. Elminate any unnecessary initializations of NULL.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-10 05:22:43 -07:00
David S. Miller
d5df2a1613 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/bnx2x_main.c
	drivers/net/wireless/iwlwifi/iwl3945-base.c
	drivers/net/wireless/rt2x00/rt73usb.c
2009-03-10 05:04:16 -07:00
Roel Kluin
bd05f28e1a cfg80211: test before subtraction on unsigned
freq_diff is unsigned, so test before subtraction

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-06 15:54:32 -05:00
Sujith
707c1b4e68 mac80211: Update IBSS beacon timestamp properly
In IBSS mode, the beacon timestamp has to be filled with the
BSS's timestamp when joining, and set to zero when creating
a new BSS.

Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-05 14:39:40 -05:00
Vivek Natarajan
25c9c87528 mac80211: Always send a null data frame if TIM bit is set.
If the AP thinks we are in power save state eventhough we are not truly
in that state, it sets the TIM bit and does not send a data frame unless
we send a null data frame to correct the state in the AP.
This might happen if the null data frame for wake up is lost in the air
after we disable power save.

Signed-off-by: Vivek Natarajan <vnatarajan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-05 14:39:38 -05:00
Sujith
e65c22633c mac80211: Fix TKIP/WEP HT capability handling
There is no need to parse the AP's HT capabilities if
the STA uses TKIP/WEP cipher. This allows the rate control
module to choose the correct(legacy) rate table.

Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-05 14:39:37 -05:00
Johannes Berg
24776cfd55 mac80211: Fix quality reporting for wireless stats
Since "mac80211/cfg80211: move iwrange handler to cfg80211", the
results for link quality from "iwlist scan" and "iwconfig" commands
have been very different. The results are now consistent.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Reported- and tested-by: Larry Finger <larry.finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-05 14:39:35 -05:00
Sujith
e31ae05083 mac80211: Notify the driver only when the beacon interval changes
Currently, the driver is unconditionally notified of beacon
interval. This is a problem in AP mode, because the driver has
to know that the beacon interval has actualy changed to recalculate
TBTT and reset the HW TSF. Fix this to make mac80211 notify the driver
only when the beacon interval has been reconfigured to a new value.

Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-03-05 14:39:32 -05:00
David S. Miller
508827ff0a Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/tokenring/tmspci.c
	drivers/net/ucc_geth_mii.c
2009-03-05 02:06:47 -08:00
David S. Miller
9d40bbda59 vlan: Fix vlan-in-vlan crashes.
As analyzed by Patrick McHardy, vlan needs to reset it's
netdev_ops pointer in it's ->init() function but this
leaves the compat method pointers stale.

Add a netdev_resync_ops() and call it from the vlan code.

Any other driver which changes ->netdev_ops after register_netdevice()
will need to call this new function after doing so too.

With help from Patrick McHardy.

Tested-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-04 23:46:25 -08:00
David S. Miller
54acd0efab net: Fix missing dev->neigh_setup in register_netdevice().
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-04 23:01:02 -08:00
Jarek Poplawski
a883bf564e pkt_sched: act_police: Fix a rate estimator test.
A commit c1b56878fb "tc: policing requires
a rate estimator" introduced a test which invalidates previously working
configs, based on examples from iproute2: doc/actions/actions-general.
This is too rigorous: a rate estimator is needed only when police's
"avrate" option is used.

Reported-by: Joao Correia <joaomiguelcorreia@gmail.com>
Diagnosed-by: John Dykstra <john.dykstra1@gmail.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-04 17:38:10 -08:00
Brian Haley
fb13d9f9e4 SCTP: change sctp_ctl_sock_init() to try IPv4 if IPv6 fails
Change sctp_ctl_sock_init() to try IPv4 if IPv6 socket registration
fails.  Required if the IPv6 module is loaded with "disable=1", else
SCTP will fail to load.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-04 03:20:26 -08:00
Brian Haley
fe7ca2e1e8 IPv6: add "disable" module parameter support to ipv6.ko
Add "disable" module parameter support to ipv6.ko by specifying
"disable=1" on module load.  We just do the minimum of initializing
inetsw6[] so calls from other modules to inet6_register_protosw()
won't OOPs, then bail out.  No IPv6 addresses or sockets can be
created as a result, and a reboot is required to enable IPv6.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-04 03:19:08 -08:00
Eric Biederman
0c5c2d3089 neigh: Allow for user space users of the neighbour table
Currently it is possible to do just about everything with the arp table
from user space except treat an entry like you are using it.  To that end
implement and a flag NTF_USE that when set in a netwlink update request
treats the neighbour table entry like the kernel does on the output path.

This allows user space applications to share the kernel's arp cache.

Signed-off-by: Eric Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-04 00:03:08 -08:00
Meelis Roos
4222474519 net: fix tokenring license
Currently, modular tokenring ("tr") lacks a license and fails to load:

tr: module license 'unspecified' taints kernel.
tr: Unknown symbol proc_net_fops_create

Beacuse of this, no tokenring driver can load if it depends on modular 
tr. Fix this by adding GPL module license as it is in the kernel.

With this fix, tr module loads fine and tms380 driver also loads. Well, 
it does'nt work but that's a different bug.

Signed-off-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-03 23:48:50 -08:00
Pablo Neira Ayuso
4843b93c96 netlink: invert error code in netlink_set_err()
The callers of netlink_set_err() currently pass a negative value
as parameter for the error code. However, sk->sk_err wants a
positive error value. Without this patch, skb_recv_datagram() called
by netlink_recvmsg() may return a positive value to report an error.

Another choice to fix this is to change callers to pass a positive
error value, but this seems a bit inconsistent and error prone
to me. Indeed, the callers of netlink_set_err() assumed that the
(usual) negative value for error codes was fine before this patch :).

This patch also includes some documentation in docbook format
for netlink_set_err() to avoid this sort of confusion.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-03 23:37:30 -08:00
Geert Uytterhoeven
e9cc8bddae netlink: Move netlink attribute parsing support to lib
Netlink attribute parsing may be used even if CONFIG_NET is not set.
Move it from net/netlink to lib and control its inclusion based on the new
config symbol CONFIG_NLATTR, which is selected by CONFIG_NET.

Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2009-03-04 14:53:30 +08:00
Randy Dunlap
abb79972b4 rds: fix iband RDMA dependencies
Fix RDS Infiniband dependencies for RDMA so that these
build errors won't happen:

ERROR: "rdma_accept" [net/rds/rds.ko] undefined!
ERROR: "rdma_destroy_id" [net/rds/rds.ko] undefined!
ERROR: "rdma_connect" [net/rds/rds.ko] undefined!
ERROR: "rdma_destroy_qp" [net/rds/rds.ko] undefined!
ERROR: "rdma_listen" [net/rds/rds.ko] undefined!
ERROR: "rdma_notify" [net/rds/rds.ko] undefined!
ERROR: "rdma_create_id" [net/rds/rds.ko] undefined!
ERROR: "rdma_create_qp" [net/rds/rds.ko] undefined!
ERROR: "rdma_bind_addr" [net/rds/rds.ko] undefined!
ERROR: "rdma_resolve_route" [net/rds/rds.ko] undefined!
ERROR: "rdma_disconnect" [net/rds/rds.ko] undefined!
ERROR: "rdma_reject" [net/rds/rds.ko] undefined!
ERROR: "rdma_resolve_addr" [net/rds/rds.ko] undefined!

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-03 21:39:40 -08:00
Ingo Molnar
91d75e209b Merge branch 'x86/core' into core/percpu 2009-03-04 02:29:19 +01:00
Eric W. Biederman
17edde5209 netns: Remove net_alive
It turns out that net_alive is unnecessary, and the original problem
that led to it being added was simply that the icmp code thought
it was a network device and wound up being unable to handle packets
while there were still packets in the network namespace.

Now that icmp and tcp have been fixed to properly register themselves
this problem is no longer present and we have a stronger guarantee
that packets will not arrive in a network namespace then that provided
by net_alive in netif_receive_skb.  So remove net_alive allowing
packet reception run a little faster.

Additionally document the strong reason why network namespace cleanup
is safe so that if something happens again someone else will have
a chance of figuring it out.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-03 01:14:27 -08:00
Eric W. Biederman
2f20d2e667 tcp: Like icmp use register_pernet_subsys
To remove the possibility of packets flying around when network
devices are being cleaned up use reisger_pernet_subsys instead of
register_pernet_device.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Acked-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-03 01:14:21 -08:00
Eric W. Biederman
6eb0777228 netns: Fix icmp shutdown.
Recently I had a kernel panic in icmp_send during a network namespace
cleanup.  There were packets in the arp queue that failed to be sent
and we attempted to generate an ICMP host unreachable message, but
failed because icmp_sk_exit had already been called.

The network devices are removed from a network namespace and their
arp queues are flushed before we do attempt to shutdown subsystems
so this error should have been impossible.

It turns out icmp_init is using register_pernet_device instead
of register_pernet_subsys.  Which resulted in icmp being shut down
while we still had the possibility of packets in flight, making
a nasty NULL pointer deference in interrupt context possible.

Changing this to register_pernet_subsys fixes the problem in
my testing.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Acked-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-03 01:14:15 -08:00
Daniel Lezcano
176c39af29 netns: fix addrconf_ifdown kernel panic
When a network namespace is destroyed the network interfaces are
all unregistered, making addrconf_ifdown called by the netdevice
notifier. 
In the other hand, the addrconf exit method does a loop on the network
devices and does addrconf_ifdown on each of them. But the ordering of 
the netns subsystem is not right because it uses the register_pernet_device
instead of register_pernet_subsys. If we handle the loopback as
any network device, we can safely use register_pernet_subsys.

But if we use register_pernet_subsys, the addrconf exit method will do
exactly what was already done with the unregistering of the network
devices. So in definitive, this code is pointless.

I removed the netns addrconf exit method and moved the code to the
addrconf cleanup function.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-03 01:06:45 -08:00
Stephen Hemminger
b325fddb7f ipv6: Fix sysctl unregistration deadlock
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-03 00:47:47 -08:00
Stephen Hemminger
5a5990d309 net: Avoid race between network down and sysfs
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-03 00:47:46 -08:00
Vlad Yasevich
7e99013a50 sctp: Fix broken RTO-doubling for data retransmits
Commit faee47cdbf
(sctp: Fix the RTO-doubling on idle-link heartbeats)
broke the RTO doubling for data retransmits.  If the
heartbeat was sent before the data T3-rtx time, the
the RTO will not double upon the T3-rtx expiration.
Distingish between the operations by passing an argument
to the function.

Additionally, Wei Youngjun pointed out that our treatment
of requested HEARTBEATS and timer HEARTBEATS is the same
wrt resetting congestion window.  That needs to be separated,
since user requested HEARTBEATS should not treat the link
as idle.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 22:49:18 -08:00
Wei Yongjun
f61f6f82c9 sctp: use time_before or time_after for comparing jiffies
The functions time_before or time_after are more robust
for comparing jiffies against other values.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 22:49:18 -08:00
Wei Yongjun
c6db93a58f sctp: fix the length check in sctp_getsockopt_maxburst()
The code in sctp_getsockopt_maxburst() doesn't allow len to be larger
then struct sctp_assoc_value, which is a common case where app writers
just pass down the sizeof(buf) or something similar.

This patch fix the problem.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 22:49:17 -08:00
Wei Yongjun
d212318c9d sctp: remove dup code in net/sctp/socket.c
Remove dup check of "if (optlen < sizeof(int))".

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 22:49:16 -08:00
Wei Yongjun
906f8257ee sctp: Add some missing types for debug message
This patch add the type name "AUTH" and primitive type name
"PRIMITIVE_ASCONF" for debug message.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 22:49:16 -08:00
Hantzis Fotis
ee7537b63a tcp: tcp_init_wl / tcp_update_wl argument cleanup
The above functions from include/net/tcp.h have been defined with an
argument that they never use. The argument is 'u32 ack' which is never
used inside the function body, and thus it can be removed. The rest of
the patch involves the necessary changes to the function callers of the
above two functions.

Signed-off-by: Hantzis Fotis <xantzis@ceid.upatras.gr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 22:42:02 -08:00
Wei Yongjun
3df2678737 sctp: fix kernel panic with ERROR chunk containing too many error causes
If ERROR chunk is received with too many error causes in ESTABLISHED
state, the kernel get panic.

This is because sctp limit the max length of cmds to 14, but while
ERROR chunk is received, one error cause will add around 2 cmds by
sctp_add_cmd_sf(). So many error causes will fill the limit of cmds
and panic.

This patch fixed the problem.

This bug can be test by SCTP Conformance Test Suite
<http://networktest.sourceforge.net/>.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 22:27:39 -08:00
Vlad Yasevich
d1dd524785 sctp: fix crash during module unload
An extra list_del() during the module load failure and unload
resulted in a crash with a list corruption.  Now sctp can
be unloaded again.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 22:27:38 -08:00
Gerrit Renker
86739fb96e dccp: Do not let initial option overhead shrink the MPS
This fixes a problem caused by the overlap of the connection-setup and
established-state phases of DCCP connections.

During connection setup, the client retransmits Confirm Feature-Negotiation
options until a response from the server signals that it can move from the
half-established PARTOPEN into the OPEN state, whereupon the connection is
fully established on both ends (RFC 4340, 8.1.5).

However, since the client may already send data while it is in the PARTOPEN
state, consequences arise for the Maximum Packet Size: the problem is that the
initial option overhead is much higher than for the subsequent established
phase, as it involves potentially many variable-length list-type options
(server-priority options, RFC 4340, 6.4).

Applying the standard MPS is insufficient here: especially with larger
payloads this can lead to annoying, counter-intuitive EMSGSIZE errors.

On the other hand, reducing the MPS available for the established phase by
the added initial overhead is highly wasteful and inefficient.

The solution chosen therefore is a two-phase strategy:

   If the payload length of the DataAck in PARTOPEN is too large, an Ack is sent
   to carry the options, and the feature-negotiation list is then flushed.

   This means that the server gets two Acks for one Response. If both Acks get
   lost, it is probably better to restart the connection anyway and devising yet
   another special-case does not seem worth the extra complexity.

The result is a higher utilisation of the available packet space for the data
transmission phase (established state) of a connection.

The patch (over-)estimates the initial overhead to be 32*4 bytes -- commonly
seen values were around 90 bytes for initial feature-negotiation options.

It uses sizeof(u32) to mean "aligned units of 4 bytes".
For consistency, another use of 4-byte alignment is adapted.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:07:23 -08:00
Gerrit Renker
361a5c1dd0 dccp: Minimise header option overhead in setting the MPS
This patch resolves a long-standing FIXME to dynamically update the Maximum
Packet Size depending on actual options usage.

It uses the flags set by the feature-negotiation infrastructure to compute
the required header option size.

Most options are fixed-size, a notable exception are Ack Vectors (required
currently only by CCID-2). These can have any length between 3 and 1020
bytes. As a result of testing, 16 bytes (2 bytes for type/length plus 14 Ack
Vector cells) have been found to be sufficient for loss-free situations.

There are currently no CCID-specific header options which may appear on data
packets, thus it is not necessary to define a corresponding CCID field as
suggested in the old comment.

Further changes:
----------------
 Adjusted the type of 'cur_mps' to match the unsigned return type of the
 function.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:07:23 -08:00
Ilpo Järvinen
9ce0146102 tcp: get rid of two unnecessary u16s in TCP skb flags copying
I guess these fields were one day 16-bit in the struct but
nowadays they're just using 8 bits anyway.

This is just a precaution, didn't result any change in my
case but who knows what all those varying gcc versions &
options do. I've been told that 16-bit is not so nice with
some cpus.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:17 -08:00
Ilpo Järvinen
0d6a775e27 tcp: in sendmsg/pages open code the real goto target
copied was assigned zero right before the goto, so if (copied)
cannot ever be true.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:16 -08:00
Ilpo Järvinen
cabeccbd17 tcp: kill eff_sacks "cache", the sole user can calculate itself
Also fixes insignificant bug that would cause sending of stale
SACK block (would occur in some corner cases).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:16 -08:00
Ilpo Järvinen
758ce5c8d1 tcp: add helper for AI algorithm
It seems that implementation in yeah was inconsistent to what
other did as it would increase cwnd one ack earlier than the
others do.

Size benefits:

  bictcp_cong_avoid |  -36
  tcp_cong_avoid_ai |  +52
  bictcp_cong_avoid |  -34
  tcp_scalable_cong_avoid |  -36
  tcp_veno_cong_avoid |  -12
  tcp_yeah_cong_avoid |  -38

= -104 bytes total

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:15 -08:00
Ilpo Järvinen
571a5dd8d0 htcp: merge icsk_ca_state compare
Similar to what is done elsewhere in TCP code when double
state checks are being done.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:14 -08:00
Ilpo Järvinen
e6c7d08579 tcp: drop unnecessary local var in collapse
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:13 -08:00
Ilpo Järvinen
bc079e9ede tcp: cleanup ca_state mess in tcp_timer
Redundant checks made indentation impossible to follow.
However, it might be useful to make this ca_state+is_sack
indexed array.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:13 -08:00
Ilpo Järvinen
7363a5b233 tcp: separate timeout marking loop to it's own function
Some comment about its current state added. So far I have
seen very few cases where the thing is actually useful,
usually just marginally (though admittedly I don't usually
see top of window losses where it seems possible that there
could be some gain), instead, more often the cases suffer
from L-marking spike which is certainly not desirable
(I'll bury improving it to my todo list, but on a low
prio position).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:12 -08:00
Ilpo Järvinen
d0af4160d1 tcp: remove redundant code from tcp_mark_lost_retrans
Arnd Hannemann <hannemann@nets.rwth-aachen.de> noticed and was
puzzled by the fact that !tcp_is_fack(tp) leads to early return
near the beginning and the later on tcp_is_fack(tp) was still
used in an if condition. The later check was a left-over from
RFC3517 SACK stuff (== !tcp_is_fack(tp) behavior nowadays) as
there wasn't clear way how to handle this particular check
cheaply in the spirit of RFC3517 (using only SACK blocks, not
holes + SACK blocks as with FACK). I sort of left it there as
a reminder but since it's confusing other people just remove
it and comment the missing-feature stuff instead.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Arnd Hannemann <hannemann@nets.rwth-aachen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:11 -08:00
Ilpo Järvinen
02276f3c96 tcp: fix corner case issue in segmentation during rexmitting
If cur_mss grew very recently so that the previously G/TSOed skb
now fits well into a single segment it would get send up in
parts unless we calculate # of segments again. This corner-case
could happen eg. after mtu probe completes or less than
previously sack blocks are required for the opposite direction.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:11 -08:00
Ilpo Järvinen
d3d2ae4545 tcp: Don't clear hints when tcp_fragmenting
1) We didn't remove any skbs, so no need to handle stale refs.

2) scoreboard_skb_hint is trivial, no timestamps were changed
   so no need to clear that one

3) lost_skb_hint needs tweaking similar to that of
   tcp_sacktag_one().

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:10 -08:00
Ilpo Järvinen
62ad27619c tcp: deferring in middle of queue makes very little sense
If skb can be sent right away, we certainly should do that
if it's in the middle of the queue because it won't get
more data into it.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:10 -08:00
Ilpo Järvinen
59a08cba6a tcp: fix lost_cnt_hint miscounts
It is possible that lost_cnt_hint gets underflow in
tcp_clean_rtx_queue because the cumulative ACK can cover
the segment where lost_skb_hint points to only partially,
which means that the hint is not cleared, opposite to what
my (earlier) comment claimed.

Also I don't agree what I ended up writing about non-trivial
case there to be what I intented to say. It was not supposed
to happen that the hint won't get cleared and we underflow
in any scenario.

In general, this is quite hard to trigger in practice.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:09 -08:00
Ilpo Järvinen
ac11ba753f tcp: don't backtrack to sacked skbs
Backtracking to sacked skbs is a horrible performance killer
since the hint cannot be advanced successfully past them...
...And it's totally unnecessary too.

In theory this is 2.6.27..28 regression but I doubt anybody
can make .28 to have worse performance because of other TCP
improvements.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 03:00:08 -08:00
David S. Miller
c3b3240450 rds: Fix build on powerpc.
As reported by Stephen Rothwell.

> Today's linux-next build (powerpc allyesconfig) failed like this:
>
> net/rds/cong.c: In function 'rds_cong_set_bit':
> net/rds/cong.c:284: error: implicit declaration of function 'generic___set_le_bit'
> net/rds/cong.c: In function 'rds_cong_clear_bit':
> net/rds/cong.c:298: error: implicit declaration of function 'generic___clear_le_bit'
> net/rds/cong.c: In function 'rds_cong_test_bit':
> net/rds/cong.c:309: error: implicit declaration of function 'generic_test_le_bit'

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-02 01:49:28 -08:00
David S. Miller
aa4abc9bcc Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/wireless/iwlwifi/iwl-tx.c
	net/8021q/vlan_core.c
	net/core/dev.c
2009-03-01 21:35:16 -08:00
Ilpo Järvinen
9ec06ff57a tcp: fix retrans_out leaks
There's conflicting assumptions in shifting, the caller assumes
that dupsack results in S'ed skbs (or a part of it) for sure but
never gave a hint to tcp_sacktag_one when dsack is actually in
use. Thus DSACK retrans_out -= pcount was not taken and the
counter became out of sync. Remove obstacle from that information
flow to get DSACKs accounted in tcp_sacktag_one as expected.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Tested-by: Denys Fedoryshchenko <denys@visp.net.lb>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-01 00:21:36 -08:00
Herbert Xu
4ead443163 netpoll: Add drop checks to all entry points
The netpoll entry checks are required to ensure that we don't
receive normal packets when invoked via netpoll.  Unfortunately
it only ever worked for the netif_receive_skb/netif_rx entry
points.  The VLAN (and subsequently GRO) entry point didn't
have the check and therefore can trigger all sorts of weird
problems.

This patch adds the netpoll check to all entry points.

I'm still uneasy with receiving at all under netpoll (which
apparently is only used by the out-of-tree kdump code).  The
reason is it is perfectly legal to receive all data including
headers into highmem if netpoll is off, but if you try to do
that with netpoll on and someone gets a printk in an IRQ handler                                             
you're going to get a nice BUG_ON.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-01 00:11:52 -08:00
David S. Miller
8010dc306b Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2009-02-28 22:32:16 -08:00
Jouni Malinen
0bfbce18b9 nl80211: Avoid AP mode BUG_ON hang with invalid lock assert
"cfg80211: add assert_cfg80211_lock() to ensure proper protection"
added assert_cfg80211_lock() calls into various places. At least
one of them, nl80211_send_wiphy(), should not have been there. That
triggers the BUG_ON in assert_cfg80211_lock() and pretty much kills
the kernel whenever someone runs hostapd.. Remove that call and make
assert_cfg80211_lock() use WARN_ON instead of BUG_ON to be a bit more
friendly to users.

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:53:04 -05:00
Alina Friedrichsen
4a332a385a mac80211: Give it some time to do the TSF sync
Give slow hardware some time to do the TSF sync, to not run into an
IBSS merging endless loop in some rarely situations.

Signed-off-by: Alina Friedrichsen <x-alina@gmx.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:53:03 -05:00
Alina Friedrichsen
34e8f08231 mac80211: Don't merge with the same BSSID
It was not a good idea to do a TSF reset on strange IBSS merges to the same BSSID. For example it will break the TSF sync of ath9k completely and it is unnecessary as all hardware I have tested do a TSF sync to a higher value automatically and IBSS merges are only done to higher TSF values. It only need a TSF reset to accept a lower value, when the IBSS network is changed manually.

Signed-off-by: Alina Friedrichsen <x-alina@gmx.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:53:02 -05:00
Luis R. Rodriguez
2f92cd2e5f cfg80211: pass the regulatory_request to ignore_request
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:53:00 -05:00
Luis R. Rodriguez
d951c1ddeb cfg80211: do not kzalloc() again for a new request on __regulatory_hint
Since we already have a regulatory request from the workqueue use that
and avoid a new kzalloc()

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:53:00 -05:00
Luis R. Rodriguez
28da32d7ca cfg80211: pass the regulatory_request struct in __regulatory_hint()
We were passing value by value, lets just pass the struct.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:53:00 -05:00
Luis R. Rodriguez
d1c96a9a29 cfg80211: make __regulatory_hint() static
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:52:59 -05:00
Luis R. Rodriguez
e38f8a7a8b cfg80211: Add AP beacon regulatory hints
When devices are world roaming they cannot beacon or do active scan
on 5 GHz or on channels 12, 13 and 14 on the 2 GHz band. Although
we have a good regulatory API some cards may _always_ world roam, this
is also true when a system does not have CRDA present. Devices doing world
roaming can still passive scan, if they find a beacon from an AP on
one of the world roaming frequencies we make the assumption we can do
the same and we also remove the passive scan requirement.

This adds support for providing beacon regulatory hints based on scans.
This works for devices that do either hardware or software scanning.
If a channel has not yet been marked as having had a beacon present
on it we queue the beacon hint processing into the workqueue.

All wireless devices will benefit from beacon regulatory hints from
any wireless device on a system including new devices connected to
the system at a later time.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:52:59 -05:00
Luis R. Rodriguez
3fc71f775a cfg80211: enable 5 GHz world roaming channels
The current static world regulatory domain is too restrictive,
we can use some 5 GHz channels world wide so long as they do not
touch frequencies which require DFS. The compromise is we must
also enforce passive scanning and disallow usage of a mode of
operation that beacons: (AP | IBSS | Mesh)

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:52:59 -05:00
Luis R. Rodriguez
68798a6263 cfg80211: enable active-scan / beaconing on Ch 1-11 for world regdom
This enables active scan and beaconing on Channels 1 through 11
on the static world regulatory domain.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:52:58 -05:00
Luis R. Rodriguez
69b1572bd8 cfg80211: rename regdom_changed to regdom_changes() and use it
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:52:58 -05:00
Luis R. Rodriguez
fff32c04f6 cfg80211: allow drivers that agree on regulatory to agree
This allows drivers that agree on regulatory to share their
regulatory domain.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:52:58 -05:00
Luis R. Rodriguez
fb1fc7add5 cfg80211: comments style cleanup
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:52:57 -05:00
Luis R. Rodriguez
fe33eb3908 cfg80211: move all regulatory hints to workqueue
All regulatory hints (core, driver, userspace and 11d) are now processed in
a workqueue.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:52:57 -05:00
Luis R. Rodriguez
0441d6ffc7 cfg80211: free rd on unlikely event on 11d hint
This was never happening but it was still wrong, so correct it.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:52:57 -05:00
Luis R. Rodriguez
915278e099 cfg80211: remove likely from an 11d hint case
Truth of the matter this was confusing people so mark it as
unlikely as that is the case now.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:52:56 -05:00
Luis R. Rodriguez
d335fe6391 cfg80211: protect first access of last_request on 11d hint under mutex
We were not protecting last_request there is a small possible race
between an 11d hint and another routine which calls reset_regdomains()
which can prevent a valid country IE from being processed. This is
not critical as it will still be procesed soon after but locking prior
to it is correct.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:52:56 -05:00
Luis R. Rodriguez
806a9e3967 cfg80211: make regulatory_request use wiphy_idx instead of wiphy
We do this so later on we can move the pending requests onto a
workqueue. By using the wiphy_idx instead of the wiphy we can
later easily check if the wiphy has disappeared or not.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:52:56 -05:00
Luis R. Rodriguez
761cf7ecff cfg80211: add assert_cfg80211_lock() to ensure proper protection
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:52:56 -05:00
Luis R. Rodriguez
bcf4f99b7b cfg80211: propagate -ENOMEM during regulatory_init()
Calling kobject_uevent_env() can fail mainly due to out of
memory conditions. We do not want to continue during such
conditions so propagate that as well instead of letting
cfg80211 load as if everything is peachy.

Additionally lets clarify that when CRDA is not called during
cfg80211's initialization _and_ if the error is not an -ENOMEM
its because kobject_uevent_env() failed to call CRDA, not because
CRDA failed. For those who want to find out why we also let you
do so by enabling the kernel config CONFIG_CFG80211_REG_DEBUG --
you'll get an actual stack trace.

So for now we'll treat non -ENOMEM kobject_uevent_env() failures as
non fatal during cfg80211's initialization.

CC: Greg KH <greg@kroah.com>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:52:55 -05:00
Luis R. Rodriguez
ba25c14142 cfg80211: add regulatory_hint_core() to separate the core reg hint
This makes the core hint path more readable and allows for us to
later make it obvious under what circumstances we need locking or not.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:52:55 -05:00
Luis R. Rodriguez
80778f18c0 nl80211: disallow user requests prior to regulatory_init()
If cfg80211 is built into the kernel there is perhaps a small
time window betwen nl80211_init() and regulatory_init() where
cfg80211_regdomain hasn't yet been initialized to let the
wireless core do its work. During that rare case and time
frame (if its even possible) we don't allow user regulatory
changes as cfg80211 is working on enabling its first regulatory
domain.

To check for cfg80211_regdomain we now contend the entire operation
using the cfg80211_mutex.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:52:55 -05:00
Luis R. Rodriguez
a1794390f1 cfg80211: rename cfg80211_drv_mutex to cfg80211_mutex
cfg80211_drv_mutex is protecting more than the driver list,
this renames it and documents what its currently supposed to
protect.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:52:55 -05:00
Luis R. Rodriguez
85fd129a72 cfg80211: add wiphy_idx_valid to check for wiphy_idx sanity
This will later be used by others, for now make use of it in
cfg80211_drv_by_wiphy_idx() to return early if an invalid
wiphy_idx has been provided.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:52:54 -05:00
Luis R. Rodriguez
b5850a7a4f cfg80211: rename cfg80211_registered_device's idx to wiphy_idx
Makes it clearer to read when comparing to ifidx

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:52:54 -05:00
Alina Friedrichsen
79f6440c52 mac80211: Introduce a generic commit() to apply changes
This patch introduces a generic commit() function which initiate a
new network joining process. It should be called after some interface
config changes, so that the changes get applied more cleanly. Currently
set_ssid() and set_bssid() call it. Others can be added in future
patches.

In version 1 the header files was forgotten, sorry.

Signed-off-by: Alina Friedrichsen <x-alina@gmx.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:52:54 -05:00
Michael Buesch
80e775bf08 mac80211: Add software scan notifiers
This adds optional notifier functions for software scan.

Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:52:51 -05:00
Johannes Berg
4aa188e1a8 mac80211/cfg80211: move iwrange handler to cfg80211
The previous patch made cfg80211 generally aware of the signal
type a given hardware will give, so now it can implement
SIOCGIWRANGE itself, removing more wext stuff from mac80211.
Might need to be a little more parametrized once we have
more hardware using cfg80211 and new hardware capabilities.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:52:42 -05:00
Johannes Berg
77965c970d cfg80211: clean up signal type
It wasn't a good idea to make the signal type a per-BSS option,
although then it is closer to the actual value. Move it to be
a per-wiphy setting, update mac80211 to match.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:52:42 -05:00
Johannes Berg
630e64c487 nl80211: remove admin requirement from station get
There's no particular reason to not let untrusted users see
this information -- it's just the stations we're talking to,
packet counters for them and possibly some mesh things.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:52:41 -05:00
Johannes Berg
0a16ec5f5e mac80211: add missing kernel-doc
Document the new shutdown member.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:52:41 -05:00
Johannes Berg
a77b855245 cfg80211/mac80211: fill qual.qual value/adjust max_qual.qual
Due to various bugs in the software stack we end up having
to fill qual.qual; level should be used, but wpa_supplicant
doesn't properly ignore qual.qual, NM should use qual.level
regardless of that because qual.qual is 0 but doesn't handle
IW_QUAL_DBM right now.

So fill qual.qual with the qual.level value clamped to
-110..-40 dBm or just the regular 'unspecified' signal level.
This requires a mac80211 change to properly announce the
max_qual.qual and avg_qual.qual values.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:52:40 -05:00
Dan Williams
cb3a8eec0e cfg80211: age scan results on resume
Scanned BSS entries are timestamped with jiffies, which doesn't
increment across suspend and hibernate.  On resume, every BSS in the
scan list looks like it was scanned within the last 10 seconds,
irregardless of how long the machine was actually asleep.  Age scan
results on resume with the time spent during sleep so userspace has a
clue how old they really are.

Signed-off-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:52:40 -05:00
Jouni Malinen
98c8a60a04 nl80211: Provide access to STA TX/RX packet counters
The TX/RX packet counters are needed to fill in RADIUS Accounting
attributes Acct-Output-Packets and Acct-Input-Packets. We already
collect the needed information, but only the TX/RX bytes were
previously exposed through nl80211. Allow applications to fetch the
packet counters, too, to provide more complete support for accounting.

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:52:39 -05:00
Jouni Malinen
70692ad292 nl80211: Optional IEs into scan request
This extends the NL80211_CMD_TRIGGER_SCAN command to allow applications
to specify a set of information element(s) to be added into Probe
Request frames with NL80211_ATTR_IE. This provides support for the
MLME-SCAN.request primitive parameter VendorSpecificInfo and can be
used, e.g., to implement WPS scanning.

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:52:38 -05:00
Randy Dunlap
13e967b292 wireless: fix for CONFIG_NL80211=n
Add empty function for case of CONFIG_NL80211=n:

net/wireless/scan.c:35: error: implicit declaration of function 'nl80211_send_scan_aborted'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:52:35 -05:00
Sujith
81cb7623ad mac80211: Extend the rate control API with an update callback
The AP can switch dynamically between 20/40 Mhz channel width,
in which case we switch the local operating channel, but the
rate control algorithm is not notified. This patch adds a new callback
to indicate such changes to the RC algorithm.

Currently, HT channel width change is notified, but this callback
can be used to indicate any new requirements that might come up later on.

Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:51:45 -05:00
Johannes Berg
469002983f mac80211: split IBSS/managed code
This patch splits out the ibss code and data from managed (station) mode.
The reason to do this is to better separate the state machines, and have
the code be contained better so it gets easier to determine what exactly
a given change will affect, that in turn makes it easier to understand.

This is quite some churn, especially because I split sdata->u.sta into
sdata->u.mgd and sdata->u.ibss, but I think it's easier to maintain that
way. I've also shuffled around some code -- null function sending is only
applicable to managed interfaces so put that into that file, some other
functions are needed from various places so put them into util, and also
rearranged the prototypes in ieee80211_i.h accordingly.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:51:42 -05:00
Johannes Berg
96f5e66e8a mac80211: fix aggregation for hardware with ampdu queues
Hardware with AMPDU queues currently has broken aggregation.

This patch fixes it by making all A-MPDUs go over the regular AC queues,
but keeping track of the hardware queues in mac80211. As a first rough
version, it actually stops the AC queue for extended periods of time,
which can be removed by adding buffering internal to mac80211, but is
currently not a huge problem because people rarely use multiple TIDs
that are in the same AC (and iwlwifi currently doesn't operate as AP).

This is a short-term fix, my current medium-term plan, which I hope to
execute soon as well, but am not sure can finish before .30, looks like
this:
 1) rework the internal queuing layer in mac80211 that we use for
    fragments if the driver stopped queue in the middle of a fragmented
    frame to be able to queue more frames at once (rather than just a
    single frame with its fragments)
 2) instead of stopping the entire AC queue, queue up the frames in a
    per-station/per-TID queue during aggregation session initiation,
    when the session has come up take all those frames and put them
    onto the queue from 1)
 3) push the ampdu queue layer abstraction this patch introduces in
    mac80211 into the driver, and remove the virtual queue stuff from
    mac80211 again

This plan will probably also affect ath9k in that mac80211 queues the
frames instead of passing them down, even when there are no ampdu queues.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:51:42 -05:00
Johannes Berg
076ae609d2 mac80211: disallow moving netns
mac80211 currently assumes init_net for all interfaces,
so really will not cope well with network namespaces,
at least at this time.

To change this, we would have keep track of the netns
in addition to the ifindex, which is not something I
want to think about right now.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:51:39 -05:00
Vasanthakumar Thiagarajan
53d6f81c78 mac80211: Make sure non-HT connection when IEEE80211_STA_TKIP_WEP_USED is set
It is possible that some broken AP might send HT IEs in it's
assoc response even though the STA has not sent them in assoc req
when WEP/TKIP is used as pairwise cipher suite. Also it is important
to check this bit before enabling ht mode in beacon receive path.

Signed-off-by: Vasanthakumar Thiagarajan <vasanth@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-27 14:51:39 -05:00
Jarek Poplawski
1844f74794 pkt_sched: sch_drr: Fix oops in drr_change_class.
drr_change_class lacks a check for NULL of tca[TCA_OPTIONS], so oops
is possible.

Reported-by: Denys Fedoryschenko <denys@visp.net.lb>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-27 02:42:38 -08:00
Andy Grover
fe17f84f5f RDS: Kconfig and Makefile
Add RDS Kconfig and Makefile, and modify net/'s to add
us to the build.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:43:35 -08:00
Andy Grover
cbd151bfc7 RDS: Add RDS to AF key strings
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:43:19 -08:00
Andy Grover
55b7ed0b58 RDS: Common RDMA transport code
Although most of IB and iWARP are separated from each other,
there is some common code required to handle their shared
CM listen port. This code listens for CM events and then
dispatches the event to the appropriate transport, either
IB or iWARP.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:39:33 -08:00
Andy Grover
fcd8b7c0ec RDS: Add iWARP support
Support for iWARP NICs is implemented as a separate
RDS transport from IB. The code, however, is very
similar to IB (it was forked, basically.) so let's keep
it in one changeset.

The reason for this duplicationis that despite its similarity
to IB, there are a number of places where it has different
semantics. iwarp zcopy support is still under development,
and giving it its own sandbox ensures that IB code isn't
disrupted while iwarp changes. Over time these transports
will re-converge.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:39:33 -08:00
Andy Grover
e6babe4cc4 RDS/IB: Stats and sysctls
IB-specific stats and sysctls.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:39:32 -08:00
Andy Grover
1e23b3ee0e RDS/IB: Receive datagrams via IB
Header parsing, ring refill. It puts the incoming data into an
rds_incoming struct, which is passed up to rds-core.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:39:32 -08:00
Andy Grover
6a0979df32 RDS/IB: Implement IB-specific datagram send.
Specific to IB is a credits-based flow control mechanism, in
addition to the expected usage of the IB API to package outgoing
data into work requests.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:39:31 -08:00
Andy Grover
08b48a1ed8 RDS/IB: Implement RDMA ops using FMRs
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:39:31 -08:00
Andy Grover
f528efe276 RDS/IB: Ring-handling code.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:39:30 -08:00
Andy Grover
ec16227e14 RDS/IB: Infiniband transport
Registers as an RDS transport and an IB client, and uses IB CM
API to allocate ids, queue pairs, and the rest of that fun stuff.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:39:30 -08:00
Andy Grover
eff5f53bef RDS: RDMA support
Some transports may support RDMA features. This handles the
non-transport-specific parts, like pinning user pages and
tracking mapped regions.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:39:29 -08:00
Andy Grover
bdbe6fbc6a RDS: recv.c
Upon receiving a datagram from the transport, RDS parses the
headers and potentially queues an ACK.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:39:29 -08:00
Andy Grover
5c11559046 RDS: send.c
This is the code to send an RDS datagram.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:39:28 -08:00
Andy Grover
7875e18e09 RDS: Message parsing
Parsing of newly-received RDS message headers (including ext.
headers) and copy-to/from-user routines.

page.c implements a per-cpu page remainder cache, to reduce the
number of allocations needed for small datagrams.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:39:28 -08:00
Andy Grover
3e5048495c RDS: sysctls
RDS exposes a few tunable parameters via sysctls.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:39:26 -08:00
Andy Grover
13796bf9ed RDS: loopback
A simple rds transport to handle loopback connections.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:39:26 -08:00
Andy Grover
00e0f34c61 RDS: Connection handling
While arguably the fact that the underlying transport needs a
connection to convey RDS's datagrame reliably is not important
to rds proper, the transports implemented so far (IB and TCP)
have both been connection-oriented, and so the connection
state machine-related code is in the common rds code.

This patch also includes several work items, to handle connecting,
sending, receiving, and shutdown.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:39:25 -08:00
Andy Grover
a8c879a7ee RDS: Info and stats
RDS currently generates a lot of stats that are accessible via
the rds-info utility. This code implements the support for this.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:39:25 -08:00
Andy Grover
0fbc78cbf5 RDS: Transport code
RDS supports multiple transports. While this initial submission
only supports Infiniband transport, this abstraction allows others
to be added. We're working on an iWARP transport, and also see
UDP over DCB as another possibility.

This code handles transport registration.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:39:24 -08:00
Andy Grover
922cb17a5c RDS: Congestion-handling code
RDS handles per-socket congestion by updating peers with a complete
congestion map (8KB). This code keeps track of these maps for itself
and ones received from peers.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:39:24 -08:00
Andy Grover
39de828179 RDS: Main header file
RDS's main data structure definitions and exported functions.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:39:23 -08:00
Andy Grover
639b321b4d RDS: Socket interface
Implement the RDS (Reliable Datagram Sockets) interface.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:39:23 -08:00
Hannes Eder
9ee62630fd wanrouter: fix sparse warnings: context imbalance
Impact: Attribute functions with __acquires(...) resp. __releases(...).

Fix this sparse warnings:
  net/wanrouter/wanproc.c:82:13: warning: context imbalance in 'r_start' - wrong count at exit
  net/wanrouter/wanproc.c:103:13: warning: context imbalance in 'r_stop' - unexpected unlock
  net/wanrouter/wanmain.c:765:13: warning: context imbalance in 'lock_adapter_irq' - wrong count at exit
  net/wanrouter/wanmain.c:771:13: warning: context imbalance in 'unlock_adapter_irq' - unexpected unlock

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:13:36 -08:00
Hannes Eder
56bca31ff1 inet fragments: fix sparse warning: context imbalance
Impact: Attribute function with __releases(...)

Fix this sparse warning:
  net/ipv4/inet_fragment.c:276:35: warning: context imbalance in 'inet_frag_find' - unexpected unlock

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:13:35 -08:00
Hannes Eder
e57c624be8 decnet: fix sparse warnings: symbol shadows an earlier one
Impact: Remove redundant variable declarations, resp. rename
inner scope variable.

Fix this sparse warnings:
  net/decnet/af_decnet.c:1252:40: warning: symbol 'skb' shadows an earlier one
  net/decnet/af_decnet.c:1223:24: originally declared here
  net/decnet/af_decnet.c:1582:29: warning: symbol 'val' shadows an earlier one
  net/decnet/af_decnet.c:1527:22: originally declared here
  net/decnet/dn_dev.c:687:21: warning: symbol 'err' shadows an earlier one
  net/decnet/dn_dev.c:670:13: originally declared here
  net/decnet/sysctl_net_decnet.c:182:21: warning: symbol 'len' shadows an earlier one
  net/decnet/sysctl_net_decnet.c:173:16: originally declared here

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:13:35 -08:00
Hannes Eder
8521c27ee7 decnet: fix sparse warnings: context imbalance
Impact: Attribute functions with __acquires(...) resp. __releases(...).

Fix this sparse warnings:
  net/decnet/dn_dev.c:1324:13: warning: context imbalance in 'dn_dev_seq_start' - wrong count at exit
  net/decnet/dn_dev.c:1366:13: warning: context imbalance in 'dn_dev_seq_stop' - unexpected unlock

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:13:34 -08:00
Hannes Eder
63d819caeb sysctl: fix sparse warning: Should it be static?
Impact: Include header file.

Fix this sparse warning:
  net/core/sysctl_net_core.c:123:32: warning: symbol 'net_core_path' was not declared. Should it be static?

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:13:34 -08:00
Hannes Eder
2db096086e appletalk: fix warning: format not a string literal and no ...
Impact: Use 'static const char[]' instead of 'static char[]', and
since the data is const now it can be placed in __initconst.

Fix this warning:
  net/appletalk/ddp.c: In function 'atalk_init':
  net/appletalk/ddp.c:1894: warning: format not a string literal and no format arguments

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:13:33 -08:00
Hannes Eder
e3db6cb421 9p: fix sparse warning: cast adds address space
Impact: Trust in the comment and add '__force' to the cast.

Fix this sparse warning:
  net/9p/trans_fd.c:420:34: warning: cast adds address space to expression (<asn:1>)

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:13:32 -08:00
Hannes Eder
81c553299f net/802: fix sparse warnings: context imbalance
Impact: Attribute function with __acquires(...) resp. __releases(...).

Fix this sparse warnings:
  net/802/tr.c:492:21: warning: context imbalance in 'rif_seq_start' - wrong count at exit
  net/802/tr.c:519:13: warning: context imbalance in 'rif_seq_stop' - unexpected unlock

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:13:32 -08:00
Wei Yongjun
c3431ea71e llc: remove some pointless conditionals before kfree_skb()
Remove some pointless conditionals before kfree_skb().

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:07:38 -08:00
Wei Yongjun
47a30b26e5 iucv: remove some pointless conditionals before kfree_skb()
Remove some pointless conditionals before kfree_skb().

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:07:37 -08:00
Wei Yongjun
db849df63c decnet: remove some pointless conditionals before kfree_skb()
Remove some pointless conditionals before kfree_skb().

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:07:36 -08:00
Wei Yongjun
f3fbbe0f6f core: remove some pointless conditionals before kfree_skb()
Remove some pointless conditionals before kfree_skb().

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:07:36 -08:00
Wei Yongjun
acb5d75b9b packet: remove some pointless conditionals before kfree_skb()
Remove some pointless conditionals before kfree_skb().

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:07:35 -08:00
Wei Yongjun
ce030edfb4 can: remove some pointless conditionals before kfree_skb()
Remove some pointless conditionals before kfree_skb().

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:07:35 -08:00
Wei Yongjun
91744f6559 netlink: remove some pointless conditionals before kfree_skb()
Remove some pointless conditionals before kfree_skb().

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:07:34 -08:00
Wei Yongjun
40d44446cf unix: remove some pointless conditionals before kfree_skb()
Remove some pointless conditionals before kfree_skb().

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:07:34 -08:00
Wei Yongjun
86dc1ad2be pktgen: remove some pointless conditionals before kfree_skb()
Remove some pointless conditionals before kfree_skb().

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:07:33 -08:00
Wei Yongjun
6f96106867 af_key: remove some pointless conditionals before kfree_skb()
Remove some pointless conditionals before kfree_skb().

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 23:07:32 -08:00
Wei Yongjun
7585b97a48 Bluetooth: Remove some pointless conditionals before kfree_skb()
Remove some pointless conditionals before kfree_skb().

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:49 +01:00
Dave Young
2ae9a6be5f Bluetooth: Move hci_conn_del_sysfs() back to avoid device destruct too early
The following commit introduce a regression:

	commit 7d0db0a373
	Author: Marcel Holtmann <marcel@holtmann.org>
	Date:   Mon Jul 14 20:13:51 2008 +0200

		[Bluetooth] Use a more unique bus name for connections

I get panic as following (by netconsole):

[ 2709.344034] usb 5-1: new full speed USB device using uhci_hcd and address 4
[ 2709.505776] usb 5-1: configuration #1 chosen from 1 choice
[ 2709.569207] Bluetooth: Generic Bluetooth USB driver ver 0.4
[ 2709.570169] usbcore: registered new interface driver btusb
[ 2845.742781] BUG: unable to handle kernel paging request at 6b6b6c2f
[ 2845.742958] IP: [<c015515c>] __lock_acquire+0x6c/0xa80
[ 2845.743087] *pde = 00000000
[ 2845.743206] Oops: 0002 [#1] SMP
[ 2845.743377] last sysfs file: /sys/class/bluetooth/hci0/hci0:6/type
[ 2845.743742] Modules linked in: btusb netconsole snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss rfcomm l2cap bluetooth vfat fuse snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep snd_pcm pl2303 snd_timer psmouse usbserial snd 3c59x e100 serio_raw soundcore i2c_i801 intel_agp mii agpgart snd_page_alloc rtc_cmos rtc_core thermal processor rtc_lib button thermal_sys sg evdev
[ 2845.743742]
[ 2845.743742] Pid: 0, comm: swapper Not tainted (2.6.29-rc5-smp #54) Dell DM051
[ 2845.743742] EIP: 0060:[<c015515c>] EFLAGS: 00010002 CPU: 0
[ 2845.743742] EIP is at __lock_acquire+0x6c/0xa80
[ 2845.743742] EAX: 00000046 EBX: 00000046 ECX: 6b6b6b6b EDX: 00000002
[ 2845.743742] ESI: 6b6b6b6b EDI: 00000000 EBP: c064fd14 ESP: c064fcc8
[ 2845.743742]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[ 2845.743742] Process swapper (pid: 0, ti=c064e000 task=c05d1400 task.ti=c064e000)
[ 2845.743742] Stack:
[ 2845.743742]  c05d1400 00000002 c05d1400 00000001 00000002 00000000 f65388dc c05d1400
[ 2845.743742]  6b6b6b6b 00000292 c064fd0c c0153732 00000000 00000000 00000001 f700fa50
[ 2845.743742]  00000046 00000000 00000000 c064fd40 c0155be6 00000000 00000002 00000001
[ 2845.743742] Call Trace:
[ 2845.743742]  [<c0153732>] ? trace_hardirqs_on_caller+0x72/0x1c0
[ 2845.743742]  [<c0155be6>] ? lock_acquire+0x76/0xa0
[ 2845.743742]  [<c03e1aad>] ? skb_dequeue+0x1d/0x70
[ 2845.743742]  [<c046c885>] ? _spin_lock_irqsave+0x45/0x80
[ 2845.743742]  [<c03e1aad>] ? skb_dequeue+0x1d/0x70
[ 2845.743742]  [<c03e1aad>] ? skb_dequeue+0x1d/0x70
[ 2845.743742]  [<c03e1f94>] ? skb_queue_purge+0x14/0x20
[ 2845.743742]  [<f8171f5a>] ? hci_conn_del+0x10a/0x1c0 [bluetooth]
[ 2845.743742]  [<f81399c9>] ? l2cap_disconn_ind+0x59/0xb0 [l2cap]
[ 2845.743742]  [<f81795ce>] ? hci_conn_del_sysfs+0x8e/0xd0 [bluetooth]
[ 2845.743742]  [<f8175758>] ? hci_event_packet+0x5f8/0x31c0 [bluetooth]
[ 2845.743742]  [<c03dfe19>] ? sock_def_readable+0x59/0x80
[ 2845.743742]  [<c046c14d>] ? _read_unlock+0x1d/0x20
[ 2845.743742]  [<f8178aa9>] ? hci_send_to_sock+0xe9/0x1d0 [bluetooth]
[ 2845.743742]  [<c015388b>] ? trace_hardirqs_on+0xb/0x10
[ 2845.743742]  [<f816fa6a>] ? hci_rx_task+0x2ba/0x490 [bluetooth]
[ 2845.743742]  [<c0133661>] ? tasklet_action+0x31/0xc0
[ 2845.743742]  [<c013367c>] ? tasklet_action+0x4c/0xc0
[ 2845.743742]  [<c0132eb7>] ? __do_softirq+0xa7/0x170
[ 2845.743742]  [<c0116dec>] ? ack_apic_level+0x5c/0x1c0
[ 2845.743742]  [<c0132fd7>] ? do_softirq+0x57/0x60
[ 2845.743742]  [<c01333dc>] ? irq_exit+0x7c/0x90
[ 2845.743742]  [<c01055bb>] ? do_IRQ+0x4b/0x90
[ 2845.743742]  [<c01333d5>] ? irq_exit+0x75/0x90
[ 2845.743742]  [<c010392c>] ? common_interrupt+0x2c/0x34
[ 2845.743742]  [<c010a14f>] ? mwait_idle+0x4f/0x70
[ 2845.743742]  [<c0101c05>] ? cpu_idle+0x65/0xb0
[ 2845.743742]  [<c045731e>] ? rest_init+0x4e/0x60
[ 2845.743742] Code: 0f 84 69 02 00 00 83 ff 07 0f 87 1e 06 00 00 85 ff 0f 85 08 05 00 00 8b 4d cc 8b 49 04 85 c9 89 4d d4 0f 84 f7 04 00 00 8b 75 d4 <f0> ff 86 c4 00 00 00 89 f0 e8 56 a9 ff ff 85 c0 0f 85 6e 03 00
[ 2845.743742] EIP: [<c015515c>] __lock_acquire+0x6c/0xa80 SS:ESP 0068:c064fcc8
[ 2845.743742] ---[ end trace 4c985b38f022279f ]---
[ 2845.743742] Kernel panic - not syncing: Fatal exception in interrupt
[ 2845.743742] ------------[ cut here ]------------
[ 2845.743742] WARNING: at kernel/smp.c:329 smp_call_function_many+0x151/0x200()
[ 2845.743742] Hardware name: Dell DM051
[ 2845.743742] Modules linked in: btusb netconsole snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss rfcomm l2cap bluetooth vfat fuse snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep snd_pcm pl2303 snd_timer psmouse usbserial snd 3c59x e100 serio_raw soundcore i2c_i801 intel_agp mii agpgart snd_page_alloc rtc_cmos rtc_core thermal processor rtc_lib button thermal_sys sg evdev
[ 2845.743742] Pid: 0, comm: swapper Tainted: G      D    2.6.29-rc5-smp #54
[ 2845.743742] Call Trace:
[ 2845.743742]  [<c012e076>] warn_slowpath+0x86/0xa0
[ 2845.743742]  [<c015041b>] ? trace_hardirqs_off+0xb/0x10
[ 2845.743742]  [<c0146384>] ? up+0x14/0x40
[ 2845.743742]  [<c012e661>] ? release_console_sem+0x31/0x1e0
[ 2845.743742]  [<c046c8ab>] ? _spin_lock_irqsave+0x6b/0x80
[ 2845.743742]  [<c015041b>] ? trace_hardirqs_off+0xb/0x10
[ 2845.743742]  [<c046c900>] ? _read_lock_irqsave+0x40/0x80
[ 2845.743742]  [<c012e7f2>] ? release_console_sem+0x1c2/0x1e0
[ 2845.743742]  [<c0146384>] ? up+0x14/0x40
[ 2845.743742]  [<c015041b>] ? trace_hardirqs_off+0xb/0x10
[ 2845.743742]  [<c046a3d7>] ? __mutex_unlock_slowpath+0x97/0x160
[ 2845.743742]  [<c046a563>] ? mutex_trylock+0xb3/0x180
[ 2845.743742]  [<c046a4a8>] ? mutex_unlock+0x8/0x10
[ 2845.743742]  [<c015b991>] smp_call_function_many+0x151/0x200
[ 2845.743742]  [<c010a1a0>] ? stop_this_cpu+0x0/0x40
[ 2845.743742]  [<c015ba61>] smp_call_function+0x21/0x30
[ 2845.743742]  [<c01137ae>] native_smp_send_stop+0x1e/0x50
[ 2845.743742]  [<c012e0f5>] panic+0x55/0x110
[ 2845.743742]  [<c01065a8>] oops_end+0xb8/0xc0
[ 2845.743742]  [<c010668f>] die+0x4f/0x70
[ 2845.743742]  [<c011a8c9>] do_page_fault+0x269/0x610
[ 2845.743742]  [<c011a660>] ? do_page_fault+0x0/0x610
[ 2845.743742]  [<c046cbaf>] error_code+0x77/0x7c
[ 2845.743742]  [<c015515c>] ? __lock_acquire+0x6c/0xa80
[ 2845.743742]  [<c0153732>] ? trace_hardirqs_on_caller+0x72/0x1c0
[ 2845.743742]  [<c0155be6>] lock_acquire+0x76/0xa0
[ 2845.743742]  [<c03e1aad>] ? skb_dequeue+0x1d/0x70
[ 2845.743742]  [<c046c885>] _spin_lock_irqsave+0x45/0x80
[ 2845.743742]  [<c03e1aad>] ? skb_dequeue+0x1d/0x70
[ 2845.743742]  [<c03e1aad>] skb_dequeue+0x1d/0x70
[ 2845.743742]  [<c03e1f94>] skb_queue_purge+0x14/0x20
[ 2845.743742]  [<f8171f5a>] hci_conn_del+0x10a/0x1c0 [bluetooth]
[ 2845.743742]  [<f81399c9>] ? l2cap_disconn_ind+0x59/0xb0 [l2cap]
[ 2845.743742]  [<f81795ce>] ? hci_conn_del_sysfs+0x8e/0xd0 [bluetooth]
[ 2845.743742]  [<f8175758>] hci_event_packet+0x5f8/0x31c0 [bluetooth]
[ 2845.743742]  [<c03dfe19>] ? sock_def_readable+0x59/0x80
[ 2845.743742]  [<c046c14d>] ? _read_unlock+0x1d/0x20
[ 2845.743742]  [<f8178aa9>] ? hci_send_to_sock+0xe9/0x1d0 [bluetooth]
[ 2845.743742]  [<c015388b>] ? trace_hardirqs_on+0xb/0x10
[ 2845.743742]  [<f816fa6a>] hci_rx_task+0x2ba/0x490 [bluetooth]
[ 2845.743742]  [<c0133661>] ? tasklet_action+0x31/0xc0
[ 2845.743742]  [<c013367c>] tasklet_action+0x4c/0xc0
[ 2845.743742]  [<c0132eb7>] __do_softirq+0xa7/0x170
[ 2845.743742]  [<c0116dec>] ? ack_apic_level+0x5c/0x1c0
[ 2845.743742]  [<c0132fd7>] do_softirq+0x57/0x60
[ 2845.743742]  [<c01333dc>] irq_exit+0x7c/0x90
[ 2845.743742]  [<c01055bb>] do_IRQ+0x4b/0x90
[ 2845.743742]  [<c01333d5>] ? irq_exit+0x75/0x90
[ 2845.743742]  [<c010392c>] common_interrupt+0x2c/0x34
[ 2845.743742]  [<c010a14f>] ? mwait_idle+0x4f/0x70
[ 2845.743742]  [<c0101c05>] cpu_idle+0x65/0xb0
[ 2845.743742]  [<c045731e>] rest_init+0x4e/0x60
[ 2845.743742] ---[ end trace 4c985b38f02227a0 ]---
[ 2845.743742] ------------[ cut here ]------------
[ 2845.743742] WARNING: at kernel/smp.c:226 smp_call_function_single+0x8e/0x110()
[ 2845.743742] Hardware name: Dell DM051
[ 2845.743742] Modules linked in: btusb netconsole snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss rfcomm l2cap bluetooth vfat fuse snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep snd_pcm pl2303 snd_timer psmouse usbserial snd 3c59x e100 serio_raw soundcore i2c_i801 intel_agp mii agpgart snd_page_alloc rtc_cmos rtc_core thermal processor rtc_lib button thermal_sys sg evdev
[ 2845.743742] Pid: 0, comm: swapper Tainted: G      D W  2.6.29-rc5-smp #54
[ 2845.743742] Call Trace:
[ 2845.743742]  [<c012e076>] warn_slowpath+0x86/0xa0
[ 2845.743742]  [<c012e000>] ? warn_slowpath+0x10/0xa0
[ 2845.743742]  [<c015041b>] ? trace_hardirqs_off+0xb/0x10
[ 2845.743742]  [<c0146384>] ? up+0x14/0x40
[ 2845.743742]  [<c012e661>] ? release_console_sem+0x31/0x1e0
[ 2845.743742]  [<c046c8ab>] ? _spin_lock_irqsave+0x6b/0x80
[ 2845.743742]  [<c015041b>] ? trace_hardirqs_off+0xb/0x10
[ 2845.743742]  [<c046c900>] ? _read_lock_irqsave+0x40/0x80
[ 2845.743742]  [<c012e7f2>] ? release_console_sem+0x1c2/0x1e0
[ 2845.743742]  [<c0146384>] ? up+0x14/0x40
[ 2845.743742]  [<c015b7be>] smp_call_function_single+0x8e/0x110
[ 2845.743742]  [<c010a1a0>] ? stop_this_cpu+0x0/0x40
[ 2845.743742]  [<c026d23f>] ? cpumask_next_and+0x1f/0x40
[ 2845.743742]  [<c015b95a>] smp_call_function_many+0x11a/0x200
[ 2845.743742]  [<c010a1a0>] ? stop_this_cpu+0x0/0x40
[ 2845.743742]  [<c015ba61>] smp_call_function+0x21/0x30
[ 2845.743742]  [<c01137ae>] native_smp_send_stop+0x1e/0x50
[ 2845.743742]  [<c012e0f5>] panic+0x55/0x110
[ 2845.743742]  [<c01065a8>] oops_end+0xb8/0xc0
[ 2845.743742]  [<c010668f>] die+0x4f/0x70
[ 2845.743742]  [<c011a8c9>] do_page_fault+0x269/0x610
[ 2845.743742]  [<c011a660>] ? do_page_fault+0x0/0x610
[ 2845.743742]  [<c046cbaf>] error_code+0x77/0x7c
[ 2845.743742]  [<c015515c>] ? __lock_acquire+0x6c/0xa80
[ 2845.743742]  [<c0153732>] ? trace_hardirqs_on_caller+0x72/0x1c0
[ 2845.743742]  [<c0155be6>] lock_acquire+0x76/0xa0
[ 2845.743742]  [<c03e1aad>] ? skb_dequeue+0x1d/0x70
[ 2845.743742]  [<c046c885>] _spin_lock_irqsave+0x45/0x80
[ 2845.743742]  [<c03e1aad>] ? skb_dequeue+0x1d/0x70
[ 2845.743742]  [<c03e1aad>] skb_dequeue+0x1d/0x70
[ 2845.743742]  [<c03e1f94>] skb_queue_purge+0x14/0x20
[ 2845.743742]  [<f8171f5a>] hci_conn_del+0x10a/0x1c0 [bluetooth]
[ 2845.743742]  [<f81399c9>] ? l2cap_disconn_ind+0x59/0xb0 [l2cap]
[ 2845.743742]  [<f81795ce>] ? hci_conn_del_sysfs+0x8e/0xd0 [bluetooth]
[ 2845.743742]  [<f8175758>] hci_event_packet+0x5f8/0x31c0 [bluetooth]
[ 2845.743742]  [<c03dfe19>] ? sock_def_readable+0x59/0x80
[ 2845.743742]  [<c046c14d>] ? _read_unlock+0x1d/0x20
[ 2845.743742]  [<f8178aa9>] ? hci_send_to_sock+0xe9/0x1d0 [bluetooth]
[ 2845.743742]  [<c015388b>] ? trace_hardirqs_on+0xb/0x10
[ 2845.743742]  [<f816fa6a>] hci_rx_task+0x2ba/0x490 [bluetooth]
[ 2845.743742]  [<c0133661>] ? tasklet_action+0x31/0xc0
[ 2845.743742]  [<c013367c>] tasklet_action+0x4c/0xc0
[ 2845.743742]  [<c0132eb7>] __do_softirq+0xa7/0x170
[ 2845.743742]  [<c0116dec>] ? ack_apic_level+0x5c/0x1c0
[ 2845.743742]  [<c0132fd7>] do_softirq+0x57/0x60
[ 2845.743742]  [<c01333dc>] irq_exit+0x7c/0x90
[ 2845.743742]  [<c01055bb>] do_IRQ+0x4b/0x90
[ 2845.743742]  [<c01333d5>] ? irq_exit+0x75/0x90
[ 2845.743742]  [<c010392c>] common_interrupt+0x2c/0x34
[ 2845.743742]  [<c010a14f>] ? mwait_idle+0x4f/0x70
[ 2845.743742]  [<c0101c05>] cpu_idle+0x65/0xb0
[ 2845.743742]  [<c045731e>] rest_init+0x4e/0x60
[ 2845.743742] ---[ end trace 4c985b38f02227a1 ]---
[ 2845.743742] Rebooting in 3 seconds..

My logitec bluetooth mouse trying connect to pc, but
pc side reject the connection again and again. then panic happens.

The reason is due to hci_conn_del_sysfs now called in hci_event_packet,
the del work is done in a workqueue, so it's possible done before
skb_queue_purge called.

I move the hci_conn_del_sysfs after skb_queue_purge just as that before
marcel's commit.

Remove the hci_conn_del_sysfs in hci_conn_hash_flush as well due to
hci_conn_del will deal with the work.

Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:49 +01:00
Marcel Holtmann
2526d3d8b2 Bluetooth: Permit BT_SECURITY also for L2CAP raw sockets
Userspace pairing code can be simplified if it doesn't have to fall
back to using L2CAP_LM in the case of L2CAP raw sockets. This patch
allows the BT_SECURITY socket option to be used for these sockets.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:48 +01:00
Marcel Holtmann
37e62f5516 Bluetooth: Fix RFCOMM usage of in-kernel L2CAP sockets
The CID value of L2CAP sockets need to be set to zero. All userspace
applications do this via memset() on the sockaddr_l2 structure. The
RFCOMM implementation uses in-kernel L2CAP sockets and so it has to
make sure that l2_cid is set to zero.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:48 +01:00
Marcel Holtmann
2a517ca687 Bluetooth: Disallow usage of L2CAP CID setting for now
In the future the L2CAP layer will have full support for fixed channels
and right now it already can export the channel assignment, but for the
functions bind() and connect() the usage of only CID 0 is allowed. This
allows an easy detection if the kernel supports fixed channels or not,
because otherwise it would impossible for application to tell.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:47 +01:00
Marcel Holtmann
8bf4794174 Bluetooth: Change RFCOMM to use BT_CONNECT2 for BT_DEFER_SETUP
When BT_DEFER_SETUP is enabled on a RFCOMM socket, then switch its
current state from BT_OPEN to BT_CONNECT2. This gives the Bluetooth
core a unified way to handle L2CAP and RFCOMM sockets. The BT_CONNECT2
state is designated for incoming connections.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:47 +01:00
Marcel Holtmann
d5f2d2be68 Bluetooth: Fix poll() misbehavior when using BT_DEFER_SETUP
When BT_DEFER_SETUP has been enabled on a Bluetooth socket it keeps
signaling POLLIN all the time. This is a wrong behavior. The POLLIN
should only be signaled if the client socket is in BT_CONNECT2 state
and the parent has been BT_DEFER_SETUP enabled.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:46 +01:00
Marcel Holtmann
96a3183322 Bluetooth: Set authentication requirement before requesting it
The authentication requirement got only updated when the security level
increased. This is a wrong behavior. The authentication requirement is
read by the Bluetooth daemon to make proper decisions when handling the
IO capabilities exchange. So set the value that is currently expected by
the higher layers like L2CAP and RFCOMM.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:44 +01:00
Marcel Holtmann
00ae4af91d Bluetooth: Fix authentication requirements for L2CAP security check
The L2CAP layer can trigger the authentication via an ACL connection or
later on to increase the security level. When increasing the security
level it didn't use the same authentication requirements when triggering
a new ACL connection. Make sure that exactly the same authentication
requirements are used. The only exception here are the L2CAP raw sockets
which are only used for dedicated bonding.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:43 +01:00
Marcel Holtmann
2950f21acb Bluetooth: Ask upper layers for HCI disconnect reason
Some of the qualification tests demand that in case of failures in L2CAP
the HCI disconnect should indicate a reason why L2CAP fails. This is a
bluntly layer violation since multiple L2CAP connections could be using
the same ACL and thus forcing a disconnect reason is not a good idea.

To comply with the Bluetooth test specification, the disconnect reason
is now stored in the L2CAP connection structure and every time a new
L2CAP channel is added it will set back to its default. So only in the
case where the L2CAP channel with the disconnect reason is really the
last one, it will propagated to the HCI layer.

The HCI layer has been extended with a disconnect indication that allows
it to ask upper layers for a disconnect reason. The upper layer must not
support this callback and in that case it will nicely default to the
existing behavior. If an upper layer like L2CAP can provide a disconnect
reason that one will be used to disconnect the ACL or SCO link.

No modification to the ACL disconnect timeout have been made. So in case
of Linux to Linux connection the initiator will disconnect the ACL link
before the acceptor side can signal the specific disconnect reason. That
is perfectly fine since Linux doesn't make use of this value anyway. The
L2CAP layer has a perfect valid error code for rejecting connection due
to a security violation. It is unclear why the Bluetooth specification
insists on having specific HCI disconnect reason.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:43 +01:00
Marcel Holtmann
f29972de8e Bluetooth: Add CID field to L2CAP socket address structure
In preparation for L2CAP fixed channel support, the CID value of a
L2CAP connection needs to be accessible via the socket interface. The
CID is the connection identifier and exists as source and destination
value. So extend the L2CAP socket address structure with this field and
change getsockname() and getpeername() to fill it in.

The bind() and connect() functions have been modified to handle L2CAP
socket address structures of variable sizes. This makes them future
proof if additional fields need to be added.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:42 +01:00
Marcel Holtmann
e1027a7c69 Bluetooth: Request L2CAP fixed channel list if available
If the extended features mask indicates support for fixed channels,
request the list of available fixed channels. This also enables the
fixed channel features bit so remote implementations can request
information about it. Currently only the signal channel will be
listed.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:42 +01:00
Marcel Holtmann
435fef20ac Bluetooth: Don't enforce authentication for L2CAP PSM 1 and 3
The recommendation for the L2CAP PSM 1 (SDP) is to not use any kind
of authentication or encryption. So don't trigger authentication
for incoming and outgoing SDP connections.

For L2CAP PSM 3 (RFCOMM) there is no clear requirement, but with
Bluetooth 2.1 the initiator is required to enable authentication
and encryption first and this gets enforced. So there is no need
to trigger an additional authentication step. The RFCOMM service
security will make sure that a secure enough link key is present.

When the encryption gets enabled after the SDP connection setup,
then switch the security level from SDP to low security.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:41 +01:00
Marcel Holtmann
6a8d3010b3 Bluetooth: Fix double L2CAP connection request
If the remote L2CAP server uses authentication pending stage and
encryption is enabled it can happen that a L2CAP connection request is
sent twice due to a race condition in the connection state machine.

When the remote side indicates any kind of connection pending, then
track this state and skip sending of L2CAP commands for this period.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:41 +01:00
Marcel Holtmann
984947dc64 Bluetooth: Fix race condition with L2CAP information request
When two L2CAP connections are requested quickly after the ACL link has
been established there exists a window for a race condition where a
connection request is sent before the information response has been
received. Any connection request should only be sent after an exchange
of the extended features mask has been finished.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:41 +01:00
Marcel Holtmann
657e17b03c Bluetooth: Set authentication requirements if not available
When no authentication requirements are selected, but an outgoing or
incoming connection has requested any kind of security enforcement,
then set these authentication requirements.

This ensures that the userspace always gets informed about the
authentication requirements (if available). Only when no security
enforcement has happened, the kernel will signal invalid requirements.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:40 +01:00
Marcel Holtmann
0684e5f9fb Bluetooth: Use general bonding whenever possible
When receiving incoming connection to specific services, always use
general bonding. This ensures that the link key gets stored and can be
used for further authentications.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:40 +01:00
Marcel Holtmann
efc7688b55 Bluetooth: Add SCO fallback for eSCO connection attempts
When attempting to setup eSCO connections it can happen that some link
manager implementations fail to properly negotiate the eSCO parameters
and thus fail the eSCO setup. Normally the link manager is responsible
for the negotiation of the parameters and actually fallback to SCO if
no agreement can be reached. In cases where the link manager is just too
stupid, then at least try to establish a SCO link if eSCO fails.

For the Bluetooth devices with EDR support this includes handling packet
types of EDR basebands. This is particular tricky since for the EDR the
logic of enabling/disabling one specific packet type is turned around.
This fix contains an extra bitmask to disable eSCO EDR packet when
trying to fallback to a SCO connection.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:37 +01:00
Marcel Holtmann
255c76014a Bluetooth: Don't check encryption for L2CAP raw sockets
For L2CAP sockets with medium and high security requirement a missing
encryption will enforce the closing of the link. For the L2CAP raw
sockets this is not needed, so skip that check.

This fixes a crash when pairing Bluetooth 2.0 (and earlier) devices
since the L2CAP state machine got confused and then locked up the whole
system.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:36 +01:00
Jaikumar Ganesh
6e1031a400 Bluetooth: When encryption is dropped, do not send RFCOMM packets
During a role change with pre-Bluetooth 2.1 devices, the remote side drops
the encryption of the RFCOMM connection. We allow a grace period for the
encryption to be re-established, before dropping the connection. During
this grace period, the RFCOMM_SEC_PENDING flag is set. Check this flag
before sending RFCOMM packets.

Signed-off-by: Jaikumar Ganesh <jaikumar@google.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:35 +01:00
Dave Young
dd2efd03b4 Bluetooth: Remove CONFIG_DEBUG_LOCK_ALLOC ifdefs
Due to lockdep changes, the CONFIG_DEBUG_LOCK_ALLOC ifdef is not needed
now. So just remove it here.

The following commit fixed the !lockdep build warnings:

commit e8f6fbf62d
Author: Ingo Molnar <mingo@elte.hu>
Date:   Wed Nov 12 01:38:36 2008 +0000

    lockdep: include/linux/lockdep.h - fix warning in net/bluetooth/af_bluetooth.c

Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:34 +01:00
Marcel Holtmann
5f9018af00 Bluetooth: Update version numbers
With the support for the enhanced security model and the support for
deferring connection setup, it is a good idea to increase various
version numbers.

This is purely cosmetic and has no effect on the behavior, but can
be really helpful when debugging problems in different kernel versions.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:34 +01:00
Marcel Holtmann
0588d94fd7 Bluetooth: Restrict application of socket options
The new socket options should only be evaluated for SOL_BLUETOOTH level
and not for every other level. Previously this causes some minor issues
when detecting if a kernel with certain features is available.

Also restrict BT_SECURITY to SOCK_SEQPACKET for L2CAP and SOCK_STREAM for
the RFCOMM protocol.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:33 +01:00
Marcel Holtmann
f62e4323ab Bluetooth: Disconnect L2CAP connections without encryption
For L2CAP connections with high security setting, the link will be
immediately dropped when the encryption gets disabled. For L2CAP
connections with medium security there will be grace period where
the remote device has the chance to re-enable encryption. If it
doesn't happen then the link will also be disconnected.

The requirement for the grace period with medium security comes from
Bluetooth 2.0 and earlier devices that require role switching.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:33 +01:00
Marcel Holtmann
8c84b83076 Bluetooth: Pause RFCOMM TX when encryption drops
A role switch with devices following the Bluetooth pre-2.1 standards
or without Encryption Pause and Resume support is not possible if
encryption is enabled. Most newer headsets require the role switch,
but also require that the connection is encrypted.

For connections with a high security mode setting, the link will be
immediately dropped. When the connection uses medium security mode
setting, then a grace period is introduced where the TX is halted and
the remote device gets a change to re-enable encryption after the
role switch. If not re-enabled the link will be dropped.

Based on initial work by Ville Tervo <ville.tervo@nokia.com>

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:33 +01:00
Marcel Holtmann
9f2c8a03fb Bluetooth: Replace RFCOMM link mode with security level
Change the RFCOMM internals to use the new security levels and remove
the link mode details.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:26 +01:00
Marcel Holtmann
2af6b9d518 Bluetooth: Replace L2CAP link mode with security level
Change the L2CAP internals to use the new security levels and remove
the link mode details.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:26 +01:00
Marcel Holtmann
8c1b235594 Bluetooth: Add enhanced security model for Simple Pairing
The current security model is based around the flags AUTH, ENCRYPT and
SECURE. Starting with support for the Bluetooth 2.1 specification this is
no longer sufficient. The different security levels are now defined as
SDP, LOW, MEDIUM and SECURE.

Previously it was possible to set each security independently, but this
actually doesn't make a lot of sense. For Bluetooth the encryption depends
on a previous successful authentication. Also you can only update your
existing link key if you successfully created at least one before. And of
course the update of link keys without having proper encryption in place
is a security issue.

The new security levels from the Bluetooth 2.1 specification are now
used internally. All old settings are mapped to the new values and this
way it ensures that old applications still work. The only limitation
is that it is no longer possible to set authentication without also
enabling encryption. No application should have done this anyway since
this is actually a security issue. Without encryption the integrity of
the authentication can't be guaranteed.

As default for a new L2CAP or RFCOMM connection, the LOW security level
is used. The only exception here are the service discovery sessions on
PSM 1 where SDP level is used. To have similar security strength as with
a Bluetooth 2.0 and before combination key, the MEDIUM level should be
used. This is according to the Bluetooth specification. The MEDIUM level
will not require any kind of man-in-the-middle (MITM) protection. Only
the HIGH security level will require this.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:25 +01:00
Marcel Holtmann
c89b6e6bda Bluetooth: Fix SCO state handling for incoming connections
When the remote device supports only SCO connections, on receipt of
the HCI_EV_CONN_COMPLETE event packet, the connect state is changed to
BT_CONNECTED, but the socket state is not updated. Hence, the connect()
call times out even though the SCO connection has been successfully
established.

Based on a report by Jaikumar Ganesh <jaikumar@google.com>

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:25 +01:00
Marcel Holtmann
71aeeaa1fd Bluetooth: Reject incoming SCO connections without listeners
All SCO and eSCO connection are auto-accepted no matter if there is a
corresponding listening socket for them. This patch changes this and
connection requests for SCO and eSCO without any socket are rejected.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:24 +01:00
Marcel Holtmann
f66dc81f44 Bluetooth: Add support for deferring L2CAP connection setup
In order to decide if listening L2CAP sockets should be accept()ed
the BD_ADDR of the remote device needs to be known. This patch adds
a socket option which defines a timeout for deferring the actual
connection setup.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:24 +01:00
Marcel Holtmann
bb23c0ab82 Bluetooth: Add support for deferring RFCOMM connection setup
In order to decide if listening RFCOMM sockets should be accept()ed
the BD_ADDR of the remote device needs to be known. This patch adds
a socket option which defines a timeout for deferring the actual
connection setup.

The connection setup is done after reading from the socket for the
first time. Until then writing to the socket returns ENOTCONN.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:23 +01:00
Marcel Holtmann
c4f912e155 Bluetooth: Add global deferred socket parameter
The L2CAP and RFCOMM applications require support for authorization
and the ability of rejecting incoming connection requests. The socket
interface is not really able to support this.

This patch does the ground work for a socket option to defer connection
setup. Setting this option allows calling of accept() and then the
first read() will trigger the final connection setup. Calling close()
would reject the connection.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:23 +01:00
Marcel Holtmann
d58daf42d2 Bluetooth: Preparation for usage of SOL_BLUETOOTH
The socket option levels SOL_L2CAP, SOL_RFOMM and SOL_SCO are currently
in use by various Bluetooth applications. Going forward the common
option level SOL_BLUETOOTH should be used. This patch prepares the clean
split of the old and new option levels while keeping everything backward
compatibility.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:22 +01:00
Victor Shcherbatyuk
91aa35a5aa Bluetooth: Fix issue with return value of rfcomm_sock_sendmsg()
In case of connection failures the rfcomm_sock_sendmsg() should return
an error and not a 0 value.

Signed-off-by: Victor Shcherbatyuk <victor.shcherbatyuk@tomtom.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-02-27 06:14:21 +01:00
Pavel Emelyanov
3f53a38131 ipv6: don't use tw net when accounting for recycled tw
We already have a valid net in that place, but this is not just a
cleanup - the tw pointer can be NULL there sometimes, thus causing
an oops in NET_NS=y case.

The same place in ipv4 code already works correctly using existing 
net, rather than tw's one.

The bug exists since 2.6.27.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 03:35:13 -08:00
Ingo Molnar
0dcec8c27b alloc_percpu: add align argument to __alloc_percpu, fix
Impact: build fix

API was changed, but not all usage sites were converted:

 net/ipv4/route.c: In function ‘ip_rt_init’:
 net/ipv4/route.c:3379: error: too few arguments to function ‘__alloc_percpu’

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 14:09:41 +01:00
David S. Miller
f11c179eea Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/wireless/orinoco/orinoco.c
2009-02-25 00:02:05 -08:00
Wei Yongjun
bb80087a94 sit: used time_before for comparing jiffies
The functions time_before is more robust for comparing
jiffies against other values.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-24 23:37:19 -08:00
Wei Yongjun
26d94b46d0 ipip: used time_before for comparing jiffies
The functions time_before is more robust for comparing
jiffies against other values.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-24 23:36:47 -08:00
Wei Yongjun
da6185d874 gre: used time_before for comparing jiffies
The functions time_before is more robust for comparing
jiffies against other values.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-24 23:34:48 -08:00
Wei Yongjun
800d55f146 ipv6: Remove some pointless conditionals before kfree_skb()
Remove some pointless conditionals before kfree_skb().

The semantic match that finds the problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@@
expression E;
@@
- if (E)
- 	kfree_skb(E);
+ kfree_skb(E);
// </smpl>

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-24 23:33:52 -08:00
Pablo Neira Ayuso
1ce85fe402 netlink: change nlmsg_notify() return value logic
This patch changes the return value of nlmsg_notify() as follows:

If NETLINK_BROADCAST_ERROR is set by any of the listeners and
an error in the delivery happened, return the broadcast error;
else if there are no listeners apart from the socket that
requested a change with the echo flag, return the result of the
unicast notification. Thus, with this patch, the unicast
notification is handled in the same way of a broadcast listener
that has set the NETLINK_BROADCAST_ERROR socket flag.

This patch is useful in case that the caller of nlmsg_notify()
wants to know the result of the delivery of a netlink notification
(including the broadcast delivery) and take any action in case
that the delivery failed. For example, ctnetlink can drop packets
if the event delivery failed to provide reliable logging and
state-synchronization at the cost of dropping packets.

This patch also modifies the rtnetlink code to ignore the return
value of rtnl_notify() in all callers. The function rtnl_notify()
(before this patch) returned the error of the unicast notification
which makes rtnl_set_sk_err() reports errors to all listeners. This
is not of any help since the origin of the change (the socket that
requested the echoing) notices the ENOBUFS error if the notification
fails and should resync itself.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-24 23:18:28 -08:00
Joe Perches
a52b8bd338 tcp_scalable: Update malformed & dead url
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-24 16:40:16 -08:00
David S. Miller
8b6f92b1bd Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6 2009-02-24 13:49:05 -08:00
Ingo Molnar
0edcf8d692 Merge branch 'tj-percpu' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into core/percpu
Conflicts:
	arch/x86/include/asm/pgtable.h
2009-02-24 21:52:45 +01:00
Eric Dumazet
28337ff543 netfilter: xt_hashlimit fix
Commit 784544739a
(netfilter: iptables: lock free counters) broke xt_hashlimit netfilter module :

This module was storing a pointer inside its xt_hashlimit_info, and this pointer
is not relocated when we temporarly switch tables (iptables -L).

This hack is not not needed at all (probably a leftover from
ancient time), as each cpu should and can access to its own copy.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-24 15:30:29 +01:00
Josef Drexler
325fb5b4d2 netfilter: xt_recent: fix proc-file addition/removal of IPv4 addresses
Fix regression introduded by commit 079aa88 (netfilter: xt_recent: IPv6 support):

From http://bugzilla.kernel.org/show_bug.cgi?id=12753:

Problem Description:
An uninitialized buffer causes IPv4 addresses added manually (via the +IP
command to the proc interface) to never match any packets. Similarly, the -IP
command fails to remove IPv4 addresses.

Details:
In the function recent_entry_lookup, the xt_recent module does comparisons of
the entire nf_inet_addr union value, both for IPv4 and IPv6 addresses. For
addresses initialized from actual packets the remaining 12 bytes not occupied
by the IPv4 are zeroed so this works correctly. However when setting the
nf_inet_addr addr variable in the recent_mt_proc_write function, only the IPv4
bytes are initialized and the remaining 12 bytes contain garbage.

Hence addresses added in this way never match any packets, unless these
uninitialized 12 bytes happened to be zero by coincidence. Similarly, addresses
cannot consistently be removed using the proc interface due to mismatch of the
garbage bytes (although it will sometimes work to remove an address that was
added manually).

Reading the /proc/net/xt_recent/ entries hides this problem because this only
uses the first 4 bytes when displaying IPv4 addresses.

Steps to reproduce:
$ iptables -I INPUT -m recent --rcheck -j LOG
$ echo +169.254.156.239 > /proc/net/xt_recent/DEFAULT
$ cat /proc/net/xt_recent/DEFAULT
src=169.254.156.239 ttl: 0 last_seen: 119910 oldest_pkt: 1 119910

[At this point no packets from 169.254.156.239 are being logged.]

$ iptables -I INPUT -s 169.254.156.239 -m recent --set
$ cat /proc/net/xt_recent/DEFAULT
src=169.254.156.239 ttl: 0 last_seen: 119910 oldest_pkt: 1 119910
src=169.254.156.239 ttl: 255 last_seen: 126184 oldest_pkt: 4 125434, 125684, 125934, 126184

[At this point, adding the address via an iptables rule, packets are being
logged correctly.]

$ echo -169.254.156.239 > /proc/net/xt_recent/DEFAULT
$ cat /proc/net/xt_recent/DEFAULT
src=169.254.156.239 ttl: 0 last_seen: 119910 oldest_pkt: 1 119910
src=169.254.156.239 ttl: 255 last_seen: 126992 oldest_pkt: 10 125434, 125684, 125934, 126184, 126434, 126684, 126934, 126991, 126991, 126992
$ echo -169.254.156.239 > /proc/net/xt_recent/DEFAULT
$ cat /proc/net/xt_recent/DEFAULT
src=169.254.156.239 ttl: 0 last_seen: 119910 oldest_pkt: 1 119910
src=169.254.156.239 ttl: 255 last_seen: 126992 oldest_pkt: 10 125434, 125684, 125934, 126184, 126434, 126684, 126934, 126991, 126991, 126992

[Removing the address via /proc interface failed evidently.]

Possible solutions:
- initialize the addr variable in recent_mt_proc_write
- compare only 4 bytes for IPv4 addresses in recent_entry_lookup

Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-24 14:53:12 +01:00
Pablo Neira Ayuso
7d1e04598e netfilter: nf_conntrack: account packets drop by tcp_packet()
Since tcp_packet() may return -NF_DROP in two situations, the
packet-drop stats must be increased.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-24 14:48:01 +01:00
David S. Miller
e70049b9e7 Merge branch 'master' of /home/davem/src/GIT/linux-2.6/ 2009-02-24 03:50:29 -08:00
Jesper Dangaard Brouer
d18921a0e3 Doc: Refer to ip-sysctl.txt for strict vs. loose rp_filter mode
The IP_ADVANCED_ROUTER Kconfig describes the rp_filter
proc option.  Recent changes added a loose mode.
Instead of documenting this change too places, refer to
the document describing it:
 Documentation/networking/ip-sysctl.txt

I'm considering moving the rp_filter description away
from the Kconfig file into ip-sysctl.txt.

Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-24 03:47:42 -08:00
Linus Torvalds
f7e603ad8f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  net: amend the fix for SO_BSDCOMPAT gsopt infoleak
  netns: build fix for net_alloc_generic
2009-02-23 20:29:21 -08:00
Eugene Teo
50fee1dec5 net: amend the fix for SO_BSDCOMPAT gsopt infoleak
The fix for CVE-2009-0676 (upstream commit df0bca04) is incomplete. Note
that the same problem of leaking kernel memory will reappear if someone
on some architecture uses struct timeval with some internal padding (for
example tv_sec 64-bit and tv_usec 32-bit) --- then, you are going to
leak the padded bytes to userspace.

Signed-off-by: Eugene Teo <eugeneteo@kernel.sg>
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-23 15:38:41 -08:00
Clemens Noss
ebe47d47b7 netns: build fix for net_alloc_generic
net_alloc_generic was defined in #ifdef CONFIG_NET_NS, but used
unconditionally. Move net_alloc_generic out of #ifdef.

Signed-off-by: Clemens Noss <cnoss@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-23 15:37:35 -08:00
Linus Torvalds
d38e84ee39 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  netns: fix double free at netns creation
  veth : add the set_mac_address capability
  sunlance: Beyond ARRAY_SIZE of ib->btx_ring
  sungem: another error printed one too early
  ISDN: fix sc/shmem printk format warning
  SMSC: timeout reaches -1
  smsc9420: handle magic field of ethtool_eeprom
  sundance: missing parentheses?
  smsc9420: fix another postfixed timeout
  wimax/i2400m: driver loads firmware v1.4 instead of v1.3
  vlan: Update skb->mac_header in __vlan_put_tag().
  cxgb3: Add support for PCI ID 0x35.
  tcp: remove obsoleted comment about different passes
  TG3: &&/|| confusion
  ATM: misplaced parentheses?
  net/mv643xx: don't disable the mib timer too early and lock properly
  net/mv643xx: use GFP_ATOMIC while atomic
  atl1c: Atheros L1C Gigabit Ethernet driver
  net: Kill skb_truesize_check(), it only catches false-positives.
  net: forcedeth: Fix wake-on-lan regression
2009-02-23 14:36:05 -08:00
Eric W. Biederman
ce16c5337a netns: Remove net_alive
It turns out that net_alive is unnecessary, and the original problem
that led to it being added was simply that the icmp code thought
it was a network device and wound up being unable to handle packets
while there were still packets in the network namespace.

Now that icmp and tcp have been fixed to properly register themselves
this problem is no longer present and we have a stronger guarantee
that packets will not arrive in a network namespace then that provided
by net_alive in netif_receive_skb.  So remove net_alive allowing
packet reception run a little faster.

Additionally document the strong reason why network namespace cleanup
is safe so that if something happens again someone else will have
a chance of figuring it out.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-22 19:54:50 -08:00
Eric W. Biederman
6a1b3054d9 tcp: Like icmp use register_pernet_subsys
To remove the possibility of packets flying around when network
devices are being cleaned up use reisger_pernet_subsys instead of
register_pernet_device.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Acked-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-22 19:54:49 -08:00
Eric W. Biederman
959d272649 netns: Fix icmp shutdown.
Recently I had a kernel panic in icmp_send during a network namespace
cleanup.  There were packets in the arp queue that failed to be sent
and we attempted to generate an ICMP host unreachable message, but
failed because icmp_sk_exit had already been called.

The network devices are removed from a network namespace and their
arp queues are flushed before we do attempt to shutdown subsystems
so this error should have been impossible.

It turns out icmp_init is using register_pernet_device instead
of register_pernet_subsys.  Which resulted in icmp being shut down
while we still had the possibility of packets in flight, making
a nasty NULL pointer deference in interrupt context possible.

Changing this to register_pernet_subsys fixes the problem in
my testing.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Acked-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-22 19:54:48 -08:00
Jesper Dangaard Brouer
a6e8f27f3c ipv4: Clean whitespaces in net/ipv4/Kconfig.
While going through net/ipv4/Kconfig cleanup whitespaces.

Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-22 19:54:48 -08:00
Jesper Dangaard Brouer
b2cc46a8ee ipv4: Fix rp_filter description in net/ipv4/Kconfig.
The reverse path filter (rp_filter) will NOT get enabled
when enabling forwarding.  Read the code and tested in
in practice.

Most distributions do enable it in startup scripts.

Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-22 19:54:47 -08:00
Stephen Hemminger
0117cfabe3 snap: handle registration error and compile warning
If this module can't load, it is almost certainly because something else
is already bound to that SAP. So in that case, return the same error code
as other SAP usage, and fail the module load.

Also fixes a compiler warning about printk of non const.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-22 19:54:47 -08:00
Stephen Hemminger
01af4a0e3c llc: fix non-const printk warning
Mark some strings as const.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-22 19:54:46 -08:00
Stephen Hemminger
5747a1aacd ip: ipip compile warning
Get rid of compile warning about non-const format

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-22 19:54:45 -08:00
Stephen Hemminger
c1cf8422f0 ip: add loose reverse path filtering
Extend existing reverse path filter option to allow strict or loose
filtering. (See http://en.wikipedia.org/wiki/Reverse_path_filtering).

For compatibility with existing usage, the value 1 is chosen for strict mode
and 2 for loose mode.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-22 19:54:45 -08:00
Paul Moore
586c250037 cipso: Fix documentation comment
The CIPSO protocol engine incorrectly stated that the FIPS-188 specification
could be found in the kernel's Documentation directory.  This patch corrects
that by removing the comment and directing users to the FIPS-188 documented
hosted online.  For the sake of completeness I've also included a link to the
CIPSO draft specification on the NetLabel website.

Thanks to Randy Dunlap for spotting the error and letting me know.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
2009-02-23 10:05:54 +11:00
Daniel Lezcano
486a87f1e5 netns: fix double free at netns creation
This patch fix a double free when a network namespace fails.
The previous code does a kfree of the net_generic structure when
one of the init subsystem initialization fails.
The 'setup_net' function does kfree(ng) and returns an error.
The caller, 'copy_net_ns', call net_free on error, and this one
calls kfree(net->gen), making this pointer freed twice.

This patch make the code symetric, the net_alloc does the net_generic
allocation and the net_free frees the net_generic.

Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-22 00:07:53 -08:00
Herbert Xu
7691367d71 tcp: Always set urgent pointer if it's beyond snd_nxt
Our TCP stack does not set the urgent flag if the urgent pointer
does not fit in 16 bits, i.e., if it is more than 64K from the
sequence number of a packet.

This behaviour is different from the BSDs, and clearly contradicts
the purpose of urgent mode, which is to send the notification
(though not necessarily the associated data) as soon as possible.
Our current behaviour may in fact delay the urgent notification
indefinitely if the receiver window does not open up.

Simply matching BSD however may break legacy applications which
incorrectly rely on the out-of-band delivery of urgent data, and
conversely the in-band delivery of non-urgent data.

Alexey Kuznetsov suggested a safe solution of following BSD only
if the urgent pointer itself has not yet been transmitted.  This
way we guarantee that when the remote end sees the packet with
non-urgent data marked as urgent due to wrap-around we would have
advanced the urgent pointer beyond, either to the actual urgent
data or to an as-yet untransmitted packet.

The only potential downside is that applications on the remote
end may see multiple SIGURG notifications.  However, this would
occur anyway with other TCP stacks.  More importantly, the outcome
of such a duplicate notification is likely to be harmless since
the signal itself does not carry any information other than the
fact that we're in urgent mode.

Thanks to Ilpo Järvinen for fixing a critical bug in this and
Jeff Chua for reporting that bug.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-21 23:52:29 -08:00
Hannes Eder
66da8c529a ipv6: fix sparse warning: Using plain integer as NULL pointer
Fix this sparse warning:
  net/ipv6/xfrm6_state.c:72:26: warning: Using plain integer as NULL pointer

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-21 23:37:10 -08:00
Patrick Ohly
cd4d8fdad1 net: kernel panic in dev_hard_start_xmit: remove faulty software TX time stamping
The current implementation of the TX software time stamping fallback is
faulty because it accesses the skb after ndo_start_xmit() returns
successfully. This patch removes the fallback, which fixes kernel panics
seen during stress tests. Hardware time stamping is not affected by this
removal.

Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-21 02:42:18 -08:00
Eric Dumazet
08361aa807 netfilter: ip_tables: unfold two critical loops in ip_packet_match()
While doing oprofile tests I noticed two loops are not properly unrolled by gcc

Using a hand coded unrolled loop provides nice speedup : ipt_do_table
credited of 2.52 % of cpu instead of 3.29 % in tbench.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-20 11:03:33 +01:00
Adam Nielsen
268cb38e18 netfilter: x_tables: add LED trigger target
Kernel module providing implementation of LED netfilter target.  Each
instance of the target appears as a led-trigger device, which can be
associated with one or more LEDs in /sys/class/leds/

Signed-off-by: Adam Nielsen <a.nielsen@shikadi.net>
Acked-by: Richard Purdie <rpurdie@linux.intel.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-20 10:55:14 +01:00
Hagen Paul Pfeifer
af07d241dc netfilter: fix hardcoded size assumptions
get_random_bytes() is sometimes called with a hard coded size assumption
of an integer. This could not be true for next centuries. This patch
replace it with a compile time statement.

Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-20 10:48:06 +01:00
Hagen Paul Pfeifer
e478075c6f netfilter: nf_conntrack: table max size should hold at least table size
Table size is defined as unsigned, wheres the table maximum size is
defined as a signed integer. The calculation of max is 8 or 4,
multiplied the table size. Therefore the max value is aligned to
unsigned.

Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-20 10:47:09 +01:00
Stephen Hemminger
784544739a netfilter: iptables: lock free counters
The reader/writer lock in ip_tables is acquired in the critical path of
processing packets and is one of the reasons just loading iptables can cause
a 20% performance loss. The rwlock serves two functions:

1) it prevents changes to table state (xt_replace) while table is in use.
   This is now handled by doing rcu on the xt_table. When table is
   replaced, the new table(s) are put in and the old one table(s) are freed
   after RCU period.

2) it provides synchronization when accesing the counter values.
   This is now handled by swapping in new table_info entries for each cpu
   then summing the old values, and putting the result back onto one
   cpu.  On a busy system it may cause sampling to occur at different
   times on each cpu, but no packet/byte counts are lost in the process.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

Sucessfully tested on my dual quad core machine too, but iptables only (no ipv6 here)
BTW, my new "tbench 8" result is 2450 MB/s, (it was 2150 MB/s not so long ago)

Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-20 10:35:32 +01:00
Pablo Neira Ayuso
be0c22a46c netlink: add NETLINK_BROADCAST_ERROR socket option
This patch adds NETLINK_BROADCAST_ERROR which is a netlink
socket option that the listener can set to make netlink_broadcast()
return errors in the delivery to the caller. This option is useful
if the caller of netlink_broadcast() do something with the result
of the message delivery, like in ctnetlink where it drops a network
packet if the event delivery failed, this is used to enable reliable
logging and state-synchronization. If this socket option is not set,
netlink_broadcast() only reports ESRCH errors and silently ignore
ENOBUFS errors, which is what most netlink_broadcast() callers
should do.

This socket option is based on a suggestion from Patrick McHardy.
Patrick McHardy can exchange this patch for a beer from me ;).

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-20 01:01:08 -08:00
Santwona Behera
59089d8d16 ethtool: Add RX pkt classification interface
Signed-off-by: Santwona Behera <santwona.behera@sun.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-20 00:58:13 -08:00
Rusty Russell
313e458f81 alloc_percpu: add align argument to __alloc_percpu.
This prepares for a real __alloc_percpu, by adding an alignment argument.
Only one place uses __alloc_percpu directly, and that's for a string.

tj: af_inet also uses __alloc_percpu(), update it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Jens Axboe <axboe@kernel.dk>
2009-02-20 16:29:08 +09:00
Eric Dumazet
323dbf9638 netfilter: ip6_tables: unfold two loops in ip6_packet_match()
ip6_tables netfilter module can use an ifname_compare() helper
so that two loops are unfolded.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-19 11:18:23 +01:00
Eric Dumazet
eacc17fb64 netfilter: xt_physdev: unfold two loops in physdev_mt()
xt_physdev netfilter module can use an ifname_compare() helper
so that two loops are unfolded.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-19 11:17:17 +01:00
Jan Engelhardt
4323362e49 netfilter: xtables: add backward-compat options
Concern has been expressed about the changing Kconfig options.
Provide the old options that forward-select.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-19 11:16:03 +01:00
Krishna Kumar
e88721f87d net: Optimize skb_tx_hash() by eliminating a comparison
Optimize skb_tx_hash() by eliminating a comparison that executes for
every packet. skb_tx_hashrnd initialization is moved to a later part of
the startup sequence, namely after the "random" driver is initialized.

Rebooted the system three times and verified that the code generates
different random numbers each time.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-18 17:55:02 -08:00
Ilpo Järvinen
5209921cf1 tcp: remove obsoleted comment about different passes
This is obsolete since the passes got combined.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-18 17:45:44 -08:00