The rx->count reference is used to track reference counts to the
number of rx-queue kobjects created for the device. This patch
eliminates initialization of the counter in netif_alloc_rx_queues
and instead increments the counter each time a kobject is created.
This is now symmetric with the decrement that is done when an object is
released.
Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are a bunch of issues that need to be fixed, including:
- GFP_KERNEL allocations from atomic context
(and GFP_ATOMIC in process context),
- abuse of the setsockopt() call convention,
- unprotected/unlocked static variables...
IMHO, we will need to alter the userspace ABI when we fix it. So mark
the configuration option as EXPERIMENTAL for the time being (or should
it be BROKEN instead?).
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current code works like this:
int garbage, status;
socklen_t len = sizeof(status);
/* enable pipe */
setsockopt(fd, SOL_PNPIPE, PNPIPE_ENABLE, &garbage, sizeof(garbage));
/* disable pipe */
setsockopt(fd, SOL_PNPIPE, PNPIPE_DISABLE, &garbage, sizeof(garbage));
/* get status */
getsockopt(fd, SOL_PNPIPE, PNPIPE_INQ, &status, &len);
...which does not follow the usual socket option pattern. This patch
merges all three "options" into a single gettable&settable option,
before Linux 2.6.37 gets out:
int status;
socklen_t len = sizeof(status);
/* enable pipe */
status = 1;
setsockopt(fd, SOL_PNPIPE, PNPIPE_ENABLE, &status, sizeof(status));
/* disable pipe */
status = 0;
setsockopt(fd, SOL_PNPIPE, PNPIPE_ENABLE, &status, sizeof(status));
/* get status */
getsockopt(fd, SOL_PNPIPE, PNPIPE_ENABLE, &status, &len);
This also fixes the error code from EFAULT to ENOTCONN.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Cc: Kumar Sanghvi <kumar.sanghvi@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As it currently is, the new code path is not compatible with existing
Nokia modems. This would break existing userspace for Nokia modem, such
as the existing oFono ISI driver.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit "fib: RCU conversion of fib_lookup()" removed rcu_read_lock() from
__mkroute_output but left a couple of calls to rcu_read_unlock() in there.
This causes lockdep to complain that the rcu_read_unlock() call in
__ip_route_output_key causes a lock inbalance and quickly crashes the
kernel. The below fixes this for me.
Signed-off-by: Dimitris Michailidis <dm@chelsio.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Calling ETHTOOL_GRXCLSRLALL with a large rule_cnt will allocate kernel
heap without clearing it. For the one driver (niu) that implements it,
it will leave the unused portion of heap unchanged and copy the full
contents back to userspace.
Signed-off-by: Kees Cook <kees.cook@canonical.com>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Synchronise the comment with the preceding implementation change.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's no need for the WDS peer address
to not be const, so make it const.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Bill Jordan's patch to allow setting the WDS
peer crossed with my patch removing all the
boilerplate code in nl80211, and consequently
he didn't make use of it yet. Fix that.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The recent scan overhaul broke locking
because now we can jump to code that
attempts to unlock, while we don't have
the mutex held. Fix this by holding the
mutex around all the relevant code.
Reported-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This reverts commit 5ed3bc7288.
It turns-out that not all drivers are calling ieee80211_tx_status from a
compatible context. Revert this for now and try again later...
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Do not set num_rx_queues in netif_set_real_num_rx_queues() some
drivers will increase the real_num_rx_queues later due to a feature
changes or available interrupts increasing. By setting num_rx_queues
here this ends up creating a cap on the number of rx queues
available.
For example the ixgbe driver sets the max number of queues it intends
to use ever then sets the current number in use with the
netif_set_num_{rx|tx}_queues calls. With the current implementation
the number of rx queues gets limited so when a feature such as DCB
or FCoE is enabled the queues are no longer available.
kobjects will only be allocated for real_num_rx_queues so the waste
in memory is minimal.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove dead code and make some functions static.
Compile tested only.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This looks like a simple typo that has gone unnoticed for some time. The
impact is relatively low but it's clearly wrong.
Signed-off-by: John Heffner <johnwheffner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the second step for neighbour RCU conversion.
(first was commit d6bf7817 : RCU conversion of neigh hash table)
neigh_lookup() becomes lockless, but still take a reference on found
neighbour. (no more read_lock()/read_unlock() on tbl->lock)
struct neighbour gets an additional rcu_head field and is freed after an
RCU grace period.
Future work would need to eventually not take a reference on neighbour
for temporary dst (DST_NOCACHE), but this would need dst->_neighbour to
use a noref bit like we did for skb->_dst.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/mac80211/scan.c: In function ‘ieee80211_scan_cancel’:
net/mac80211/scan.c:794: warning: ‘finish’ may be used uninitialized in this function
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When multiple interfaces are actively trying
to associate with the same BSS, they may both
find that the BSS isn't there and then try to
unlink it. This can cause errors since the
unlinking code can't currently deal with items
that have already been unlinked.
Normally this doesn't happen as most people
don't try to use multiple station interfaces
that associate at the same time too.
Fix this by using the list entry as a flag to
see if the item is still on a list.
Cc: stable@kernel.org
Reported-by: Ben Greear <greearb@candelatech.com>
Tested-by: Hun-Kyi Wynn <hkwynn@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This information is already available in mac80211, we just need to export it
via cfg80211 and nl80211.
Signed-off-by: Bruno Randolf <br1@einfach.org>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
We nulify local->scan_req on failure in __ieee80211_start_scan, so
__ieee80211_scan_completed will not call cfg80211_scan_done. Fix that.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When performing hw scan and not abort it, __ieee80211_scan_completed()
is currently called from scan work, so does not need to reschedule work
to call drv_hw_scan().
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This is partial revert and fix for commit
85f72bc839 "mac80211: only cancel
software-based scans on suspend"
When cfg80211 request the scan and mac80211 perform some management work,
we defer the scan request. We do not canceling such requests when calling
ieee80211_scan_cancel(), because of SCAN_SW_SCANNING bit check just
before the call. So fix that problem.
Another problem, which commit 85f72bc839
tries to solve, is we can not cancel HW scan. Hence patch make
ieee80211_scan_cancel() ignore HW scan (see code comments). Keeping
local->mtx lock assures that the deferred scan will not become
"working" HW scan.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
We are taking local->mtx inside __ieee80211_scan_completed(), but just
before call to that function we drop the lock. Dropping/taking lock is not
good, because can lead to hard to understand race conditions.
Patch split scan_completed() code into two functions, first must be called
with local->mtx taken and second without it.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Use goto instruction to call __ieee80211_scan_completed only ones in
ieee80211_scan_work. This is prepare for the next patch.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Yikes! The error return keeps a netdev reference
and the rdev mutex locked, fix that!
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This adds API to allow adding per-station GTKs,
updates mac80211 to support it, and also allows
drivers to remove a key from hwaccel again when
this may be necessary due to multiple GTKs.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When roaming while we have active BA session,
we can end up transmitting delBA frames to
the old AP while we're already on the new AP's
channel, which can cause warnings.
Simply avoid sending those frames, but still
tear down the internal session state, since
they are not really necessary anyway as we
will implicitly disassociate when sending the
association to the new AP.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
We never delete the addBA response timer, which
is typically fine, but if the station it belongs
to is deleted very quickly after starting the BA
session, before the peer had a chance to reply,
the timer may fire after the station struct has
been freed already. Therefore, we need to delete
the timer in a suitable spot -- best when the
session is being stopped (which will happen even
then) in which case the delete will be a no-op
most of the time.
I've reproduced the scenario and tested the fix.
This fixes the crash reported at
http://mid.gmane.org/4CAB6F96.6090701@candelatech.com
Cc: stable@kernel.org
Reported-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
caif_connect() might dereference a netdevice after dev_put() it.
It also doesnt check dev_get_by_index() return value and could
dereference a NULL pointer.
Fix it, using RCU to avoid taking a reference.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The IGMP specs states that if the system receives a
membership report, it shouldn't send another for the
next minute. However, if a link failure happens right
after that, the backup slave and the switch connected
to this slave will not know about the multicast and
the traffic will hang for about a minute.
This patch fixes it to rejoin multicast groups immediately
after a failover restoring the multicast traffic.
Signed-off-by: Flavio Leitner <fleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Userspace applications can already request to receive timestamps with:
setsockopt(sockfd, SOL_SOCKET, SO_TIMESTAMP, ...)
Although setsockopt() returns zero (success), timestamps are not added to the
ancillary data. This patch fixes that on SOCK_DGRAM and SOCK_SEQPACKET Unix
sockets.
Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David
This is the first step for RCU conversion of neigh code.
Next patches will convert hash_buckets[] and "struct neighbour" to RCU
protected objects.
Thanks
[PATCH net-next] net neigh: RCU conversion of neigh hash table
Instead of storing hash_buckets, hash_mask and hash_rnd in "struct
neigh_table", a new structure is defined :
struct neigh_hash_table {
struct neighbour **hash_buckets;
unsigned int hash_mask;
__u32 hash_rnd;
struct rcu_head rcu;
};
And "struct neigh_table" has an RCU protected pointer to such a
neigh_hash_table.
This means the signature of (*hash)() function changed: We need to add a
third parameter with the actual hash_rnd value, since this is not
anymore a neigh_table field.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
neigh_delete() and neigh_add() dont need to touch device refcount,
we hold RTNL when calling them, so device cannot disappear under us.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In various situations, a device provides a packet to our stack and we
drop it before it enters protocol stack :
- softnet backlog full (accounted in /proc/net/softnet_stat)
- bad vlan tag (not accounted)
- unknown/unregistered protocol (not accounted)
We can handle a per-device counter of such dropped frames at core level,
and automatically adds it to the device provided stats (rx_dropped), so
that standard tools can be used (ifconfig, ip link, cat /proc/net/dev)
This is a generalization of commit 8990f468a (net: rx_dropped
accounting), thus reverting it.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Be consistent and use the wk->chan instead of the
local->hw.conf.channel for the association done work.
This prevents any possible races against channel changes
while we run this work.
In the case that the race did happen we would be initializing
the bit rates for the new AP under the assumption of a wrong
channel and in the worst case, wrong band. This could lead
to trying to assuming we could use CCK frames on 5 GHz, for
example.
This patch has a fix for kernels >= v2.6.34
Cc: stable@kernel.org
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The locking around ieee80211_recalc_smps is
buggy -- it cannot acquire another interface's
mutex while the iflist mutex is held because
another code path could be holding the iface
mutex and trying to acquire the iflist mutex.
But the locking is also unnecessary, we only
check "ifmgd->associated" as a bool, and don't
use the pointer (in check_mgd_smps).
Reported-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Consolidate boilerplate code needed for .dumpit
calls operating on netdevs.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Use a new flag that requires the netdev to be
UP and use it to check instead of coding the
check into all functions that require it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This makes nl80211 use the new genetlink
pre_doit/post_doit hooks for locking and
checking the interface/wiphy index.
This significantly reduces the code size
and the likelihood of locking errors.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Each family may have some amount of boilerplate
locking code that applies to most, or even all,
commands.
This allows a family to handle such things in
a more generic way, by allowing it to
a) include private flags in each operation
b) specify a pre_doit hook that is called,
before an operation's doit() callback and
may return an error directly,
c) specify a post_doit hook that can undo
locking or similar things done by pre_doit,
and finally
d) include two private pointers in each info
struct passed between all these operations
including doit(). (It's two because I'll
need two in nl80211 -- can be extended.)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Enable WME QoS in IBSS mode by adding a WME information element to beacons and
probe respones and by checking for it and marking stations as WME capable if it
is present.
Signed-off-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Some drivers cannot handle multiple retry rates specified by the rc
algorithm but instead use their own retry table (for example rt2800).
However, if such a device registers itself with a max_rates value of 1
the rc algorithm cannot make use of the extended information the device
can provide about retried rates. On the other hand, if a device
registers itself with a max_rates value > 1 the rc algorithm assumes
that the device can handle multi rate retries.
Fix this issue by introducing another hw parameter max_report_rates that
can be set to a different value then max_rates to indicate if a device
is capable of reporting more rates then specified in max_rates.
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Initialize the rate table for WDS interfaces, and
add cases to allow WDS packets to pass the xmit and receive
tests.
Signed-off-by: Bill Jordan <bjordan@rajant.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Added a nl interface to set the peer bssid of a WDS interface.
Signed-off-by: Bill Jordan <bjordan@rajant.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
On association to an AP, after receiving beacons, the beacon_crc value is set.
The beacon_crc value is not reset in disassociation, but the BSS data may be
expired at a later point. When associating again, it's possible that a
beacon for the AP is not received, resulting in the beacon_ies to remain NULL.
After association, further beacons will not update the beacon data, as the
crc value of the beacon has not changed, and the beacon_crc still holds a
value matching the beacon. The beacon_ies will remain forever null.
One of the results of this is that WLAN power save cannot be entered, the STA
will remain foreven in active mode.
Fix this by adding a validation flag for the beacon_crc, which is cleared on
association.
Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Move ieee80211_scan_cancel() and all other related code to
ieee80211_restart_work() as ieee80211_restart_hw() is intended to be
callable from any context.
Fix a bug that RTNL lock is not taken during ieee80211_cancel_scan().
Take local->mtx before WARN(test_bit(SCAN_HW_SCANNING, &local->scanning)
to prevent the race condition with __ieee80211_start_scan() described
here: http://marc.info/?l=linux-wireless&m=128516716810537&w=2
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add missing unlocking of the wiphy in set_channel,
and don't try to unlock a non-existing wiphy in
set_cqm.
Cc: stable@kernel.org [2.6.35+]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Enable management frame transmission and subscribing
to management frames through nl80211 in both cfg80211
and mac80211. Also update a few places that I forgot
to update for P2P-client mode previously, and fix a
small bug with non-action frames in this API.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Some user space applications only want to display survey data for
the operating channel, however there is no API to get that yet.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch fixes an refcounting bug. Previously it
was possible to corrupt the per-device recv. filter
and monitor management counters when:
iw dev wlanX set monitor [new flags]
was issued on an active monitor interface.
Acked-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Choose first available phyX name when creating phy devices. This
means that reloading a wifi driver will not cause a change in the
name of it's phy device.
Also, allow users to rename a phy to any un-used name, including
phy%d.
Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Another exported symbol only used in one file
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fib_rules_cleanup_ups is only defined and used in one place.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Code style cleanups before upcoming functional changes.
C99 initializer for fib_props array.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove duplicated include.
Signed-off-by: Nicolas Kaiser <nikai@nikai.net>
Acked-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb_headroom() is unsigned so "skb_headroom(skb) + toff" is also
unsigned and can't be less than zero. This test was added in 66d50d25:
"u32: negative offset fix" It was supposed to fix a regression.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ingress being not used very much, and net_device->ingress_queue being
quite a big object (128 or 256 bytes), use a dynamic allocation if
needed (tc qdisc add dev eth0 ingress ...)
dev_ingress_queue(dev) helper should be used only with RTNL taken.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
L2CAP doesn't permit change like MTU, FCS, TxWindow values while the
connection is alive, we can only set that before the
connection/configuration process. That can lead to bugs in the L2CAP
operation.
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
Since we register nf hooks, matches and targets in order, we'd better
unregister them in the reverse order.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
ipt_LOG & ip6t_LOG use lot of calls to printk() and use a lock in a hope
several cpus wont mix their output in syslog.
printk() being very expensive [1], its better to call it once, on a
prebuilt and complete line. Also, with mixed IPv4 and IPv6 trafic,
separate IPv4/IPv6 locks dont avoid garbage.
I used an allocation of a 1024 bytes structure, sort of seq_printf() but
with a fixed size limit.
Use a static buffer if dynamic allocation failed.
Emit a once time alert if buffer size happens to be too short.
[1]: printk() has various features like printk_delay()...
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
The functions nf_nat_proto_find_get and nf_nat_proto_put are
only used internally in nf_nat_core. This might break some out
of tree NAT module.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
vlan: dont drop packets from unknown vlans in promiscuous mode
Phonet: Correct header retrieval after pskb_may_pull
um: Proper Fix for f25c80a4: remove duplicate structure field initialization
ip_gre: Fix dependencies wrt. ipv6.
net-2.6: SYN retransmits: Add new parameter to retransmits_timed_out()
iwl3945: queue the right work if the scan needs to be aborted
mac80211: fix use-after-free
Add the SIP callid as a key for persistence.
This allows multiple connections from the same IP address to be
differentiated on the basis of the callid.
When used in conjunction with the persistence mask, it allows connections
from different IP addresses to be aggregated on the basis of the callid.
It is envisaged that a persistence mask of 0.0.0.0 will be a useful
setting. That is, ignore the source IP address when checking for
persistence.
It is envisaged that this option will be used in conjunction with
one-packet scheduling.
This only works with UDP and cannot be made to work with TCP
within the current framework.
Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: Julian Anastasov <ja@ssi.bg>
Fall back to normal persistence handling if the persistence
engine fails to recognise a packet.
This way, at least the packet will go somewhere.
It is envisaged that iptables could be used to block packets
such if this is not desired although nf_conntrack_sip would
likely need to be enhanced first.
Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: Julian Anastasov <ja@ssi.bg>
Allow the persistence engine of a virtual service to be set, edited
and unset.
This feature only works with the netlink user-space interface.
Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: Julian Anastasov <ja@ssi.bg>
This shouldn't break compatibility with userspace as the new data
is at the end of the line.
I have confirmed that this doesn't break ipvsadm, the main (only?)
user-space user of this data.
Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: Julian Anastasov <ja@ssi.bg>
In general NULL arguments aren't passed by the few callers that exist,
so don't test for them.
The exception is to make passing NULL to ip_vs_unbind_scheduler() a noop.
Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: Julian Anastasov <ja@ssi.bg>
Compact ip_vs_sched_persist() by setting up parameters
and calling functions once.
Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: Julian Anastasov <ja@ssi.bg>
While doing stress tests with IP route cache disabled, and multi queue
devices, I noticed a very high contention on one rwlock used in
neighbour code.
When many cpus are trying to send frames (possibly using a high
performance multiqueue device) to the same neighbour, they fight for the
neigh->lock rwlock in order to call neigh_hh_init(), and fight on
hh->hh_refcnt (a pair of atomic_inc/atomic_dec_and_test())
But we dont need to call neigh_hh_init() for dst that are used only
once. It costs four atomic operations at least, on two contended cache
lines, plus the high contention on neigh->lock rwlock.
Introduce a new dst flag, DST_NOCACHE, that is set when dst was not
inserted in route cache.
With the stress test bench, sending 160000000 frames on one neighbour,
results are :
Before patch:
real 2m28.406s
user 0m11.781s
sys 36m17.964s
After patch:
real 1m26.532s
user 0m12.185s
sys 20m3.903s
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sctp_asoc_get_hmac() function iterates through a peer's hmac_ids
array and attempts to ensure that only a supported hmac entry is
returned. The current code fails to do this properly - if the last id
in the array is out of range (greater than SCTP_AUTH_HMAC_ID_MAX), the
id integer remains set after exiting the loop, and the address of an
out-of-bounds entry will be returned and subsequently used in the parent
function, causing potentially ugly memory corruption. This patch resets
the id integer to 0 on encountering an invalid id so that NULL will be
returned after finishing the loop if no valid ids are found.
Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Two user-controlled allocations in SCTP are subsequently dereferenced as
sockaddr structs, without checking if the dereferenced struct members fall
beyond the end of the allocated chunk. There doesn't appear to be any
information leakage here based on how these members are used and
additional checking, but it's still worth fixing.
[akpm@linux-foundation.org: remove unfashionable newlines, fix gmail tab->space conversion]
Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
A recent patch to allow IGMPv2 responses to IGMPv3 queries
bypasses length checks for valid query lengths, incorrectly
resets the v2_seen timer, and does not support IGMPv1.
The following patch responds with a v2 report as required
by IGMPv2 while correcting the other problems introduced
by the patch.
Signed-Off-By: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use RCU & RTNL protection for mfc_cache_array[]
ipmr_cache_find() is called under rcu_read_lock();
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use RCU and RTNL to protect (struct mr_table)->mroute_sk
Readers use RCU, writers use RTNL.
ip_ra_control() already use an RCU grace period before
ip_ra_destroy_rcu(), so we dont need synchronize_rcu() in
mrtsock_destruct()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No need to get a reference on reg_dev and release it, we are in a
rcu_read_lock() protected section.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit e81963b180.
LRO is now deprecated in favour of GRO, and only a few drivers use it,
so it is desirable to build it as a module in distribution kernels.
The original change to prevent building it as a module was made in an
attempt to avoid the case where some dependents are set to y and some
to m, and INET_LRO can be set to m rather than y. However, the
Kconfig system will reliably set INET_LRO=y in this case.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the condition (3rd arg) passed to sk_wait_event() in
sk_stream_wait_memory(). The incorrect check in sk_stream_wait_memory()
causes the following soft lockup in tcp_sendmsg() when the global tcp
memory pool has exhausted.
>>> snip <<<
localhost kernel: BUG: soft lockup - CPU#3 stuck for 11s! [sshd:6429]
localhost kernel: CPU 3:
localhost kernel: RIP: 0010:[sk_stream_wait_memory+0xcd/0x200] [sk_stream_wait_memory+0xcd/0x200] sk_stream_wait_memory+0xcd/0x200
localhost kernel:
localhost kernel: Call Trace:
localhost kernel: [sk_stream_wait_memory+0x1b1/0x200] sk_stream_wait_memory+0x1b1/0x200
localhost kernel: [<ffffffff802557c0>] autoremove_wake_function+0x0/0x40
localhost kernel: [ipv6:tcp_sendmsg+0x6e6/0xe90] tcp_sendmsg+0x6e6/0xce0
localhost kernel: [sock_aio_write+0x126/0x140] sock_aio_write+0x126/0x140
localhost kernel: [xfs:do_sync_write+0xf1/0x130] do_sync_write+0xf1/0x130
localhost kernel: [<ffffffff802557c0>] autoremove_wake_function+0x0/0x40
localhost kernel: [hrtimer_start+0xe3/0x170] hrtimer_start+0xe3/0x170
localhost kernel: [vfs_write+0x185/0x190] vfs_write+0x185/0x190
localhost kernel: [sys_write+0x50/0x90] sys_write+0x50/0x90
localhost kernel: [system_call+0x7e/0x83] system_call+0x7e/0x83
>>> snip <<<
What is happening is, that the sk_wait_event() condition passed from
sk_stream_wait_memory() evaluates to true for the case of tcp global memory
exhaustion. This is because both sk_stream_memory_free() and vm_wait are true
which causes sk_wait_event() to *not* call schedule_timeout().
Hence sk_stream_wait_memory() returns immediately to the caller w/o sleeping.
This causes the caller to again try allocation, which again fails and again
calls sk_stream_wait_memory(), and so on.
[ Bug introduced by commit c1cbe4b7ad
("[NET]: Avoid atomic xchg() for non-error case") -DaveM ]
Signed-off-by: Nagendra Singh Tomar <tomer_iisc@yahoo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NFSv4.1 needs warning when a client tcp connection goes down, if that
connection is being used as a backchannel, so that it can warn the
client that it has lost the backchannel connection.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Unfortunately, spkm3 never got very far; while interoperability with one
other implementation was demonstrated at some point, problems were found
with the spec that were deemed not worth fixing.
The kernel code is useless on its own without nfs-utils patches which
were never merged into nfs-utils, and were only ever available from
citi.umich.edu. They appear not to have been updated since 2005.
Therefore it seems safe to assume that this code has no users, and never
will.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
If we set up to wait for a cache item to be filled in, and then find
that it is no longer pending, it could be that some other thread is
in 'cache_revisit_request' and has moved our request to its 'pending' list.
So when our setup_deferral calls cache_revisit_request it will find nothing to
put on the pending list, and do nothing.
We then return from cache_wait_req, thus leaving the 'sleeper'
on-stack structure open to being corrupted by subsequent stack usage.
However that 'sleeper' could still be on the 'pending' list that the
other thread is looking at and so any corruption could cause it to behave badly.
To avoid this race we simply take the same path as if the
'wait_for_completion_interruptible_timeout' was interrupted and if the
sleeper is no longer on the list (which it won't be) we wait on the
completion - which will ensure that any other cache_revisit_request
will have let go of the sleeper.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The context is already known in all the sock_create callers.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The net is known from the xprt_create and this tagging will also
give un the context in the conntection workers where real sockets
are created.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
After this the socket creation in it knows the context.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
ip_dev_find(net, addr) finds a device given an IPv4 source address and
takes a reference on it.
Introduce __ip_dev_find(), taking a third argument, to optionally take
the device reference. Callers not asking the reference to be taken
should be in an rcu_read_lock() protected section.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roger Luethi noticed packets for unknown VLANs getting silently dropped
even in promiscuous mode.
Check for promiscuous mode in __vlan_hwaccel_rx() and vlan_gro_common()
before drops.
As suggested by Patrick, mark such packets to have skb->pkt_type set to
PACKET_OTHERHOST to make sure they are dropped by IP stack.
Reported-by: Roger Luethi <rl@hellgate.ch>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
While doing stress tests with a disabled IP route cache, I found
__mkroute_output() was touching three times in_device atomic refcount.
Use RCU to touch it once to reduce cache line ping pongs.
Before patch
time to perform the test
real 1m42.009s
user 0m12.545s
sys 25m0.726s
Profile :
16109.00 26.4% ip_route_output_slow vmlinux
7434.00 12.2% dst_destroy vmlinux
3280.00 5.4% fib_rules_lookup vmlinux
3252.00 5.3% fib_semantic_match vmlinux
2622.00 4.3% fib_table_lookup vmlinux
2535.00 4.1% dst_alloc vmlinux
1750.00 2.9% _raw_read_lock vmlinux
1532.00 2.5% rt_set_nexthop vmlinux
After patch
real 1m36.503s
user 0m12.977s
sys 23m25.608s
14234.00 22.4% ip_route_output_slow vmlinux
8717.00 13.7% dst_destroy vmlinux
4052.00 6.4% fib_rules_lookup vmlinux
3951.00 6.2% fib_semantic_match vmlinux
3191.00 5.0% dst_alloc vmlinux
1764.00 2.8% fib_table_lookup vmlinux
1692.00 2.7% _raw_read_lock vmlinux
1605.00 2.5% rt_set_nexthop vmlinux
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch restores the below flow control patch submitted by Rémi
Denis-Courmont, which accidentaly got lost due to Pipe controller patch
on Phonet.
commit 1a98214fee
Author: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Date: Mon Aug 30 12:57:03 2010 +0000
Phonet: restore flow control credits when sending fails
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kumar Sanghvi <kumar.sanghvi@stericsson.com>
Acked-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 8cb8e6f168.
That commit introduced a regression with the Bluetooth Profile Tuning
Suite(PTS), Reverting this make sure that L2CAP is in a qualificable
state.
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
As we don't have any error control on the Streaming mode, i.e., we don't
need to keep a copy of the skb for later resending we don't need to
call skb_clone() on it.
Then we can go one further here, and dequeue the skb before sending it,
that also means we don't need to look to sk->sk_send_head anymore.
The patch saves memory and time when sending Streaming mode data, so
it is good to mainline.
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
When receiving L2CAP negative configuration response with respect
to MTU parameter we modify wrong field. MTU here means proposed
value of MTU that the remote device intends to transmit. So for local
L2CAP socket it is pi->imtu.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com>
Acked-by: Ville Tervo <ville.tervo@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
This fixes a bug which caused the FCS setting to show L2CAP_FCS_CRC16
with L2CAP modes other than ERTM or streaming. At present, this only
affects the FCS value shown with getsockopt() for basic mode.
Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
HARD_TX_LOCK no longer protects tunnels from dead loops,
but xmit_recursion percpu counter.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Retrieve the header after doing pskb_may_pull since, pskb_may_pull
could change the buffer structure.
This is based on the comment given by Eric Dumazet on Phonet
Pipe controller patch for a similar problem.
Signed-off-by: Kumar Sanghvi <kumar.sanghvi@stericsson.com>
Acked-by: Linus Walleij <linus.walleij@stericsson.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is some confusion with rx_queue name after RPS, and net drivers
private rx_queue fields.
I suggest to rename "struct net_device"->rx_queue to ingress_queue.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Maintain per_cpu tx_bytes, tx_packets, rx_bytes, rx_packets.
Other seldom used fields are kept in netdev->stats structure, possibly
unsafe.
This is a preliminary work to support lockless transmit path, and
correct RX stats, that are already unsafe.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPIP tunnels can benefit from lockless xmits, using NETIF_F_LLTX
Bench on a 16 cpus machine (dual E5540 cpus), 16 threads sending
10000000 UDP frames via one ipip tunnel (size:200 bytes per frame)
Before patch :
real 2m53.321s
user 0m10.277s
sys 46m0.597s
After patch:
real 0m32.063s
user 0m9.237s
sys 8m16.255s
Last problem to solve is the contention on dst :
16118.00 28.3% __ip_route_output_key vmlinux
6135.00 10.8% dst_release vmlinux
3220.00 5.6% ip_finish_output vmlinux
2149.00 3.8% ip_route_output_flow vmlinux
1575.00 2.8% ip_append_data vmlinux
1481.00 2.6% ip_push_pending_frames vmlinux
1349.00 2.4% __xfrm_lookup vmlinux
1216.00 2.1% csum_partial_copy_generic vmlinux
1208.00 2.1% udp_sendmsg vmlinux
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
GRE tunnels can benefit from lockless xmits, using NETIF_F_LLTX
Note: If tunnels are created with the "oseq" option, LLTX is not
enabled :
Even using an atomic_t o_seq, we would increase chance for packets being
out of order at receiver.
Bench on a 16 cpus machine (dual E5540 cpus), 16 threads sending
10000000 UDP frames via one gre tunnel (size:200 bytes per frame)
Before patch :
real 3m0.094s
user 0m9.365s
sys 47m50.103s
After patch:
real 0m29.756s
user 0m11.097s
sys 7m33.012s
Last problem to solve is the contention on dst :
38660.00 21.4% __ip_route_output_key vmlinux
20786.00 11.5% dst_release vmlinux
14191.00 7.8% __xfrm_lookup vmlinux
12410.00 6.9% ip_finish_output vmlinux
4540.00 2.5% ip_push_pending_frames vmlinux
4427.00 2.4% ip_append_data vmlinux
4265.00 2.4% __alloc_skb vmlinux
4140.00 2.3% __ip_local_out vmlinux
3991.00 2.2% dev_queue_xmit vmlinux
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 15fc1f7056 (sit: percpu stats accounting) forgot the fallback
tunnel case (sit0), and can crash pretty fast.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 3c97af99a5 (ipip: percpu stats accounting) forgot the fallback
tunnel case (tunl0), and can crash pretty fast.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As tunnel devices are going to be lockless, we need to make sure a
misconfigured machine wont enter an infinite loop.
Add a percpu variable, and limit to three the number of stacked xmits.
Reported-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On Wed, 29 Sep 2010 14:02:38 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> After merging the final tree, today's linux-next build (powerpc
> ppc44x_defconfig) produced tis warning:
>
> WARNING: net/sunrpc/sunrpc.o(.init.text+0x110): Section mismatch in reference from the function init_sunrpc() to the function .exit.text:rpcauth_remove_module()
> The function __init init_sunrpc() references
> a function __exit rpcauth_remove_module().
> This is often seen when error handling in the init function
> uses functionality in the exit path.
> The fix is often to remove the __exit annotation of
> rpcauth_remove_module() so it may be used outside an exit section.
>
> Probably caused by commit 2f72c9b737
> ("sunrpc: The per-net skeleton").
This actually causes a build failure on a sparc32 defconfig build:
`rpcauth_remove_module' referenced in section `.init.text' of net/built-in.o: defined in discarded section `.exit.text' of net/built-in.o
I applied the following patch for today:
Fixes:
`rpcauth_remove_module' referenced in section `.init.text' of net/built-in.o: defined in discarded section `.exit.text' of net/built-in.o
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
AnyIP is the capability to receive packets and establish incoming
connections on IPs we have not explicitly configured on the machine.
An example use case is to configure a machine to accept all incoming
traffic on eth0, and leave the policy of whether traffic for a given IP
should be delivered to the machine up to the load balancer.
Can be setup as follows:
ip -6 rule from all iif eth0 lookup 200
ip -6 route add local default dev lo table 200
(in this case for all IPv6 addresses)
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows a host to be configured to respond to any address in
a specified range as if it were local, without actually needing to
configure the address on an interface. This is done through routing
table configuration. For instance, to configure a host to respond
to any address in 10.1/16 received on eth0 as a local address we can do:
ip rule add from all iif eth0 lookup 200
ip route add local 10.1/16 dev lo proto kernel scope host src 127.0.0.1 table 200
This host is now reachable by any 10.1/16 address (route lookup on
input for packets received on eth0 can find the route). On output, the
rule will not be matched so that this host can still send packets to
10.1/16 (not sent on loopback). Presumably, external routing can be
configured to make sense out of this.
To make this work, we needed to modify the logic in finding the
interface which is assigned a given source address for output
(dev_ip_find). We perform a normal fib_lookup instead of just a
lookup on the local table, and in the lookup we ignore the input
interface for matching.
This patch is useful to implement IP-anycast for subnets of virtual
addresses.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The GRE tunnel driver needs to invoke icmpv6 helpers in the
ipv6 stack when ipv6 support is enabled.
Therefore if IPV6 is enabled, we have to enforce that GRE's
enabling (modular or static) matches that of ipv6.
Reported-by: Patrick McHardy <kaber@trash.net>
Reported-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes kernel Bugzilla Bug 18952
This patch adds a syn_set parameter to the retransmits_timed_out()
routine and updates its callers. If not set, TCP_RTO_MIN is taken
as the calculation basis as before. If set, TCP_TIMEOUT_INIT is
used instead, so that sysctl_syn_retries represents the actual
amount of SYN retransmissions in case no SYNACKs are received when
establishing a new connection.
Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The WMM parameter configuration function (ieee80211_sta_wmm_params) only
configures the WMM parameters to the driver is the wmm_last_param_set
counter value is changed by the AP.
The wmm_last_param_set is initialized to -1 on association in order to ensure
the configuration is made to the driver at least once on association, but
currently this initialization is done *after* the WMM parameter configuration
function was called.
This leads to unreliability in the driver getting properly configured on first
association (depending on what counter value the AP happens to use.) When
disassociating (the wmm default parameters are configured to the driver) and
then reassociating, due to the above the WMM configuration is not set to the
driver at all.
On drivers without beacon filtering the problem is corrected by later beacons,
but on drivers with beacon filtering the WMM will remain permanently incorrectly
configured.
Fix this by moving the initialization of wmm_last_param_set to -1 before
ieee80211_sta_wmm_params is called on association.
Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
IBSS started from wireless extensions is currently
missing basic rate configuration, fix this by moving
the code to generate the default to the common code
that gets invoked for both nl80211 and wext.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Association is dealt with as an atomic offchannel operation,
we do this because we don't know we are associated until we
get the associatin response from the AP. When we do get the
associatin response though we were never clearing the offchannel
state. This has a few implications, we told drivers we were
still offchannel, and the first configured TX power for the
channel does not take into account any power constraints.
For ath9k this meant ANI calibration would not start upon
association, and we'd have to wait until the first bgscan
to be triggered. There may be other issues this resolves
but I'm too lazy to comb the code to check.
Cc: stable@kernel.org
Cc: Amod Bodas <amod.bodas@atheros.com>
Cc: Vasanth Thiagarajan <vasanth.thiagarajan@atheros.com>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds the basic infrastructure to support user-space
expectation helpers via ctnetlink and the netfilter queuing
infrastructure NFQUEUE. Basically, this patch:
* adds NF_CT_EXPECT_USERSPACE flag to identify user-space
created expectations. I have also added a sanity check in
__nf_ct_expect_check() to avoid that kernel-space helpers
may create an expectation if the master conntrack has no
helper assigned.
* adds some branches to check if the master conntrack helper
exists, otherwise we skip the code that refers to kernel-space
helper such as the local expectation list and the expectation
policy.
* allows to set the timeout for user-space expectations with
no helper assigned.
* a list of expectations created from user-space that depends
on ctnetlink (if this module is removed, they are deleted).
* includes USERSPACE in the /proc output for expectations
that have been created by a user-space helper.
This patch also modifies ctnetlink to skip including the helper
name in the Netlink messages if no kernel-space helper is set
(since no user-space expectation has not kernel-space kernel
assigned).
You can access an example user-space FTP conntrack helper at:
http://people.netfilter.org/pablo/userspace-conntrack-helpers/nf-ftp-helper-userspace-POC.tar.bz
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (47 commits)
tcp: Fix >4GB writes on 64-bit.
net/9p: Mount only matching virtio channels
de2104x: fix ethtool
tproxy: check for transparent flag in ip_route_newports
ipv6: add IPv6 to neighbour table overflow warning
tcp: fix TSO FACK loss marking in tcp_mark_head_lost
3c59x: fix regression from patch "Add ethtool WOL support"
ipv6: add a missing unregister_pernet_subsys call
s390: use free_netdev(netdev) instead of kfree()
sgiseeq: use free_netdev(netdev) instead of kfree()
rionet: use free_netdev(netdev) instead of kfree()
ibm_newemac: use free_netdev(netdev) instead of kfree()
smsc911x: Add MODULE_ALIAS()
net: reset skb queue mapping when rx'ing over tunnel
br2684: fix scheduling while atomic
de2104x: fix TP link detection
de2104x: fix power management
de2104x: disable autonegotiation on broken hardware
net: fix a lockdep splat
e1000e: 82579 do not gate auto config of PHY by hardware during nominal use
...
This covers RX if necessary, as well as TX.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For RPS, we create a kobject for each RX queue based on the number of
queues passed to alloc_netdev_mq(). However, drivers generally do not
determine the numbers of hardware queues to use until much later, so
this usually represents the maximum number the driver may use and not
the actual number in use.
For TX queues, drivers can update the actual number using
netif_set_real_num_tx_queues(). Add a corresponding function for RX
queues, netif_set_real_num_rx_queues().
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_attach_filter() and sk_detach_filter() are run with socket locked.
Use the appropriate rcu_dereference_protected() instead of blocking BH,
and rcu_dereference_bh().
There is no point adding BH prevention and memory barrier.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It seems we dont use appropriate refcount increment in an
rcu_read_lock() protected section.
fib_rule_get() might increment a null refcount and bad things could
happen.
While fib_nl_delrule() respects an rcu grace period before calling
fib_rule_put(), fib_rules_cleanup_ops() calls fib_rule_put() without a
grace period.
Note : after this patch, we might avoid the synchronize_rcu() call done
in fib_nl_delrule()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Maintain per_cpu tx_bytes, tx_packets, rx_bytes, rx_packets.
Other seldom used fields are kept in netdev->stats structure, possibly
unsafe.
This is a preliminary work to support lockless transmit path, and
correct RX stats, that are already unsafe.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Maintain per_cpu tx_bytes, tx_packets, rx_bytes, rx_packets.
Other seldom used fields are kept in netdev->stats structure, possibly
unsafe.
This is a preliminary work to support lockless transmit path, and
correct RX stats, that are already unsafe.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Le lundi 27 septembre 2010 à 14:29 +0100, Ben Hutchings a écrit :
> > diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
> > index 5d6ddcb..de39b22 100644
> > --- a/net/ipv4/ip_gre.c
> > +++ b/net/ipv4/ip_gre.c
> [...]
> > @@ -377,7 +405,7 @@ static struct ip_tunnel *ipgre_tunnel_locate(struct net *net,
> > if (parms->name[0])
> > strlcpy(name, parms->name, IFNAMSIZ);
> > else
> > - sprintf(name, "gre%%d");
> > + strcpy(name, "gre%d");
> >
> > dev = alloc_netdev(sizeof(*t), name, ipgre_tunnel_setup);
> > if (!dev)
> [...]
>
> This is a valid fix, but doesn't belong in this patch!
>
Sorry ? It was not a fix, but at most a cleanup ;)
Anyway I forgot the gretap case...
[PATCH 2/4 v2] ip_gre: percpu stats accounting
Maintain per_cpu tx_bytes, tx_packets, rx_bytes, rx_packets.
Other seldom used fields are kept in netdev->stats structure, possibly
unsafe.
This is a preliminary work to support lockless transmit path, and
correct RX stats, that are already unsafe.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Phonet stack assumes the presence of Pipe Controller, either in Modem or
on Application Processing Engine user-space for the Pipe data.
Nokia Slim Modems like WG2.5 used in ST-Ericsson U8500 platform do not
implement Pipe controller in them.
This patch adds Pipe Controller implemenation to Phonet stack to support
Pipe data over Phonet stack for Nokia Slim Modems.
Signed-off-by: Kumar Sanghvi <kumar.sanghvi@stericsson.com>
Acked-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes kernel bugzilla #16603
tcp_sendmsg() truncates iov_len to an 'int' which a 4GB write to write
zero bytes, for example.
There is also the problem higher up of how verify_iovec() works. It
wants to prevent the total length from looking like an error return
value.
However it does this using 'int', but syscalls return 'long' (and
thus signed 64-bit on 64-bit machines). So it could trigger
false-positives on 64-bit as written. So fix it to use 'long'.
Reported-by: Olaf Bonorden <bono@onlinehome.de>
Reported-by: Daniel Büse <dbuese@gmx.de>
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
p9_virtio_create will only compare the the channel's tag characters
against the device name till the end of the channel's tag but not till
the end of the device name. This means that if a user defines channels
with the tags foo and foobar then he would mount foo when he requested
foonot and may mount foo when he requested foobar.
Thus it is necessary to check both string lengths against each other in
case of a successful partial string match.
Signed-off-by: Sven Eckelmann <sven.eckelmann@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPv4 and IPv6 have separate neighbour tables, so
the warning messages should be distinguishable.
[ Add a suitable message prefix on the ipv4 side as well -DaveM ]
Signed-off-by: Ulrich Weber <uweber@astaro.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When TCP uses FACK algorithm to mark lost packets in
tcp_mark_head_lost(), if the number of packets in the (TSO) skb is
greater than the number of packets that should be marked lost, TCP
incorrectly exits the loop and marks no packets lost in the skb. This
underestimates tp->lost_out and affects the recovery/retransmission.
This patch fargments the skb and marks the correct amount of packets
lost.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 8c0c709eea
Author: Johannes Berg <johannes@sipsolutions.net>
Date: Wed Nov 25 17:46:15 2009 +0100
mac80211: move cmntr flag out of rx flags
moved the CMNTR flag into the skb RX flags for
some aggregation cleanups, but this was wrong
since the optimisation this flag tried to make
requires that it is kept across the processing
of multiple interfaces -- which isn't true for
flags in the skb. The patch not only broke the
optimisation, it also introduced a bug: under
some (common!) circumstances the flag will be
set on an already freed skb!
However, investigating this in more detail, I
found that most of the flags that we set should
be per packet, _except_ for this one, due to
a-MPDU processing. Additionally, the flags used
for processing (currently just this one) need
to be reset before processing a new packet.
Since we haven't actually seen bugs reported as
a result of the wrong flags handling (which is
not too surprising -- the only real bug case I
can come up with is an a-MSDU contained in an
a-MPDU), I'll make a different fix for rc.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Even if the reorder timeout timer fires while
scanning, the frames weren't received during
scanning and therefore shouldn't be dropped.
To implement this, changes to the passive scan
RX handler simplify understanding it, because
it currently checks HW_SCANNING independently
of a packet's in-scan receive status (which
doesn't make a big difference, since scan_rx()
will only pick up probe responses and beacons,
which can't be aggregated.)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
If a station was found, then we'll have exited
the function already, so it is not necessary to
have a variable keeping track of it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
There are now four instances of vaguely the same
code that does packet preparation, checking for
MMIC errors and reporting them, and then invoking
packet processing. Consolidate all of these.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The first argument to prepare_for_handlers is always
the sdata that can just be stored in rx data directly
(and even already is, in two of four code paths.)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This reverts commit cd87a2d3a3.
Author reports it conflicts with proper fixes, applied hereafter.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
If interface does not existk, when nl80211_set_power_save is called, (eg.
module has been unloaded) it has been causing kernel panic. Added new
goto target to avoid crash if get_rdev_dev_by_info_ifindex does not
return dev and rdev pointers.
Signed-off-by: Teemu Paasikivi <ext-teemu.3.paasikivi@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When using multiple STA interfaces on the same radio, some
data packets need to be received on all interfaces
(broadcast, for instance).
Make the STA loop look similar to the mgt-data loop.
Also, add logic to check RX_FLAG_MMIC_ERROR for last
interface in mgt-data loop.
Signed-off-by: Ben Greear <greearb@candelatech.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The old ieee80211_find_sta_by_hw method didn't properly
find VIFS when there was more than one per AP. This caused
AMPDU logic in ath9k to get the wrong VIF when trying to
account for transmitted SKBs.
This patch changes ieee80211_find_sta_by_hw to take a
localaddr argument to distinguish between VIFs with the
same AP but different local addresses. The method name
is changed to ieee80211_find_sta_by_ifaddr.
Signed-off-by: Ben Greear <greearb@candelatech.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Create 'stations' sub-directory under each netdev:[vif-name]
directory to hold all stations for that network device.
Signed-off-by: Ben Greear <greearb@candelatech.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Everything that is required for that already exists:
* the per-net cache registration with respective proc entries
* the context (struct net) is available in all the users
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Register empty per-net operations for the sunrpc layer.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The transport representation should be per-net of course.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Existing calls do the same, but for the init_net.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
There are two calls that operate on ip_map_cache and are
directly called from the nfsd code. Other places will be
handled in a different way.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
They do not require the rqst actually and having the xprt simplifies
further patching.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This is done in order to facilitate getting the ip_map_cache from
which to put the ip_map.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The target is to have many ip_map_cache-s in the system. This particular
patch handles its usage by the ip_map_parse callback.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Return -ENOMEM when erroring on kmalloc and fix memory leaks when returning on error.
Signed-off-by: Davidlohr Bueso <dave@gnu.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Clean up a missing exit path in the ipv6 module init routines. In
addrconf_init we call ipv6_addr_label_init which calls register_pernet_subsys
for the ipv6_addr_label_ops structure. But if module loading fails, or if the
ipv6 module is removed, there is no corresponding unregister_pernet_subsys call,
which leaves a now-bogus address on the pernet_list, leading to oopses in
subsequent registrations. This patch cleans up both the failed load path and
the unload path. Tested by myself with good results.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
include/net/addrconf.h | 1 +
net/ipv6/addrconf.c | 11 ++++++++---
net/ipv6/addrlabel.c | 5 +++++
3 files changed, 14 insertions(+), 3 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
__in_dev_get_rtnl(dev_out) is called while RTNL is not held, thus
triggers a lockdep fault.
At this point, we only perform a raw test of dev_out->ip_ptr being NULL,
we dont need to make sure ip_ptr cant changed right after.
We can use rcu_dereference_raw() for this.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of having two places were we allocate dev->_rx, introduce
netif_alloc_rx_queues() helper and call it only from
register_netdevice(), not from alloc_netdev_mq()
Goal is to let drivers change dev->num_rx_queues after allocating netdev
and before registering it.
This also removes a lot of ifdefs in net/core/dev.c
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
You can't call atomic_notifier_chain_unregister() while in atomic context.
Fix, call un/register_atmdevice_notifier in module __init and __exit.
Bug report:
http://comments.gmane.org/gmane.linux.network/172603
Reported-by: Mikko Vinni <mmvinni@yahoo.com>
Tested-by: Mikko Vinni <mmvinni@yahoo.com>
Signed-off-by: Karl Hiramoto <karl@hiramoto.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Automatically allows vlans to get NETIF_F_HIGHDMA if underlying device
supports it.
On 32bit arches (and more precisely if CONFIG_HIGHMEM is enabled), it
can help to reduce cost of illegal_highdma() and __skb_linearize()
calls.
Tested on tg3 , bnx2, bonding, this worked very well.
This is a generalization of a patch provided by Yi Zou & Jeff Kirsher.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have for each socket :
One spinlock (sk_slock.slock)
One rwlock (sk_callback_lock)
Possible scenarios are :
(A) (this is used in net/sunrpc/xprtsock.c)
read_lock(&sk->sk_callback_lock) (without blocking BH)
<BH>
spin_lock(&sk->sk_slock.slock);
...
read_lock(&sk->sk_callback_lock);
...
(B)
write_lock_bh(&sk->sk_callback_lock)
stuff
write_unlock_bh(&sk->sk_callback_lock)
(C)
spin_lock_bh(&sk->sk_slock)
...
write_lock_bh(&sk->sk_callback_lock)
stuff
write_unlock_bh(&sk->sk_callback_lock)
spin_unlock_bh(&sk->sk_slock)
This (C) case conflicts with (A) :
CPU1 [A] CPU2 [C]
read_lock(callback_lock)
<BH> spin_lock_bh(slock)
<wait to spin_lock(slock)>
<wait to write_lock_bh(callback_lock)>
We have one problematic (C) use case in inet_csk_listen_stop() :
local_bh_disable();
bh_lock_sock(child); // spin_lock_bh(&sk->sk_slock)
WARN_ON(sock_owned_by_user(child));
...
sock_orphan(child); // write_lock_bh(&sk->sk_callback_lock)
lockdep is not happy with this, as reported by Tetsuo Handa
It seems only way to deal with this is to use read_lock_bh(callbacklock)
everywhere.
Thanks to Jarek for pointing a bug in my first attempt and suggesting
this solution.
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jarek Poplawski <jarkao2@gmail.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While investigating a bit, I found ip_fragment() slow path was taken
because ip_append_data() provides following layout for a send(MTU +
N*(MTU - 20)) syscall :
- one skb with 1500 (mtu) bytes
- N fragments of 1480 (mtu-20) bytes (before adding IP header)
last fragment gets 17 bytes of trail data because of following bit:
if (datalen == length + fraggap)
alloclen += rt->dst.trailer_len;
Then esp4 adds 16 bytes of data (while trailer_len is 17... hmm...
another bug ?)
In ip_fragment(), we notice last fragment is too big (1496 + 20) > mtu,
so we take slow path, building another skb chain.
In order to avoid taking slow path, we should correct ip_append_data()
to make sure last fragment has real trail space, under mtu...
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes stale mac80211_tx_control_flags for
filtered / retried frames.
Because ieee80211_handle_filtered_frame feeds skbs back
into the tx path, they have to be stripped of some tx
flags so they won't confuse the stack, driver or device.
Cc: <stable@kernel.org>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
IEEE Std 802.11k-2008 added DS Parameter Set information element into
Probe Request frames as an optional information on 2.4 GHz band (and
mandatory, if radio measurements are enabled). This allows APs to
filter out Probe Request frames that may be received from neighboring
overlapping channels and by doing so, reduce the number of unnecessary
frames in the air. Make mac80211 add this IE into Probe Request frames
whenever the channel is known (i.e., whenever hwscan is not used).
Signed-off-by: Jouni Malinen <j@w1.fi>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
If the TX rate set has been masked, the removed rates can also be
removed from the Supported Rates and Extended Supported Rates IEs in
Probe Request frames.
Signed-off-by: Jouni Malinen <j@w1.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
commit 8c0c709eea
Author: Johannes Berg <johannes@sipsolutions.net>
Date: Wed Nov 25 17:46:15 2009 +0100
mac80211: move cmntr flag out of rx flags
moved the CMTR flag into the skb's status, and
in doing so introduced a use-after-free -- when
the skb has been handed to cooked monitors the
status setting will touch now invalid memory.
Additionally, moving it there has effectively
discarded the optimisation -- since the bit is
only ever set on freed SKBs, and those were a
copy, it could never be checked.
For the current release, fixing this properly
is a bit too involved, so let's just remove the
problematic code and leave userspace with one
copy of each frame for each virtual interface.
Cc: stable@kernel.org [2.6.33+]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Change "return (EXPR);" to "return EXPR;"
return is not a function, parentheses are not required.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
otherwise ECT(1) bit will get interpreted as RTO_ONLINK
and routing will fail with XfrmOutBundleGenError.
Signed-off-by: Ulrich Weber <uweber@astaro.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The x25_datagram_poll didn't add anything, removed it.
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
we need to check proper socket type within ipv4_conntrack_defrag
function before referencing the nodefrag flag.
For example the tun driver receive path produces skbs with
AF_UNSPEC socket type, and so current code is causing unwanted
fragmented packets going out.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix checksum calculation in nf_nat_snmp_basic.
Based on patches by Clark Wang <wtweeker@163.com> and
Stephen Hemminger <shemminger@vyatta.com>.
https://bugzilla.kernel.org/show_bug.cgi?id=17622
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
As soon as rcu_read_unlock() is called, there is no guarantee current
thread can safely derefence t pointer, rcu protected.
Fix is to copy t->alloc_size in a temporary variable.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_route_me_harder can't create the route cache when the outdev is the same
with the indev for the skbs whichout a valid protocol set.
__mkroute_input functions has this check:
1998 if (skb->protocol != htons(ETH_P_IP)) {
1999 /* Not IP (i.e. ARP). Do not create route, if it is
2000 * invalid for proxy arp. DNAT routes are always valid.
2001 *
2002 * Proxy arp feature have been extended to allow, ARP
2003 * replies back to the same interface, to support
2004 * Private VLAN switch technologies. See arp.c.
2005 */
2006 if (out_dev == in_dev &&
2007 IN_DEV_PROXY_ARP_PVLAN(in_dev) == 0) {
2008 err = -EINVAL;
2009 goto cleanup;
2010 }
2011 }
This patch gives the new skb a valid protocol to bypass this check. In order
to make ipt_REJECT work with bridges, you also need to enable ip_forward.
This patch also fixes a regression. When we used skb_copy_expand(), we
didn't have this issue stated above, as the protocol was properly set.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
I initially noticed this because of the compiler warning below, but it
does seem to be a valid concern in the case where ct_sip_get_header()
returns 0 in the first iteration of the while loop.
net/netfilter/nf_conntrack_sip.c: In function 'sip_help_tcp':
net/netfilter/nf_conntrack_sip.c:1379: warning: 'ret' may be used uninitialized in this function
Signed-off-by: Simon Horman <horms@verge.net.au>
[Patrick: changed NF_DROP to NF_ACCEPT]
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
transparent field of a socket is either inet_twsk(sk)->tw_transparent
for timewait sockets, or inet_sk(sk)->transparent for other sockets
(TCP/UDP).
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 6610f720e9
broke cache_clean_deferred as entries are no longer added to the
pending list for subsequent revisiting.
So put those requests back on the pending list.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
With this patch, you can specify the expectation flags for user-space
created expectations.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch adds the missing validation of the CTA_EXPECT_ZONE
attribute in the ctnetlink code.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch improves the situation in which the expectation table is
full for conntrack NAT helpers. Basically, we give up if we don't
find a place in the table instead of looping over nf_ct_expect_related()
with a different port (we should only do this if it returns -EBUSY, for
-EMFILE or -ESHUTDOWN I think that it's better to skip this).
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: linux-nfs@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Being a hash table, hlist is the best option.
There is currently some ugliness were we treat "->next == NULL" as
a special case to avoid having to initialise the whole array.
This change nicely gets rid of that case.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
CAIF sockets should use socket's default send and receive buffers sizes.
Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check that receive function pointer is not null before calling it.
Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use pr_debug for flow control printouts, and refine an error printout.
Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove debugging quirk redefining pr_debug to pr_warning.
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Code like:
switch(xxx) {
case -error1:
case -error2:
..
return;
case 0:
stuff;
}
can more naturally be written:
if (xxx < 0)
return;
stuff;
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
net/core/ethtool.c: In function 'ethtool_get_regs':
net/core/ethtool.c:818:2: error: implicit declaration of function 'vmalloc'
net/core/ethtool.c:818:9: warning: assignment makes pointer from integer without a cast
net/core/ethtool.c:833:2: error: implicit declaration of function 'vfree'
Signed-off-by: David S. Miller <davem@davemloft.net>
Special care should be taken when slow path is hit in ip_fragment() :
When walking through frags, we transfert truesize ownership from skb to
frags. Then if we hit a slow_path condition, we must undo this or risk
uncharging frags->truesize twice, and in the end, having negative socket
sk_wmem_alloc counter, or even freeing socket sooner than expected.
Many thanks to Nick Bowler, who provided a very clean bug report and
test program.
Thanks to Jarek for reviewing my first patch and providing a V2
While Nick bisection pointed to commit 2b85a34e91 (net: No more
expensive sock_hold()/sock_put() on each tx), underlying bug is older
(2.6.12-rc5)
A side effect is to extend work done in commit b2722b1c3a
(ip_fragment: also adjust skb->truesize for packets not owned by a
socket) to ipv6 as well.
Reported-and-bisected-by: Nick Bowler <nbowler@elliptictech.com>
Tested-by: Nick Bowler <nbowler@elliptictech.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jarek Poplawski <jarkao2@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some NICs have huge register files which exceed the maximum heap
allocation size.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If we drop a request in the sunrpc layer, either due kmalloc failure,
or due to a cache miss when we could not queue the request for later
replay, then close the connection to encourage the client to retry sooner.
Note that if the drop happens in the NFS layer, NFSERR_JUKEBOX
(aka NFS4ERR_DELAY) is returned to guide the client concerning
replay.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Clean up: Introduce a helper to '\0'-terminate XDR strings
that are placed in a page in the page cache.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Change the usage of svc usecnt during command execution:
- we check if svc is registered but we do not need to hold usecnt
reference while under __ip_vs_mutex, only the packet handling needs
it during scheduling
- change __ip_vs_service_get to __ip_vs_service_find and
__ip_vs_svc_fwm_get to __ip_vs_svc_fwm_find because now caller
will increase svc->usecnt
- put common code that calls update_service in __ip_vs_update_dest
- put common code in ip_vs_unlink_service() and use it to unregister
the service
- add comment that svc should not be accessed after ip_vs_del_service
anymore
- all IP_VS_WAIT_WHILE calls are now unified: usecnt > 0
- Properly log the app ports
As result, some problems are fixed:
- possible use-after-free of svc in ip_vs_genl_set_cmd after
ip_vs_del_service because our usecnt reference does not guarantee that
svc is not freed on refcnt==0, eg. when no dests are moved to trash
- possible usecnt leak in do_ip_vs_set_ctl after ip_vs_del_service
when the service is not freed now, for example, when some
destionations are moved into trash and svc->refcnt remains above 0.
It is harmless because svc is not in hash anymore.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Since we don't change the tuple in the original direction, we can save it
in ct->tuplehash[IP_CT_DIR_REPLY].hnode.pprev for __nf_conntrack_confirm()
use.
__hash_conntrack() is split into two steps: hash_conntrack_raw() is used
to get the raw hash, and __hash_bucket() is used to get the bucket id.
In SYN-flood case, early_drop() doesn't need to recompute the hash again.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Add new sysctl flag "snat_reroute". Recent kernels use
ip_route_me_harder() to route LVS-NAT responses properly by
VIP when there are multiple paths to client. But setups
that do not have alternative default routes can skip this
routing lookup by using snat_reroute=0.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Add more code to IPVS to work with Netfilter connection
tracking and fix some problems.
- Allow IPVS to be compiled without connection tracking as in
2.6.35 and before. This can avoid keeping conntracks for all
IPVS connections because this costs memory. ip_vs_ftp still
depends on connection tracking and NAT as implemented for 2.6.36.
- Add sysctl var "conntrack" to enable connection tracking for
all IPVS connections. For loaded IPVS directors it needs
tuning of nf_conntrack_max limit.
- Add IP_VS_CONN_F_NFCT connection flag to request the connection
to use connection tracking. This allows user space to provide this
flag, for example, in dest->conn_flags. This can be useful to
request connection tracking per real server instead of forcing it
for all connections with the "conntrack" sysctl. This flag is
set currently only by ip_vs_ftp and of course by "conntrack" sysctl.
- Add ip_vs_nfct.c file to hold all connection tracking code,
by this way main code should not depend of netfilter conntrack
support.
- Return back the ip_vs_post_routing handler as in 2.6.35 and use
skb->ipvs_property=1 to allow IPVS to work without connection
tracking
Connection tracking:
- most of the code is already in 2.6.36-rc
- alter conntrack reply tuple for LVS-NAT connections when first packet
from client is forwarded and conntrack state is NEW or RELATED.
Additionally, alter reply for RELATED connections from real server,
again for packet in original direction.
- add IP_VS_XMIT_TUNNEL to confirm conntrack (without altering
reply) for LVS-TUN early because we want to call nf_reset. It is
needed because we add IPIP header and the original conntrack
should be preserved, not destroyed. The transmitted IPIP packets
can reuse same conntrack, so we do not set skb->ipvs_property.
- try to destroy conntrack when the IPVS connection is destroyed.
It is not fatal if conntrack disappears before that, it depends
on the used timers.
Fix problems from long time:
- add skb->ip_summed = CHECKSUM_NONE for the LVS-TUN transmitters
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Patrick McHardy <kaber@trash.net>
The `options_received' struct is redundant, since it re-duplicates the existing
`p' and `x_recv' fields. This patch removes the sub-struct and migrates the
format conversion operations to ccid3_hc_tx_parse_options().
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
This adds a function to take care of the following, separate cases occurring in
the computation of the Loss Rate p:
* 1/(2^32-1) is mapped into 0% as per RFC 4342, 8.5;
* 1/0 is mapped into 100%, the maximum;
* to avoid that p = 1/x is rounded down to 0 when x is very large, since this
means accidentally re-entering slow-start indicated by p == 0, the minimum
resolution value of p is now returned instead;
* a bug in ccid3_hc_rx_getsockopt is fixed: 1/0 was mapped into ~0U.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
This patch is thanks to an investigation by Leandro Sales de Melo and his
colleagues. They worked out two state diagrams which highlight the fact that
the xxx_TERM states in CCID-3/4 are in fact not necessary.
And this can be confirmed by in turn looking at the code: the xxx_TERM states
are only ever set in ccid3_hc_{rx,tx}_exit(): when CCID-3 sets the state
to xxx_TERM, it is at a time where no more processing should be going on,
hence it is not necessary to introduce a dedicated exit state - this is already
implied by unloading the CCID.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
The constants DCCPO_{MIN,MAX}_CCID_SPECIFIC are nowhere used in the code, but
instead for the CCID-specific options numbers are used.
This patch unifies the use of CCID-specific option numbers, by adding symbolic
names reflecting the definitions in RFC 4340, 10.3.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
This
1. adds packet type information to ccid_hc_{rx,tx}_parse_options(). This is
necessary, since table 3 in RFC 4340, 5.8 leaves it to the CCIDs to state
which options may (not) appear on what packet type.
2. adds such a check for CCID-3's {Loss Event, Receive} Rate as specified in
RFC 4340 8.3 ("Receive Rate options MUST NOT be sent on DCCP-Data packets")
and 8.5 ("Loss Event Rate options MUST NOT be sent on DCCP-Data packets").
3. removes an unused argument `idx' from ccid_hc_{rx,tx}_parse_options(). This
is also no longer necessary, since the CCID-specific option-parsing routines
are passed every single parameter of the type-length-value option encoding.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
If a RST comes in immediately after checking sk->sk_err, tcp_poll will
return POLLIN but not POLLOUT. Fix this by checking sk->sk_err at the end
of tcp_poll. Additionally, ensure the correct order of operations on SMP
machines with memory barriers.
Signed-off-by: Tom Marshall <tdm.code@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Just use explicit casts, since we really can't change the
types of structures exported to userspace which have been
around for 15 years or so.
Reported-by: Dan Rosenberg <dan.j.rosenberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The family parameter xfrm_state_find is used to find a state matching a
certain policy. This value is set to the template's family
(encap_family) right before xfrm_state_find is called.
The family parameter is however also used to construct a temporary state
in xfrm_state_find itself which is wrong for inter-family scenarios
because it produces a selector for the wrong family. Since this selector
is included in the xfrm_user_acquire structure, user space programs
misinterpret IPv6 addresses as IPv4 and vice versa.
This patch splits up the original init_tempsel function into a part that
initializes the selector respectively the props and id of the temporary
state, to allow for differing ip address families whithin the state.
Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a driver doesn't fill the entire buffer, old
heap contents may remain, and if it also doesn't
update the length properly, this old heap content
will be copied back to userspace.
It is very unlikely that this happens in any of
the drivers using private ioctls since it would
show up as junk being reported by iwpriv, but it
seems better to be safe here, so use kzalloc.
Reported-by: Jeff Mahoney <jeffm@suse.com>
Cc: stable@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Under load, netif_rx() can drop incoming packets but administrators dont
have a chance to spot which device needs some tuning (RPS activation for
example)
This patch adds rx_dropped accounting in vlans and tunnels.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6 can be a module, we should test CONFIG_IPV6 and CONFIG_IPV6_MODULE
to enable ipv6 bits in ip_gre.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>