On a allyesconfig'ured kernel:
Size Uses Wasted Name and definition
===== ==== ====== ================================================
95 162 12075 netif_wake_queue include/linux/netdevice.h
129 86 9265 dev_kfree_skb_any include/linux/netdevice.h
127 56 5885 netif_device_attach include/linux/netdevice.h
73 86 4505 dev_kfree_skb_irq include/linux/netdevice.h
46 60 1534 netif_device_detach include/linux/netdevice.h
119 16 1485 __netif_rx_schedule include/linux/netdevice.h
143 5 492 netif_rx_schedule include/linux/netdevice.h
81 7 366 netif_schedule include/linux/netdevice.h
netif_wake_queue is big because __netif_schedule is a big inline:
static inline void __netif_schedule(struct net_device *dev)
{
if (!test_and_set_bit(__LINK_STATE_SCHED, &dev->state)) {
unsigned long flags;
struct softnet_data *sd;
local_irq_save(flags);
sd = &__get_cpu_var(softnet_data);
dev->next_sched = sd->output_queue;
sd->output_queue = dev;
raise_softirq_irqoff(NET_TX_SOFTIRQ);
local_irq_restore(flags);
}
}
static inline void netif_wake_queue(struct net_device *dev)
{
#ifdef CONFIG_NETPOLL_TRAP
if (netpoll_trap())
return;
#endif
if (test_and_clear_bit(__LINK_STATE_XOFF, &dev->state))
__netif_schedule(dev);
}
By de-inlining __netif_schedule we are saving a lot of text
at each callsite of netif_wake_queue and netif_schedule.
__netif_rx_schedule is also big, and it makes more sense to keep
both of them out of line.
Patch also deinlines dev_kfree_skb_any. We can deinline dev_kfree_skb_irq
instead... oh well.
netif_device_attach/detach are not hot paths, we can deinline them too.
Signed-off-by: Denis Vlasenko <vda@ilport.com.ua>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Randy Dunlap <rdunlap@xenotime.net>
Use NULL instead of 0 for pointers.
Fix these sparse warnings:
net/dccp/feat.c:207:20: warning: Using plain integer as NULL pointer
net/dccp/feat.c:325:21: warning: Using plain integer as NULL pointer
net/dccp/feat.c:526:20: warning: Using plain integer as NULL pointer
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Patrick Caulfield <patrick@tykepenguin.com>
This patch fixes a bug in the reference counting for the default
DECnet device.
If the device is changed, then the new device had its refcount
decremented rather than the old one!
Signed-off-by: David S. Miller <davem@davemloft.net>
Every netfilter module uses `init' for its module_init() function and
`fini' or `cleanup' for its module_exit() function.
Problem is, this creates uninformative initcall_debug output and makes
ctags rather useless.
So go through and rename them all to $(filename)_init and
$(filename)_fini.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Basically this patch moves the generic tunnel protocol stuff out of
xfrm4_tunnel/xfrm6_tunnel and moves it into the new files of tunnel4.c
and tunnel6 respectively.
The reason for this is that the problem that Hugo uncovered is only
the tip of the iceberg. The real problem is that when we removed the
dependency of ipip on xfrm4_tunnel we didn't really consider the module
case at all.
For instance, as it is it's possible to build both ipip and xfrm4_tunnel
as modules and if the latter is loaded then ipip simply won't load.
After considering the alternatives I've decided that the best way out of
this is to restore the dependency of ipip on the non-xfrm-specific part
of xfrm4_tunnel. This is acceptable IMHO because the intention of the
removal was really to be able to use ipip without the xfrm subsystem.
This is still preserved by this patch.
So now both ipip/xfrm4_tunnel depend on the new tunnel4.c which handles
the arbitration between the two. The order of processing is determined
by a simple integer which ensures that ipip gets processed before
xfrm4_tunnel.
The situation for ICMP handling is a little bit more complicated since
we may not have enough information to determine who it's for. It's not
a big deal at the moment since the xfrm ICMP handlers are basically
no-ops. In future we can deal with this when we look at ICMP caching
in general.
The user-visible change to this is the removal of the TUNNEL Kconfig
prompts. This makes sense because it can only be used through IPCOMP
as it stands.
The addition of the new modules shouldn't introduce any problems since
module dependency will cause them to be loaded.
Oh and I also turned some unnecessary pskb's in IPv6 related to this
patch to skb's.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix kernel oopses whenever somebody issues compatible ioctl on AppleTalk,
Econet, IPX or IRDA socket. For AppleTalk/Econet/IRDA it restores state
in which these sockets were before compat_ioctl was introduced to the socket
ops, for IPX it implements support for 4 ioctls which were not implemented
before - as these ioctls use structures which match between 32bit and 64bit
userspace, no special code is needed, just call 64bit ioctl handler.
Signed-off-by: Petr Vandrovec <petr@vandrovec.name>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a lot of typos. Eyeballed by jmc@ in OpenBSD.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is a conversion to make the various file_operations structs in fs/
const. Basically a regexp job, with a few manual fixups
The goal is both to increase correctness (harder to accidentally write to
shared datastructures) and reducing the false sharing of cachelines with
things that get dirty in .data (while .rodata is nicely read only and thus
cache clean)
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Mark the f_ops members of inodes as const, as well as fix the
ripple-through this causes by places that copy this f_ops and then "do
stuff" with it.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We don't make much of an attempt to fall back to lower rates, and 54M
just isn't reliable enough for many people. In fact, it's not clear we
even set it to 11M if we're trying to associate with an 802.11b AP.
This patch makes us default to 11M, which ought to work for most people.
When we actually handle dynamic rate adjustment, we can reconsider the
defaults -- but even then, probably it makes as much sense to start at
11M and adjust it upwards as it does to start at 54M and reduce it.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
It currently takes something like 8 seconds to do a scan, because we
spend half a second on each channel. Reduce that time to 20ms per
channel.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The attached patch removes a potential problem from ieee80211_wx.c, by changing the name of routine
ipw2100_translate_scan to ieee80211_translate_scan. The problem is minor as the routine is declared
static; however, if it were made global, it would pollute the namespace.
Signed-Off-By: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[NET]: drop duplicate assignment in request_sock
[IPSEC]: Fix tunnel error handling in ipcomp6
The kernel's implementation of notifier chains is unsafe. There is no
protection against entries being added to or removed from a chain while the
chain is in use. The issues were discussed in this thread:
http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2
We noticed that notifier chains in the kernel fall into two basic usage
classes:
"Blocking" chains are always called from a process context
and the callout routines are allowed to sleep;
"Atomic" chains can be called from an atomic context and
the callout routines are not allowed to sleep.
We decided to codify this distinction and make it part of the API. Therefore
this set of patches introduces three new, parallel APIs: one for blocking
notifiers, one for atomic notifiers, and one for "raw" notifiers (which is
really just the old API under a new name). New kinds of data structures are
used for the heads of the chains, and new routines are defined for
registration, unregistration, and calling a chain. The three APIs are
explained in include/linux/notifier.h and their implementation is in
kernel/sys.c.
With atomic and blocking chains, the implementation guarantees that the chain
links will not be corrupted and that chain callers will not get messed up by
entries being added or removed. For raw chains the implementation provides no
guarantees at all; users of this API must provide their own protections. (The
idea was that situations may come up where the assumptions of the atomic and
blocking APIs are not appropriate, so it should be possible for users to
handle these things in their own way.)
There are some limitations, which should not be too hard to live with. For
atomic/blocking chains, registration and unregistration must always be done in
a process context since the chain is protected by a mutex/rwsem. Also, a
callout routine for a non-raw chain must not try to register or unregister
entries on its own chain. (This did happen in a couple of places and the code
had to be changed to avoid it.)
Since atomic chains may be called from within an NMI handler, they cannot use
spinlocks for synchronization. Instead we use RCU. The overhead falls almost
entirely in the unregister routine, which is okay since unregistration is much
less frequent that calling a chain.
Here is the list of chains that we adjusted and their classifications. None
of them use the raw API, so for the moment it is only a placeholder.
ATOMIC CHAINS
-------------
arch/i386/kernel/traps.c: i386die_chain
arch/ia64/kernel/traps.c: ia64die_chain
arch/powerpc/kernel/traps.c: powerpc_die_chain
arch/sparc64/kernel/traps.c: sparc64die_chain
arch/x86_64/kernel/traps.c: die_chain
drivers/char/ipmi/ipmi_si_intf.c: xaction_notifier_list
kernel/panic.c: panic_notifier_list
kernel/profile.c: task_free_notifier
net/bluetooth/hci_core.c: hci_notifier
net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_chain
net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_expect_chain
net/ipv6/addrconf.c: inet6addr_chain
net/netfilter/nf_conntrack_core.c: nf_conntrack_chain
net/netfilter/nf_conntrack_core.c: nf_conntrack_expect_chain
net/netlink/af_netlink.c: netlink_chain
BLOCKING CHAINS
---------------
arch/powerpc/platforms/pseries/reconfig.c: pSeries_reconfig_chain
arch/s390/kernel/process.c: idle_chain
arch/x86_64/kernel/process.c idle_notifier
drivers/base/memory.c: memory_chain
drivers/cpufreq/cpufreq.c cpufreq_policy_notifier_list
drivers/cpufreq/cpufreq.c cpufreq_transition_notifier_list
drivers/macintosh/adb.c: adb_client_list
drivers/macintosh/via-pmu.c sleep_notifier_list
drivers/macintosh/via-pmu68k.c sleep_notifier_list
drivers/macintosh/windfarm_core.c wf_client_list
drivers/usb/core/notify.c usb_notifier_list
drivers/video/fbmem.c fb_notifier_list
kernel/cpu.c cpu_chain
kernel/module.c module_notify_list
kernel/profile.c munmap_notifier
kernel/profile.c task_exit_notifier
kernel/sys.c reboot_notifier_list
net/core/dev.c netdev_chain
net/decnet/dn_dev.c: dnaddr_chain
net/ipv4/devinet.c: inetaddr_chain
It's possible that some of these classifications are wrong. If they are,
please let us know or submit a patch to fix them. Note that any chain that
gets called very frequently should be atomic, because the rwsem read-locking
used for blocking chains is very likely to incur cache misses on SMP systems.
(However, if the chain's callout routines may sleep then the chain cannot be
atomic.)
The patch set was written by Alan Stern and Chandra Seetharaman, incorporating
material written by Keith Owens and suggestions from Paul McKenney and Andrew
Morton.
[jes@sgi.com: restructure the notifier chain initialization macros]
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We shouldn't really compare &new->h with anything when new ==NULL, and gather
three different if statements that all start
if (rv ...
into one large if.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We can now make some code static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
.. it makes some of the code nicer.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Cache_fresh is now only used in cache.c, so unexport it.
Part of cache_fresh (setting CACHE_VALID) should really be done under the
lock, while part (calling cache_revisit_request etc) must be done outside the
lock. So we split it up appropriately.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- in cache_check, h must be non-NULL as it has been de-referenced,
so don't bother checking for NULL.
- When a cache-item is updated, we need to call cache_revisit_request to see
if there is a pending request waiting for that item. We were using
a transition to CACHE_VALID to see if that was needed, however that is
wrong as an expired entry will still be marked 'valid' (as the data is valid
and will need to be released). So instead use an off transition for
CACHE_PENDING which is exactly the right thing to test.
- Add a little bit more debugging info.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The C++-like 'template' approach proves to be too ugly and hard to work with.
The old 'template' won't go away until all users are updated.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These were an unnecessary wart. Also only have one 'DefineSimpleCache..'
instead of two.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The 'auth_domain's are simply handles on internal data structures. They do
not cache information from user-space, and forcing them into the mold of a
'cache' misrepresents their true nature and causes confusion.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Just noticed that request_sock.[ch] contain a useless assignment of
rskq_accept_head to itself. I assume this is a typo and the 2nd one
was supposed to be _tail. However, setting _tail to NULL is not
needed, so the patch below just drops the 2nd assignment.
Signed-off-By: Norbert Kiesel <nkiesel@tbdnetworks.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The error handling in ipcomp6_tunnel_create is broken in two ways:
1) If we fail to allocate an SPI (this should never happen in practice
since there are plenty of 32-bit SPI values for us to use), we will
still go ahead and create the SA.
2) When xfrm_init_state fails, we first of all may trigger the BUG_TRAP
in __xfrm_state_destroy because we didn't set the state to DEAD. More
importantly we end up returning the freed state as if we succeeded!
This patch fixes them both.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Modify well over a dozen mempool users to call mempool_create_slab_pool()
rather than calling mempool_create() with extra arguments, saving about 30
lines of code and increasing readability.
Signed-off-by: Matthew Dobson <colpatch@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jens Axboe <axboe@suse.de>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Cc: Greg KH <greg@kroah.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Adam Belay <ambx1@neo.rr.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* 'audit.b3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: (22 commits)
[PATCH] fix audit_init failure path
[PATCH] EXPORT_SYMBOL patch for audit_log, audit_log_start, audit_log_end and audit_format
[PATCH] sem2mutex: audit_netlink_sem
[PATCH] simplify audit_free() locking
[PATCH] Fix audit operators
[PATCH] promiscuous mode
[PATCH] Add tty to syscall audit records
[PATCH] add/remove rule update
[PATCH] audit string fields interface + consumer
[PATCH] SE Linux audit events
[PATCH] Minor cosmetic cleanups to the code moved into auditfilter.c
[PATCH] Fix audit record filtering with !CONFIG_AUDITSYSCALL
[PATCH] Fix IA64 success/failure indication in syscall auditing.
[PATCH] Miscellaneous bug and warning fixes
[PATCH] Capture selinux subject/object context information.
[PATCH] Exclude messages by message type
[PATCH] Collect more inode information during syscall processing.
[PATCH] Pass dentry, not just name, in fsnotify creation hooks.
[PATCH] Define new range of userspace messages.
[PATCH] Filter rule comparators
...
Fixed trivial conflict in security/selinux/hooks.c
* git://git.linux-nfs.org/pub/linux/nfs-2.6: (103 commits)
SUNRPC,RPCSEC_GSS: spkm3--fix config dependencies
SUNRPC,RPCSEC_GSS: spkm3: import contexts using NID_cast5_cbc
LOCKD: Make nlmsvc_traverse_shares return void
LOCKD: nlmsvc_traverse_blocks return is unused
SUNRPC,RPCSEC_GSS: fix krb5 sequence numbers.
NFSv4: Dont list system.nfs4_acl for filesystems that don't support it.
SUNRPC,RPCSEC_GSS: remove unnecessary kmalloc of a checksum
SUNRPC: Ensure rpc_call_async() always calls tk_ops->rpc_release()
SUNRPC: Fix memory barriers for req->rq_received
NFS: Fix a race in nfs_sync_inode()
NFS: Clean up nfs_flush_list()
NFS: Fix a race with PG_private and nfs_release_page()
NFSv4: Ensure the callback daemon flushes signals
SUNRPC: Fix a 'Busy inodes' error in rpc_pipefs
NFS, NLM: Allow blocking locks to respect signals
NFS: Make nfs_fhget() return appropriate error values
NFSv4: Fix an oops in nfs4_fill_super
lockd: blocks should hold a reference to the nlm_file
NFSv4: SETCLIENTID_CONFIRM should handle NFS4ERR_DELAY/NFS4ERR_RESOURCE
NFSv4: Send the delegation stateid for SETATTR calls
...
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[NETFILTER] x_table.c: sem2mutex
[IPV4]: Aggregate route entries with different TOS values
[TCP]: Mark tcp_*mem[] __read_mostly.
[TCP]: Set default max buffers from memory pool size
[SCTP]: Fix up sctp_rcv return value
[NET]: Take RTNL when unregistering notifier
[WIRELESS]: Fix config dependencies.
[NET]: Fill in a 32-bit hole in struct sock on 64-bit platforms.
[NET]: Ensure device name passed to SO_BINDTODEVICE is NULL terminated.
[MODULES]: Don't allow statically declared exports
[BRIDGE]: Unaligned accesses in the ethernet bridge
Implement the half-closed devices notifiation, by adding a new POLLRDHUP
(and its alias EPOLLRDHUP) bit to the existing poll/select sets. Since the
existing POLLHUP handling, that does not report correctly half-closed
devices, was feared to be changed, this implementation leaves the current
POLLHUP reporting unchanged and simply add a new bit that is set in the few
places where it makes sense. The same thing was discussed and conceptually
agreed quite some time ago:
http://lkml.org/lkml/2003/7/12/116
Since this new event bit is added to the existing Linux poll infrastruture,
even the existing poll/select system calls will be able to use it. As far
as the existing POLLHUP handling, the patch leaves it as is. The
pollrdhup-2.6.16.rc5-0.10.diff defines the POLLRDHUP for all the existing
archs and sets the bit in the six relevant files. The other attached diff
is the simple change required to sys/epoll.h to add the EPOLLRDHUP
definition.
There is "a stupid program" to test POLLRDHUP delivery here:
http://www.xmailserver.org/pollrdhup-test.c
It tests poll(2), but since the delivery is same epoll(2) will work equally.
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
net/rxrpc/main.c: In function `rxrpc_initialise':
net/rxrpc/main.c:83: warning: label `error_proc' defined but not used
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we get an ICMP need-to-frag message, the original TOS value in the
ICMP payload cannot be used as a key to look up the routes to update.
This is because the TOS field may have been modified by routers on the
way. Similarly, ip_rt_redirect should also ignore the TOS as the router
that gave us the message may have modified the TOS value.
The patch achieves this objective by aggregating entries with different
TOS values (but are otherwise identical) into the same bucket. This
makes it easy to update them at the same time when an ICMP message is
received.
In future we should use a twin-hashing scheme where teh aggregation
occurs at the entry level. That is, the TOS goes back into the hash
for normal lookups while ICMP lookups will end up with a node that
gives us a list that contains all other route entries that differ
only by TOS.
Signed-off-by: Ilia Sotnikov <hostcc@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch sets the maximum TCP buffer sizes (available to automatic
buffer tuning, not to setsockopt) based on the TCP memory pool size.
The maximum sndbuf and rcvbuf each will be up to 4 MB, but no more
than 1/128 of the memory pressure threshold.
Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
I was working on the ipip/xfrm problem and as usual I get side-tracked by
other problems.
As part of an attempt to change the IPv4 protocol handler calling
convention I found that SCTP violated the existing convention.
It's returning non-zero values after freeing the skb. This is doubly bad
as 1) the skb gets resubmitted; 2) the return value is interpreted as a
protocol number.
This patch changes those return values to zero.
IPv6 doesn't suffer from this problem because it uses a positive return
value as an indication for resubmission. So the only effect of this patch
there is to increment the IPSTATS_MIB_INDELIVERS counter which IMHO is
the right thing to do.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The netdev notifier call chain is currently unregistered without taking
any locks outside the notifier system. Because the notifier system itself
does not synchronise unregistration with respect to the calling of the
chain, we as its user need to do our own locking.
We are supposed to take the RTNL for all calls to netdev notifiers, so
taking the RTNL should be sufficient to protect it.
The registration path in dev.c already takes the RTNL so it's OK.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
I see lots of
kernel unaligned access to 0xa0000001009dbb6f, ip=0xa000000100811591
kernel unaligned access to 0xa0000001009dbb6b, ip=0xa0000001008115c1
kernel unaligned access to 0xa0000001009dbb6d, ip=0xa0000001008115f1
messages in my logs on IA64 when using the ethernet bridge with 2.6.16.
Appended is a patch to fix them.
Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rewrap the overly long source code lines resulting from the previous
patch's addition of the slab cache flag SLAB_MEM_SPREAD. This patch
contains only formatting changes, and no function change.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Mark file system inode and similar slab caches subject to SLAB_MEM_SPREAD
memory spreading.
If a slab cache is marked SLAB_MEM_SPREAD, then anytime that a task that's
in a cpuset with the 'memory_spread_slab' option enabled goes to allocate
from such a slab cache, the allocations are spread evenly over all the
memory nodes (task->mems_allowed) allowed to that task, instead of favoring
allocation on the node local to the current cpu.
The following inode and similar caches are marked SLAB_MEM_SPREAD:
file cache
==== =====
fs/adfs/super.c adfs_inode_cache
fs/affs/super.c affs_inode_cache
fs/befs/linuxvfs.c befs_inode_cache
fs/bfs/inode.c bfs_inode_cache
fs/block_dev.c bdev_cache
fs/cifs/cifsfs.c cifs_inode_cache
fs/coda/inode.c coda_inode_cache
fs/dquot.c dquot
fs/efs/super.c efs_inode_cache
fs/ext2/super.c ext2_inode_cache
fs/ext2/xattr.c (fs/mbcache.c) ext2_xattr
fs/ext3/super.c ext3_inode_cache
fs/ext3/xattr.c (fs/mbcache.c) ext3_xattr
fs/fat/cache.c fat_cache
fs/fat/inode.c fat_inode_cache
fs/freevxfs/vxfs_super.c vxfs_inode
fs/hpfs/super.c hpfs_inode_cache
fs/isofs/inode.c isofs_inode_cache
fs/jffs/inode-v23.c jffs_fm
fs/jffs2/super.c jffs2_i
fs/jfs/super.c jfs_ip
fs/minix/inode.c minix_inode_cache
fs/ncpfs/inode.c ncp_inode_cache
fs/nfs/direct.c nfs_direct_cache
fs/nfs/inode.c nfs_inode_cache
fs/ntfs/super.c ntfs_big_inode_cache_name
fs/ntfs/super.c ntfs_inode_cache
fs/ocfs2/dlm/dlmfs.c dlmfs_inode_cache
fs/ocfs2/super.c ocfs2_inode_cache
fs/proc/inode.c proc_inode_cache
fs/qnx4/inode.c qnx4_inode_cache
fs/reiserfs/super.c reiser_inode_cache
fs/romfs/inode.c romfs_inode_cache
fs/smbfs/inode.c smb_inode_cache
fs/sysv/inode.c sysv_inode_cache
fs/udf/super.c udf_inode_cache
fs/ufs/super.c ufs_inode_cache
net/socket.c sock_inode_cache
net/sunrpc/rpc_pipe.c rpc_inode_cache
The choice of which slab caches to so mark was quite simple. I marked
those already marked SLAB_RECLAIM_ACCOUNT, except for fs/xfs, dentry_cache,
inode_cache, and buffer_head, which were marked in a previous patch. Even
though SLAB_RECLAIM_ACCOUNT is for a different purpose, it marks the same
potentially large file system i/o related slab caches as we need for memory
spreading.
Given that the rule now becomes "wherever you would have used a
SLAB_RECLAIM_ACCOUNT slab cache flag before (usually the inode cache), use
the SLAB_MEM_SPREAD flag too", this should be easy enough to maintain.
Future file system writers will just copy one of the existing file system
slab cache setups and tend to get it right without thinking.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
After a scan, we weren't switching back to the original channel if we
were associated with an AP. So NetworkManager's periodic scans would
disrupt connectivity until the ESSID was manually set again. Fix that.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This is version 20 of the Wireless Extensions. This is the
completion of the RtNetlink work I started early 2004, it enables the
full Wireless Extension API over RtNetlink.
Few comments on the patch :
o totally driver transparent, no change in drivers needed.
o iwevent were already RtNetlink based since they were created
(around 2.5.7). This adds all the regular SET and GET requests over
RtNetlink, using the exact same mechanism and data format as iwevents.
o This is a Kconfig option, as currently most people have no
need for it. Surprisingly, patch is actually small and well
encapsulated.
o Tested on SMP, attention as been paid to make it 64 bits clean.
o Code do probably too many checks and could be further
optimised, but better safe than sorry.
o RtNetlink based version of the Wireless Tools available on
my web page for people inclined to try out this stuff.
I would also like to thank Alexey Kuznetsov for his helpful
suggestions to make this patch better.
Signed-off-by: Jean Tourrilhes <jt@hpl.hp.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The sk argument to ip6_xmit is never NULL nowadays since the skb->priority
assigment expects a valid socket.
Coverity #354
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
In both cases n can't be NULL without crashing anyway.
Coverity #78
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
To really make sense of route notifications in the presence of
multiple tables, userspace also needs to be notified about routing
rule updates. Notifications are sent to the so far unused
RTNLGRP_NOP1 (now RTNLGRP_RULE) group.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steven Whitehouse <steve@chygwyn.com>
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
Softmac scanning fails because the stop flag is not cleared before
scanning is started. The attached one-line patch fixes this.
Signed-Off-By: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch removes ieee80211softmac_reassoc which is neither implemented
nor used nor necessary.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds handling of reassociation to softmac when the AP
requests it. Patch from Larry Finger.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Recently the deauth packet handler was updated to use a deauth packet
struct (identical to the auth packet struct) and this now gives a
warning. This patch updates the code to properly use a deauth struct and
deauth variable.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch removes a blank line that shouldn't be there and fixes a
spelling error in softmac.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch updates the copyright statements in softmac that I
erroneously added for 2005 only (when we already had 2006).
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Version 2 of the patch. Added checks for version 0
and proper from/to DS bits. Even in promisc
mode we won't receive packets from another BSSes.
bcm43xx_rx() contains code to filter out packets from
foreign BSSes and decide whether to call ieee80211_rx
or ieee80211_rx_mgt. This is not bcm specific.
Patch adapts that code and adds it to 80211
as ieee80211_rx_any() function.
Signed-off-by: Denis Vlasenko <vda@ilport.com.ua>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Properly check return value of ieee80211softmac_alloc_mgt
in ieee80211softmac_disassoc_deauth (patch by Denis Vlasenko)
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The problem is in ip_push_pending_frames(), which uses:
if (!df) {
__ip_select_ident(iph, &rt->u.dst, 0);
} else {
iph->id = htons(inet->id++);
}
instead of ip_select_ident().
Right now I think the code is a nonsense. Most likely, I copied it from
old ip_build_xmit(), where it was really special, we had to decide
whether to generate unique ID when generating the first (well, the last)
fragment.
In ip_push_pending_frames() it does not make sense, it should use plain
ip_select_ident() instead.
Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
get_h225_addr is exported, but declared static, which fails when
linking statically.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix missing inversion in address matching, it was broken during the
conversion to x_tables.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
x_tables matches and targets that require nf_conntrack_ipv[4|6] to work
don't have enough information to load on demand these modules. This
patch introduces the following changes to solve this issue:
o nf_ct_l3proto_try_module_get: try to load the layer 3 connection
tracker module and increases the refcount.
o nf_ct_l3proto_module put: drop the refcount of the module.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set the family field in xt_[matches|targets] registered.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the first conntrack ID assigned is 2, use 1 instead.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix oversized message, use NLMSG_SPACE just one since it reserves space
for the netlink header and NFA_SPACE for every attribute.
Thanks to Harald Welte for the feedback
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The expectation mask has some particularities that requires a different
handling. The protocol number fields can be set to non-valid protocols,
ie. l3num is set to 0xFFFF. Since that protocol does not exist, the mask
tuple will not be dumped. Moreover, this results in a kernel panic when
nf_conntrack accesses the array of protocol handlers, that is PF_MAX (0x1F)
long.
This patch introduces the function ctnetlink_exp_dump_mask, that correctly
dumps the expectation mask. Such function uses the l3num value from the
expectation tuple that is a valid layer 3 protocol number. The value of the
l3num mask isn't dumped since it is meaningless from the userspace side.
Thanks to Yasuyuki Kozakai and Patrick McHardy for the feedback.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
do_ipv6_getsockopt returns -EINVAL for unknown options, not
-ENOPROTOOPT as do_ipv6_setsockopt.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allows dte facility patch to use 32 64 bit ioctl conversion mechanism
Signed-off-by: Shaun Pereira <spereira@tusc.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allows use of the optional user facility to insert ITU-T
(http://www.itu.int/ITU-T/) specified DTE facilities in call set-up x25
packets. This feature is optional; no facilities will be added if the ioctl
is not used, and call setup packet remains the same as before.
If the ioctls provided by the patch are used, then a facility marker will be
added to the x25 packet header so that the called dte address extension
facility can be differentiated from other types of facilities (as described in
the ITU-T X.25 recommendation) that are also allowed in the x25 packet header.
Facility markers are made up of two octets, and may be present in the x25
packet headers of call-request, incoming call, call accepted, clear request,
and clear indication packets. The first of the two octets represents the
facility code field and is set to zero by this patch. The second octet of the
marker represents the facility parameter field and is set to 0x0F because the
marker will be inserted before ITU-T type DTE facilities.
Since according to ITU-T X.25 Recommendation X.25(10/96)- 7.1 "All networks
will support the facility markers with a facility parameter field set to all
ones or to 00001111", therefore this patch should work with all x.25 networks.
While there are many ITU-T DTE facilities, this patch implements only the
called and calling address extension, with placeholders in the
x25_dte_facilities structure for the rest of the facilities.
Testing:
This patch was tested using a cisco xot router connected on its serial ports
to an X.25 network, and on its lan ports to a host running an xotd daemon.
It is also possible to test this patch using an xotd daemon and an x25tap
patch, where the xotd daemons work back-to-back without actually using an x.25
network. See www.fyonne.net for details on how to do this.
Signed-off-by: Shaun Pereira <spereira@tusc.com.au>
Acked-by: Andrew Hendry <ahendry@tusc.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes the following error from kernel
T2 kernel: schedule_timeout:
wrong timeout value ffffffffffffffff from ffffffff88164796
Signed-off-by: Shaun Pereira <spereira@tusc.com.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
To allow 32 bit x25 module structures to be passed to a 64 bit kernel via
ioctl using the new compat_sock_ioctl registration mechanism instead of the
obsolete 'register_ioctl32_conversion into hash table' mechanism
Signed-off-by: Shaun Pereira <spereira@tusc.com.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Get socket timestamp handler function that does not use the
ioctl32_hash_table.
Signed-off-by: Shaun Pereira <spereira@tusc.com.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the register_ioctl32_conversion() patch in the kernel is now obsolete,
provide another method to allow 32 bit user space ioctls to reach the kernel.
Signed-off-by: Shaun Pereira <spereira@tusc.com.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Return negative error constant.
Signed-off-by: Tobias Klauser <tklauser@nuerscht.ch>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jing Min Zhao <zhaojignmin@hotmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here are some possible (and trivial) cleanups.
- use kzalloc() where possible
- invert allocation failure test like
if (object) {
/* Rest of function here */
}
to
if (object == NULL)
return NULL;
/* Rest of function here */
Signed-off-by: Ingo Oeser <ioe-lkml@rameria.de>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stupidly use kzalloc() instead of kmalloc()/memset()
everywhere where this is possible in net/ipv6/*.c .
Signed-off-by: Ingo Oeser <ioe-lkml@rameria.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Two minor cleanups:
1. Using kzalloc() in fraq_alloc_queue()
saves the memset() in ipv6_frag_create().
2. Invert sense of if-statements to streamline code.
Inverts the comment, too.
Signed-off-by: Ingo Oeser <ioe-lkml@rameria.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Coverity checker noted this inconsequent NULL checking in
dnrt_drop().
Since all callers ensure that NULL isn't passed, we can simply remove
the check.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bridge code can use existing LLC output code when building
spanning tree protocol packets.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cleanup of LLC. llc_mac_hdr_init can take constant arguments,
and it is defined twice once in llc_output.h that is otherwise unused.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Acked-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bridge's communicate with each other using Spanning Tree Protocol
over a standard multicast address. There are times when testing or
layering bridges over existing topologies or tunnels, when it is
useful to use alternative multicast addresses for STP packets.
The 802.1d standard has some unused addresses, that can be used for this.
This patch is restrictive in that it only allows one of the possible
addresses in the standard.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use LLC for the receive path of Spanning Tree Protocol packets.
This allows link local multicast packets to be received by
other protocols (if they care), and uses the existing LLC
code to get STP packets back into bridge code.
The bridge multicast address is also checked, so bridges using
other link local multicast addresses are ignored. This allows
for use of different multicast addresses to define separate STP
domains.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cleanup the get/set of bridge timer value in the packets.
It is clearer not to bury the conversion in macro.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Optimize the forwarding and transmit paths. Both places are
called with bottom half/no preempt so there is no need to use
spin_lock_bh or rcu_read_lock.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move nf_bridge_alloc from header file to the one place it is
used and optimize it.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the VLAN macros in bridge netfilter code. Macros should
not depend on magic variables.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Only use__constant_htons() for initializers and switch cases.
For other uses, it is just as efficient and clearer to use htons
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Run br_netfilter through Lindent to fix whitespace.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The netfilter hook that is used to receive frames doesn't need to be a
stub. It is only called in two ways, both of which ignore the return
value.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use kzalloc versus kmalloc+memset. Also don't need to do
memset() of bridge address since it is in netdev private data
that is already zero'd in alloc_netdev.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The STP timers run off softirq (kernel timers), so there is no need to
disable bottom half in the spin locks.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/bridge/br_netfilter.c: In function `br_nf_pre_routing':
net/bridge/br_netfilter.c:427: warning: unused variable `vhdr'
net/bridge/br_netfilter.c:445: warning: unused variable `vhdr'
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/bridge/netfilter/ebtables.c:1481: warning: initialization makes pointer from integer without a cast
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is in preparation for having a dccp_minisock embedded into
dccp_request_sock so that feature negotiation can be done prior to
creating the full blown dccp_sock.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This will later be included in struct dccp_request_sock so that we can
have per connection feature negotiation state while in the 3way
handshake, when we clone the DCCP_ROLE_LISTEN socket (in
dccp_create_openreq_child) we'll just copy this state from
dreq_minisock to dccps_minisock.
Also the feature negotiation and option parsing code will mostly touch
dccps_minisock, which will simplify some stuff.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No code changes, just tidying up, in some cases moving EXPORT_SYMBOLs
to just after the function exported, etc.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch extends {get|set}sockopt compatibility layer in order to
move protocol specific parts to their place and avoid huge universal
net/compat.c file in the future.
Signed-off-by: Dmitry Mishin <dim@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We've already dereferenced 'np' a dozen
times at this point, so it's safe to say it's not null.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This option should IMHO no longer depend on EXPERIMENTAL.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
ACKed-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
We're now starting to have quite a number of places that do skb_pull
followed immediately by an skb_postpull_rcsum. We can merge these two
operations into one function with skb_pull_rcsum. This makes sense
since most pull operations on receive skb's need to update the
checksum.
I've decided to make this out-of-line since it is fairly big and the
fast path where hardware checksums are enabled need to call
csum_partial anyway.
Since this is a brand new function we get to add an extra check on the
len argument. As it is most callers of skb_pull ignore its return
value which essentially means that there is no check on the len
argument.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
As per Robert Olsson's patch for ipv4, this is the DECnet
version to keep the code "in step". It changes the list
of rules to use RCU rather than an rwlock.
Inspired-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: Steven Whitehouse <steve@chygwyn.com>
Signed-off-by: Patrick Caulfield <patrick@tykepenguin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch means that 64bit kernel/32bit userland platforms will now
work correctly with DECnet.
Signed-off-by: Patrick Caulfield <patrick@tykepenguin.com>
Signed-off-by: Steven Whitehouse <steve@chygwyn.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The typedef for dn_address has been removed in favour of using __le16
or __u16 directly as appropriate. All the DECnet header files are
updated accordingly.
The byte ordering of dn_eth2dn() and dn_dn2eth() are both changed
since just about all their callers wanted network order rather than
host order, so the conversion is now done in the functions themselves.
Several missed endianess conversions have been picked up during the
conversion process. The nh_gw field in struct dn_fib_info has been
changed from a 32 bit field to 16 bits as it ought to be.
One or two cases of using htons rather than dn_htons in the routing
code have been found and fixed.
There are still a few warnings to fix, but this patch deals with the
important cases.
Signed-off-by: Steven Whitehouse <steve@chygwyn.com>
Signed-off-by: Patrick Caulfield <patrick@tykepenguin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements an application of the LSM-IPSec networking
controls whereby an application can determine the label of the
security association its TCP or UDP sockets are currently connected to
via getsockopt and the auxiliary data mechanism of recvmsg.
Patch purpose:
This patch enables a security-aware application to retrieve the
security context of an IPSec security association a particular TCP or
UDP socket is using. The application can then use this security
context to determine the security context for processing on behalf of
the peer at the other end of this connection. In the case of UDP, the
security context is for each individual packet. An example
application is the inetd daemon, which could be modified to start
daemons running at security contexts dependent on the remote client.
Patch design approach:
- Design for TCP
The patch enables the SELinux LSM to set the peer security context for
a socket based on the security context of the IPSec security
association. The application may retrieve this context using
getsockopt. When called, the kernel determines if the socket is a
connected (TCP_ESTABLISHED) TCP socket and, if so, uses the dst_entry
cache on the socket to retrieve the security associations. If a
security association has a security context, the context string is
returned, as for UNIX domain sockets.
- Design for UDP
Unlike TCP, UDP is connectionless. This requires a somewhat different
API to retrieve the peer security context. With TCP, the peer
security context stays the same throughout the connection, thus it can
be retrieved at any time between when the connection is established
and when it is torn down. With UDP, each read/write can have
different peer and thus the security context might change every time.
As a result the security context retrieval must be done TOGETHER with
the packet retrieval.
The solution is to build upon the existing Unix domain socket API for
retrieving user credentials. Linux offers the API for obtaining user
credentials via ancillary messages (i.e., out of band/control messages
that are bundled together with a normal message).
Patch implementation details:
- Implementation for TCP
The security context can be retrieved by applications using getsockopt
with the existing SO_PEERSEC flag. As an example (ignoring error
checking):
getsockopt(sockfd, SOL_SOCKET, SO_PEERSEC, optbuf, &optlen);
printf("Socket peer context is: %s\n", optbuf);
The SELinux function, selinux_socket_getpeersec, is extended to check
for labeled security associations for connected (TCP_ESTABLISHED ==
sk->sk_state) TCP sockets only. If so, the socket has a dst_cache of
struct dst_entry values that may refer to security associations. If
these have security associations with security contexts, the security
context is returned.
getsockopt returns a buffer that contains a security context string or
the buffer is unmodified.
- Implementation for UDP
To retrieve the security context, the application first indicates to
the kernel such desire by setting the IP_PASSSEC option via
getsockopt. Then the application retrieves the security context using
the auxiliary data mechanism.
An example server application for UDP should look like this:
toggle = 1;
toggle_len = sizeof(toggle);
setsockopt(sockfd, SOL_IP, IP_PASSSEC, &toggle, &toggle_len);
recvmsg(sockfd, &msg_hdr, 0);
if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) {
cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr);
if (cmsg_hdr->cmsg_len <= CMSG_LEN(sizeof(scontext)) &&
cmsg_hdr->cmsg_level == SOL_IP &&
cmsg_hdr->cmsg_type == SCM_SECURITY) {
memcpy(&scontext, CMSG_DATA(cmsg_hdr), sizeof(scontext));
}
}
ip_setsockopt is enhanced with a new socket option IP_PASSSEC to allow
a server socket to receive security context of the peer. A new
ancillary message type SCM_SECURITY.
When the packet is received we get the security context from the
sec_path pointer which is contained in the sk_buff, and copy it to the
ancillary message space. An additional LSM hook,
selinux_socket_getpeersec_udp, is defined to retrieve the security
context from the SELinux space. The existing function,
selinux_socket_getpeersec does not suit our purpose, because the
security context is copied directly to user space, rather than to
kernel space.
Testing:
We have tested the patch by setting up TCP and UDP connections between
applications on two machines using the IPSec policies that result in
labeled security associations being built. For TCP, we can then
extract the peer security context using getsockopt on either end. For
UDP, the receiving end can retrieve the security context using the
auxiliary data mechanism of recvmsg.
Signed-off-by: Catherine Zhang <cxzhang@watson.ibm.com>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
When xfrm_user isn't loaded xfrm_nl is NULL, which makes IPsec crash because
xfrm_aevent_is_on passes the NULL pointer to netlink_has_listeners as socket.
A second problem is that the xfrm_nl pointer is not cleared when the socket
is releases at module unload time.
Protect references of xfrm_nl from outside of xfrm_user by RCU, check
that the socket is present in xfrm_aevent_is_on and set it to NULL
when unloading xfrm_user.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Back in the dark ages, we had to be conservative and only allow 15-bit
window fields if the window scale option was not negotiated. Some
ancient stacks used a signed 16-bit quantity for the window field of
the TCP header and would get confused.
Those days are long gone, so we can use the full 16-bits by default
now.
There is a sysctl added so that we can still interact with such old
stacks
Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The node_map struct can be quite large (516 bytes) and allocating two of
them on the stack is not a good idea since we might only have a 4K stack
to start with.
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch contains the following possible cleanups:
- make needlessly global code static
- #if 0 the following unused global functions:
- name_table.c: tipc_nametbl_print()
- name_table.c: tipc_nametbl_dump()
- net.c: tipc_net_next_node()
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With reference to latest discussions on linux-kernel with respect to
inline here is a patch for tipc to remove all inlines as used in
the .c files. See also chapter 14 in Documentation/CodingStyle.
Before:
text data bss dec hex filename
102990 5292 1752 110034 1add2 tipc.o
Now:
text data bss dec hex filename
101190 5292 1752 108234 1a6ca tipc.o
This is a nice text size reduction which will improve icache usage.
In some cases bigger (> 4 lines) functions where declared inline
and used in many places, they are most probarly no longer inlined by gcc
resulting in the size reduction.
There are several one liners that no longer are declared inline, but gcc
should inline these just fine without the inline hint.
With this patch applied one warning is added about an unused static
function - that was hidded by utilising inline before.
The function in question were kept so this patch is solely a
inline removal patch.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tried to run the new tipc stack through sparse.
Following patch fixes all cases where 0 was used
as replacement of NULL.
Use NULL to document this is a pointer and to silence sparse.
This brough sparse warning count down with 127 to 24 warnings.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Per Liden <per.liden@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/ipv4/netfilter/ip_nat_snmp_basic.c: In function 'asn1_header_decode':
net/ipv4/netfilter/ip_nat_snmp_basic.c:248: warning: 'len' may be used uninitialized in this function
net/ipv4/netfilter/ip_nat_snmp_basic.c:248: warning: 'def' may be used uninitialized in this function
net/ipv4/netfilter/ip_nat_snmp_basic.c: In function 'snmp_translate':
net/ipv4/netfilter/ip_nat_snmp_basic.c:672: warning: 'l' may be used uninitialized in this function
net/ipv4/netfilter/ip_nat_snmp_basic.c:668: warning: 'type' may be used uninitialized in this function
Signed-off-by: David S. Miller <davem@davemloft.net>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Get rid of the old __dev_put macro that is just a hold over from pre 2.6
kernel. And turn dev_hold into an inline instead of a macro.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
And not the silly LIMIT_NETDEBUG and silently return without inserting
the option requested.
Also drop some old debugging messages associated to option insertion.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merging it with its only user: dccp_v[46]_reqsk_send_ack.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using this also provides opportunities for introducing
inet_csk_alloc_skb that would call alloc_skb, account it to the sock
and skb_reserve(max_header), but I'll leave this for later, for now
using sk_prot->max_header consistently is enough.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I.e. they should be just ignored, but we have to use 'break', not 'continue',
as we have to possibly reset the mandatory flag.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The functions list_del followed by list_add_tail is equivalent to the
existing inline list_move_tail. list_move_tail avoids unnecessary
_LIST_POISON.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct neigh_ops currently has a destructor field, which no in-kernel
drivers outside of infiniband use. The infiniband/ulp/ipoib in-tree
driver stashes some info in the neighbour structure (the results of
the second-stage lookup from ARP results to real link-level path), and
it uses neigh->ops->destructor to get a callback so it can clean up
this extra info when a neighbour is freed. We've run into problems
with this: since the destructor is in an ops field that is shared
between neighbours that may belong to different net devices, there's
no way to set/clear it safely.
The following patch moves this field to neigh_parms where it can be
safely set, together with its twin neigh_setup. Two additional
patches in the patch series update ipoib to use this new interface.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Due to the thread's lock changes, we're at a new version now.
Signed-off-by: Luiz Capitulino <lcapitulino@mandriva.com.br>
Signed-off-by: David S. Miller <davem@davemloft.net>
As suggested by Arnaldo, this patch replaces the
thread_lock()/thread_unlock() by directly calls to
mutex_lock()/mutex_unlock().
This change makes the code a bit more readable, and the direct calls
are used everywhere in the kernel.
Signed-off-by: Luiz Capitulino <lcapitulino@mandriva.com.br>
Signed-off-by: David S. Miller <davem@davemloft.net>
pktgen's thread semaphores are strict mutexes, convert them to the
mutex implementation.
Signed-off-by: Luiz Capitulino <lcapitulino@mandriva.com.br>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch turns the RTNL from a semaphore to a new 2.6.16 mutex and
gets rid of some of the leftover legacy.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
First, it warns when PAGE_SIZE >= 64K because the ctx_len
field is 16-bits.
Secondly, if there are any real length limitations it can
be verified by the security layer security_xfrm_state_alloc()
call.
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of estimating the time since the last congestion event, count
it directly.
Signed-off-by: Baruch Even <baruch@ev-en.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Account for delayed-ACKs in H-TCP.
Delayed-ACKs cause H-TCP to be less aggressive than its design calls
for. It is especially true when the receiver is a Linux machine where
the average delayed ack is over 3 packets with values of 7 not unheard
of.
Signed-off-By: Baruch Even <baruch@ev-en.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use functions to calculate jiffies from milliseconds and not the old,
crude method of dividing HZ by a value. Ensures more accurate values
even in the face of strange HZ values.
Signed-off-By: Baruch Even <baruch@ev-en.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
With all the previous changes, we're at a new version now.
Signed-off-by: Luiz Capitulino <lcapitulino@mandriva.com.br>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch ports the per-thread interface list list to the in-kernel
linked list implementation. In the general, the resulting code is a
bit simpler.
Signed-off-by: Luiz Capitulino <lcapitulino@mandriva.com.br>
Signed-off-by: David S. Miller <davem@davemloft.net>
Even if pktgen's thread initialization fails for all CPUs, the module
will be successfully loaded.
This patch changes that behaivor, by returning an error on module load time,
and also freeing all the resources allocated. It also prints a warning if a
thread initialization has failed.
Signed-off-by: Luiz Capitulino <lcapitulino@mandriva.com.br>
Signed-off-by: David S. Miller <davem@davemloft.net>
Free all the alocated resources if kernel_thread() call fails.
Signed-off-by: Luiz Capitulino <lcapitulino@mandriva.com.br>
Signed-off-by: David S. Miller <davem@davemloft.net>
The final result is a simpler and smaller code.
Note that I'm adding a new member in the struct pktgen_thread called
'removed'. The reason is that I didn't find a better wait condition to
be used in the place of the replaced one.
Signed-off-by: Luiz Capitulino <lcapitulino@mandriva.com.br>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lindet run, with some fixes made by hand.
Signed-off-by: Luiz Capitulino <lcapitulino@mandriva.com.br>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to dccp draft (draft-ietf-dccp-spec-13.txt) section 5.8.2
(Mandatory Option) the following patch correct the handling of the
following cases:
1) "... and any Mandatory options received on DCCP-Data packets MUST be
ignored."
2) "The connection is in error and should be reset with Reset Code 5, ...
if option O is absent (Mandatory was the last byte of the option list), or
if option O equals Mandatory."
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
No changes in the logic were made, just removing trailing whitespaces,
etc.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Consolidating open coded sequences in tcp and dccp, v4 and v6.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I guess I forgot to add it, nah, now it just works:
18:04:33.274066 IP6 ::1.1476 > ::1.5001: request (service=0)
18:04:33.334482 IP6 ::1.5001 > ::1.1476: reset (code=bad_service_code)
Ditched IP_DCCP_UNLOAD_HACK, as now we would have to do it for both
IPv6 and IPv4, so I'll come up with another way for freeing the
control sockets in upcoming changesets.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's no reason for struct dccp_v4_prot being global.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fib_triestats has been buggy and caused oopses some platforms as
openwrt. The patch below should cure those problems.
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
In some kernel configs /proc functions seems to be accessed before the
trie is initialized. The patch below checks for this.
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
This moves some TCP-specific MTU probing state out of
inet_connection_sock back to tcp_sock.
Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch below replaces a divide by 2 with a shift -- sk_sndbuf is an
integer, so gcc emits an idiv, which takes 10x longer than a shift by 1.
This improves af_unix bandwidth by ~6-10K/s. Also, tidy up the comment
to fit in 80 columns while we're at it.
Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
o Uninline kfree_skb, which saves some 15k of object code on my notebook.
o Allow kfree_skb to be called with a NULL argument.
Subsequent patches can remove conditional from drivers and further
reduce source and object size.
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thanks to Leslie Harlley Watter <leslie@watter.org> for reporting the
problem an testing this patch.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's a race in pktgen which can lead to a double
free of a pktgen_dev's skb. If a worker thread is in
the midst of doing fill_packet(), and the controlling
thread gets a "stop" message, the already freed skb
can be freed once again in pktgen_stop_device(). This
patch gives all responsibility for cleaning up a
pktgen_dev's skb to the associated worker thread.
Signed-off-by: Arthur Kepner <akepner@sgi.com>
Acked-by: Robert Olsson <Robert.Olsson@data.slu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
With this patch in place we can break down the complexity by better
compartmentalizing the code that is common to ipv6 and ipv4.
Now we have these modules:
Module Size Used by
dccp_diag 1344 0
inet_diag 9448 1 dccp_diag
dccp_ccid3 15856 0
dccp_tfrc_lib 12320 1 dccp_ccid3
dccp_ccid2 5764 0
dccp_ipv4 16996 2
dccp 48208 4 dccp_diag,dccp_ccid3,dccp_ccid2,dccp_ipv4
dccp_ipv6 still requires dccp_ipv4 due to dccp_ipv6_mapped, that is
the next target to work on the "hey, ipv4 is legacy, I only want ipv6
dude!" direction.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
And introduce dccp_mib_exit grouping previously open coded sequence.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dccp_make_response is shared by ipv4/6 and the ipv6 code was
recalculating the checksum, not good, so move the dccp_v4_checksum
call to dccp_v4_send_response.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As this is used by both ipv4 and ipv6 and is not ipv4 specific.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Removing one more ipv6 uses ipv4 stuff case in dccp land.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>