Right now, some modules such as bonding use proc_create
to create proc entries under /proc/net/, and other modules
such as ipv4 use proc_net_fops_create.
It looks a little chaos.this patch changes all of
proc_net_fops_create to proc_create. we can remove
proc_net_fops_create after this patch.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
1) Remove a duplicated call to skb_orphan() in pf_key, from Cong Wang.
2) Prepare xfrm and pf_key for algorithms without pf_key support,
from Jussi Kivilinna.
3) Fix an unbalanced lock in xfrm_output_one(), from Li RongQing.
4) Add an IPsec state resolution packet queue to handle
packets that are send before the states are resolved.
5) xfrm4_policy_fini() is unused since 2.6.11, time to remove it.
From Michal Kubecek.
6) The xfrm gc threshold was configurable just in the initial
namespace, make it configurable in all namespaces. From
Michal Kubecek.
7) We currently can not insert policies with mark and mask
such that some flows would be matched from both policies.
Allow this if the priorities of these policies are different,
the one with the higher priority is used in this case.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Mark existing algorithms as pfkey supported and make pfkey only use algorithms
that have pfkey_supported set.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
All users of xfrm_addr_cmp() use its result as boolean.
Introduce xfrm_addr_equal() (which is equal to !xfrm_addr_cmp())
and convert all users.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb_set_owner_r() will call skb_orphan(), I don't
see any reason to call it twice.
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Allow an unpriviled user who has created a user namespace, and then
created a network namespace to effectively use the new network
namespace, by reducing capable(CAP_NET_ADMIN) and
capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls.
Allow creation of af_key sockets.
Allow creation of llc sockets.
Allow creation of af_packet sockets.
Allow sending xfrm netlink control messages.
Allow binding to netlink multicast groups.
Allow sending to netlink multicast groups.
Allow adding and dropping netlink multicast groups.
Allow sending to all netlink multicast groups and port ids.
Allow reading the netfilter SO_IP_SET socket option.
Allow sending netfilter netlink messages.
Allow setting and getting ip_vs netfilter socket options.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because sizeof() is size_t then if "len" is negative, it counts as a
large positive value.
The call tree looks like:
pfkey_sendmsg()
-> pfkey_process()
-> pfkey_spdadd()
-> parse_ipsecrequests()
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is a frequent mistake to confuse the netlink port identifier with a
process identifier. Try to reduce this confusion by renaming fields
that hold port identifiers portid instead of pid.
I have carefully avoided changing the structures exported to
userspace to avoid changing the userspace API.
I have successfully built an allyesconfig kernel with this change.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is an initial merge in of Eric Biederman's work to start adding
user namespace support to the networking.
Signed-off-by: David S. Miller <davem@davemloft.net>
Sematically speaking, xfrm_mgr.acquire is called when kernel intends to ask
user space IKE daemon to negotiate SAs with peers. IOW the direction will
*always* be XFRM_POLICY_OUT, so remove int dir for clarity.
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use of "unsigned int" is preferred to bare "unsigned" in net tree.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
At the point of this error-handling code, alloc_skb has succeded, so free
the resulting skb by jumping to the err label.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of testing defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
C assignment can handle struct in6_addr copying.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Unnecessary casts of void * clutter the code.
These are the remainder casts after several specific
patches to remove netdev_priv and dev_priv.
Done via coccinelle script:
$ cat cast_void_pointer.cocci
@@
type T;
T *pt;
void *pv;
@@
- pt = (T *)pv;
+ pt = pv;
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@conan.davemloft.net>
The %pK format specifier is designed to hide exposed kernel pointers,
specifically via /proc interfaces. Exposing these pointers provides an
easy target for kernel write vulnerabilities, since they reveal the
locations of writable structures containing easily triggerable function
pointers. The behavior of %pK depends on the kptr_restrict sysctl.
If kptr_restrict is set to 0, no deviation from the standard %p behavior
occurs. If kptr_restrict is set to 1, the default, if the current user
(intended to be a reader via seq_printf(), etc.) does not have CAP_SYSLOG
(currently in the LSM tree), kernel pointers using %pK are printed as 0's.
If kptr_restrict is set to 2, kernel pointers using %pK are printed as
0's regardless of privileges. Replacing with 0's was chosen over the
default "(null)", which cannot be parsed by userland %p, which expects
"(nil)".
The supporting code for kptr_restrict and %pK are currently in the -mm
tree. This patch converts users of %p in net/ to %pK. Cases of printing
pointers to the syslog are not covered, since this would eliminate useful
information for postmortem debugging and the reading of the syslog is
already optionally protected by the dmesg_restrict sysctl.
Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Cc: James Morris <jmorris@namei.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Thomas Graf <tgraf@infradead.org>
Cc: Eugene Teo <eugeneteo@kernel.org>
Cc: Kees Cook <kees.cook@canonical.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David S. Miller <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers
where possible, to make code intention more obvious.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If CONFIG_NET_KEY_MIGRATE is not defined the arguments of
pfkey_migrate stub do not match causing warning.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This actually pointed out a (seemingly known) bug where we mangle the
pfkey header in a potentially shared SKB, which is fixed here.
Signed-off-by: David S. Miller <davem@davemloft.net>
Change "return (EXPR);" to "return EXPR;"
return is not a function, parentheses are not required.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Put severity level on pfkey printk messages
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
The original code saved the error value but just returned 0 in the end.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Jamal Hadi Salim <hadi@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pass mark to all SP lookups to prepare them for when we add code
to have them search.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
pass mark to all SA lookups to prepare them for when we add code
to have them search.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Get rid of custom locking that was using wait queue, lock, and atomic
to basically build a queued mutex. Use RCU for read side.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To see the effect make sure you have an empty SPD.
On window1 "ip xfrm mon" and on window2 issue "ip xfrm policy flush"
You get prompt back in window2 and you see the flush event on window1.
With this fix, you still get prompt on window1 but no event on window2.
Thanks to Alexey Dobriyan for finding a bug in earlier version
when using pfkey to do the flushing.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
To see the effect make sure you have an empty SAD.
On window1 "ip xfrm mon" and on window2 issue "ip xfrm state flush"
You get prompt back in window2 and you see the flush event on window1.
With this fix, you still get prompt on window1 but no event on window2.
Thanks to Alexey Dobriyan for finding a bug in earlier version
when using pfkey to do the flushing.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
RFC 2367 says flushing behavior should be:
1) user space -> kernel: flush
2) kernel: flush
3) kernel -> user space: flush event to ALL listeners
This is not realistic today in the presence of selinux policies
which may reject the flush etc. So we make the sequence become:
1) user space -> kernel: flush
2) kernel: flush
3) kernel -> user space: flush response to originater from #1
4) if there were no errors then:
kernel -> user space: flush event to ALL listeners
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
As reported by Alexey Dobriyan:
--------------------
setkey now takes several seconds to run this simple script
and it spits "recv: Resource temporarily unavailable" messages.
#!/usr/sbin/setkey -f
flush;
spdflush;
add A B ipcomp 44 -m tunnel -C deflate;
add B A ipcomp 45 -m tunnel -C deflate;
spdadd A B any -P in ipsec
ipcomp/tunnel/192.168.1.2-192.168.1.3/use;
spdadd B A any -P out ipsec
ipcomp/tunnel/192.168.1.3-192.168.1.2/use;
--------------------
Obviously applications want the events even when the table
is empty. So we cannot make this behavioral change.
Signed-off-by: David S. Miller <davem@davemloft.net>
Observed similar behavior on SPD as previouly seen on SAD flushing..
This fixes it.
cheers,
jamal
commit 428b20432dc31bc2e01a94cd451cf5a2c00d2bf4
Author: Jamal Hadi Salim <hadi@cyberus.ca>
Date: Thu Feb 11 05:49:38 2010 -0500
xfrm: Flushing empty SPD generates false events
To see the effect make sure you have an empty SPD.
On window1 "ip xfrm mon" and on window2 issue "ip xfrm policy flush"
You get prompt back in window1 and you see the flush event on window2.
With this fix, you still get prompt on window1 but no event on window2.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
To see the effect make sure you have an empty SAD.
-On window1 "ip xfrm mon"
-on window2 issue "ip xfrm state flush"
You get prompt back in window1
and you see the flush event on window2.
With this fix, you still get prompt on window1 but no
event on window2.
I was tempted to return -ESRCH on window1 (which would
show "RTNETLINK answers: No such process") but didnt want
to change current behavior.
cheers,
jamal
commit 5f3dd4a772326166e1bcf54acc2391df00dc7ab5
Author: Jamal Hadi Salim <hadi@cyberus.ca>
Date: Thu Feb 11 04:41:36 2010 -0500
xfrm: Flushing empty SAD generates false events
To see the effect make sure you have an empty SAD.
On window1 "ip xfrm mon" and on window2 issue "ip xfrm state flush"
You get prompt back in window1 and you see the flush event on window2.
With this fix, you still get prompt on window1 but no event on window2.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
1. After sock_register() returns, it's possible to create sockets,
even if module still not initialized fully (blame generic module code
for that!)
2. Consequently, pfkey_create() can be called with pfkey_net_id still not
initialized which will BUG_ON in net_generic():
kernel BUG at include/net/netns/generic.h:43!
3. During netns shutdown, netns ops should be unregistered after
key manager unregistered because key manager calls can be triggered
from xfrm_user module:
general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
pfkey_broadcast+0x111/0x210 [af_key]
pfkey_send_notify+0x16a/0x300 [af_key]
km_state_notify+0x41/0x70
xfrm_flush_sa+0x75/0x90 [xfrm_user]
4. Unregister netns ops after socket ops just in case and for symmetry.
Reported by Luca Tettamanti.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Tested-by: Luca Tettamanti <kronos.it@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use atomic_inc_return() in get_acqseq() to avoid taking a spinlock
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
__net_init/__net_exit are apparently not going away, so use them
to full extent.
In some cases __net_init was removed, because it was called from
__net_exit code.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 4447bb33f0 ("xfrm: Store aalg in
xfrm_state with a user specified truncation length") breaks
installation of authentication algorithms via PF_KEY, as the state
specific truncation length is not installed with the algorithms
default truncation length. This patch initializes state properly to
the default if installed via PF_KEY.
Signed-off-by: Martin Willi <martin@strongswan.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Take advantage of the new pernet automatic storage management,
and stop using compatibility network namespace functions.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The generic __sock_create function has a kern argument which allows the
security system to make decisions based on if a socket is being created by
the kernel or by userspace. This patch passes that flag to the
net_proto_family specific create function, so it can do the same thing.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Create a new socket level option to report number of queue overflows
Recently I augmented the AF_PACKET protocol to report the number of frames lost
on the socket receive queue between any two enqueued frames. This value was
exported via a SOL_PACKET level cmsg. AFter I completed that work it was
requested that this feature be generalized so that any datagram oriented socket
could make use of this option. As such I've created this patch, It creates a
new SOL_SOCKET level option called SO_RXQ_OVFL, which when enabled exports a
SOL_SOCKET level cmsg that reports the nubmer of times the sk_receive_queue
overflowed between any two given frames. It also augments the AF_PACKET
protocol to take advantage of this new feature (as it previously did not touch
sk->sk_drops, which this patch uses to record the overflow count). Tested
successfully by me.
Notes:
1) Unlike my previous patch, this patch simply records the sk_drops value, which
is not a number of drops between packets, but rather a total number of drops.
Deltas must be computed in user space.
2) While this patch currently works with datagram oriented protocols, it will
also be accepted by non-datagram oriented protocols. I'm not sure if thats
agreeable to everyone, but my argument in favor of doing so is that, for those
protocols which aren't applicable to this option, sk_drops will always be zero,
and reporting no drops on a receive queue that isn't used for those
non-participating protocols seems reasonable to me. This also saves us having
to code in a per-protocol opt in mechanism.
3) This applies cleanly to net-next assuming that commit
977750076d (my af packet cmsg patch) is reverted
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All usages of structure net_proto_ops should be declared const.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All instances of file_operations should be const.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 2b85a34e91
(net: No more expensive sock_hold()/sock_put() on each tx)
changed initial sk_wmem_alloc value.
We need to take into account this offset when reporting
sk_wmem_alloc to user, in PROC_FS files or various
ioctls (SIOCOUTQ/TIOCOUTQ)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove some pointless conditionals before kfree_skb().
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently encap_oa is left uninitialized, so it contains garbage data which
is visible to userland via Netlink. Initialize it by zeroing it out.
Signed-off-by: Timo Teras <timo.teras@iki.fi>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
* interaction with userspace -- take netns from userspace socket.
* in ->notify hook take netns either from SA or explicitly passed --
we don't know if SA/SPD flush is coming.
* stub policy migration with init_net for now.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* netns boilerplate
* keep per-netns socket list
* keep per-netns number of sockets
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SA and SPD flush are executed with NULL SA and SPD respectively, for
these cases pass netns explicitly from userspace socket.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add netns parameter to xfrm_policy_bysel_ctx(), xfrm_policy_byidx().
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Again, to avoid complications with passing netns when not necessary.
Again, ->xp_net is set-once field, once set it never changes.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Disallow spurious wakeups in __xfrm_lookup().
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To avoid unnecessary complications with passing netns around.
* set once, very early after allocating
* once set, never changes
For a while create every xfrm_state in init_net.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
xfrm_policy_destroy() will oops if not dead policy is passed to it.
On error path in pfkey_compile_policy() exactly this happens.
Oopsable for CAP_NET_ADMIN owners.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When deleting an SPD entry using SADB_X_SPDDELETE, c.data.byid is not
initialized to zero in pfkey_spddelete(). Thus, key_notify_policy()
responds with a PF_KEY message of type SADB_X_SPDDELETE2 instead of
SADB_X_SPDDELETE.
Signed-off-by: Tobias Brunner <tobias.brunner@strongswan.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Provides implementation of the enhancements of XFRM/PF_KEY MIGRATE mechanism
specified in draft-ebalard-mext-pfkey-enhanced-migrate-00. Defines associated
PF_KEY SADB_X_EXT_KMADDRESS extension and XFRM/netlink XFRMA_KMADDRESS
attribute.
Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu came up with the idea and the original patch to make
xfrm_state dump list contain also dumpers:
As it is we go to extraordinary lengths to ensure that states
don't go away while dumpers go to sleep. It's much easier if
we just put the dumpers themselves on the list since they can't
go away while they're going.
I've also changed the order of addition on new states to prevent
a never-ending dump.
Timo Teräs improved the patch to apply cleanly to latest tree,
modified iteration code to be more readable by using a common
struct for entries in the list, implemented the same idea for
xfrm_policy dumping and moved the af_key specific "last" entry
caching to af_key.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a xfrm_{state,policy}_walk leak if pfkey socket is closed while
dumping is on-going.
Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Removes legacy reinvent-the-wheel type thing. The generic
machinery integrates much better to automated debugging aids
such as kerneloops.org (and others), and is unambiguous due to
better naming. Non-intuively BUG_TRAP() is actually equal to
WARN_ON() rather than BUG_ON() though some might actually be
promoted to BUG_ON() but I left that to future.
I could make at least one BUILD_BUG_ON conversion.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
When pfkey has no km listeners, it still does a lot of work
before finding out there aint nobody out there.
If a tree falls in a forest and no one is around to hear it, does it make
a sound? In this case it makes a lot of noise:
With this short-circuit adding 10s of thousands of SAs using
netlink improves performance by ~10%.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
This propagates the xfrm_user fix made in commit
bcf0dda8d2 ("[XFRM]: xfrm_user: fix
selector family initialization")
Based upon a bug report from, and tested by, Alan Swanson.
Signed-off-by: Kazunori MIYAZAWA <kazunori@miyazawa.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously I added sessionid output to all audit messages where it was
available but we still didn't know the sessionid of the sender of
netlink messages. This patch adds that information to netlink messages
so we can audit who sent netlink messages.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
net/key/af_key.c: In function ‘pfkey_spddelete’:
net/key/af_key.c:2359: warning: ‘pol_ctx’ may be used uninitialized in
this function
When CONFIG_SECURITY_NETWORK_XFRM isn't set,
security_xfrm_policy_alloc() is an inline that doesn't set pol_ctx, so
this seemed like the easiest fix short of using *uninitialized_var(pol_ctx).
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As it stands it's impossible to use any authentication algorithms
with an ID above 31 portably. It just happens to work on x86 but
fails miserably on ppc64.
The reason is that we're using a bit mask to check the algorithm
ID but the mask is only 32 bits wide.
After looking at how this is used in the field, I have concluded
that in the long term we should phase out state matching by IDs
because this is made superfluous by the reqid feature. For current
applications, the best solution IMHO is to allow all algorithms when
the bit masks are all ~0.
The following patch does exactly that.
This bug was identified by IBM when testing on the ppc64 platform
using the NULL authentication algorithm which has an ID of 251.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The xfrm_get_policy() and xfrm_add_pol_expire() put some rather large structs
on the stack to work around the LSM API. This patch attempts to fix that
problem by changing the LSM API to require only the relevant "security"
pointers instead of the entire SPD entry; we do this for all of the
security_xfrm_policy*() functions to keep things consistent.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stop dumping of entries when af_key socket receive queue is getting
full and continue it later when there is more room again.
This fixes dumping of large databases. Currently the entries not
fitting into the receive queue are just dropped (including the
end-of-dump message) which can confuse applications.
Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change xfrm_policy and xfrm_state walking algorithm from O(n^2) to O(n).
This is achieved adding the entries to one more list which is used
solely for walking the entries.
This also fixes some races where the dump can have duplicate or missing
entries when the SPD/SADB is modified during an ongoing dump.
Dumping SADB with 20000 entries using "time ip xfrm state" the sys
time dropped from 1.012s to 0.080s.
Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
To make sure the procfs visibility occurs after the ->proc_fs ops are
setup, use proc_net_fops_create() and proc_net_remove().
This also fixes an OOPS after module unload in that the name string
for remove was wrong, so it wouldn't actually be removed. That bug
was introduced by commit 61145aa1a1
("[KEY]: Clean up proc files creation a bit.")
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fix a BUG when adding spds which have same selector.
Signed-off-by: Kazunori MIYAZAWA <kazunori@miyazawa.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The seq files API disposes the caller of the difficulty of
checking file position, the length of data to produce and
the size of provided buffer.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mainly this removes ifdef-s from inside the ipsec_pfkey_init.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since __xfrm_policy_destroy is used to destory the resources
allocated by xfrm_policy_alloc. So using the name
__xfrm_policy_destroy is not correspond with xfrm_policy_alloc.
Rename it to xfrm_policy_destroy.
And along with some instances that call xfrm_policy_alloc
but not using xfrm_policy_destroy to destroy the resource,
fix them.
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The aalgos/ealgos fields are only 32 bits wide. However, af_key tries
to test them with the expression 1 << id where id can be as large as
253. This produces different behaviour on different architectures.
The following patch explicitly checks whether ID is greater than 31
and fails the check if that's the case.
We cannot easily extend the mask to be longer than 32 bits due to
exposure to user-space. Besides, this whole interface is obsolete
anyway in favour of the xfrm_user interface which doesn't use this
bit mask in templates (well not within the kernel anyway).
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The change 050f009e16
[IPSEC]: Lock state when copying non-atomic fields to user-space
caused a regression.
Ingo Molnar reports that it causes a potential dead-lock found by the
lock validator as it tries to take x->lock within xfrm_state_lock while
numerous other sites take the locks in opposite order.
For 2.6.24, the best fix is to simply remove the added locks as that puts
us back in the same state as we've been in for years. For later kernels
a proper fix would be to reverse the locking order for every xfrm state
user such that if x->lock is taken together with xfrm_state_lock then
it is to be taken within it.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
From: Charles Hardin <chardin@2wire.com>
Kernel needs to respond to an SADB_GET with the same message type to
conform to the RFC 2367 Section 3.1.5
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Finally, the zero_it argument can be completely removed from
the callers and from the function prototype.
Besides, fix the checkpatch.pl warnings about using the
assignments inside if-s.
This patch is rather big, and it is a part of the previous one.
I splitted it wishing to make the patches more readable. Hope
this particular split helped.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
On PowerPC allmodconfig build we get this:
net/key/af_key.c:400: warning: comparison is always false due to limited range of data type
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds locking so that when we're copying non-atomic fields such as
life-time or coaddr to user-space we don't get a partial result.
For af_key I've changed every instance of pfkey_xfrm_state2msg apart from
expiration notification to include the keys and life-times. This is in-line
with XFRM behaviour.
The actual cases affected are:
* pfkey_getspi: No change as we don't have any keys to copy.
* key_notify_sa:
+ ADD/UPD: This wouldn't work otherwise.
+ DEL: It can't hurt.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves some common code that conceptually belongs to the xfrm core
from af_key/xfrm_user into xfrm_alloc_spi.
In particular, the spin lock on the state is now taken inside xfrm_alloc_spi.
Previously it also protected the construction of the response PF_KEY/XFRM
messages to user-space. This is inconsistent as other identical constructions
are not protected by the state lock. This is bad because they in fact should
be protected but only in certain spots (so as not to hold the lock for too
long which may cause packet drops).
The SPI byte order conversion has also been moved.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch passes in the namespace a new socket should be created in
and has the socket code do the appropriate reference counting. By
virtue of this all socket create methods are touched. In addition
the socket create methods are modified so that they will fail if
you attempt to create a socket in a non-default network namespace.
Failing if we attempt to create a socket outside of the default
network namespace ensures that as we incrementally make the network stack
network namespace aware we will not export functionality that someone
has not audited and made certain is network namespace safe.
Allowing us to partially enable network namespaces before all of the
exotic protocols are supported.
Any protocol layers I have missed will fail to compile because I now
pass an extra parameter into the socket creation code.
[ Integrated AF_IUCV build fixes from Andrew Morton... -DaveM ]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes /proc/net per network namespace. It modifies the global
variables proc_net and proc_net_stat to be per network namespace.
The proc_net file helpers are modified to take a network namespace argument,
and all of their callers are fixed to pass &init_net for that argument.
This ensures that all of the /proc/net files are only visible and
usable in the initial network namespace until the code behind them
has been updated to be handle multiple network namespaces.
Making /proc/net per namespace is necessary as at least some files
in /proc/net depend upon the set of network devices which is per
network namespace, and even more files in /proc/net have contents
that are relevant to a single network namespace.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch modifies the current ipsec audit layer
by breaking it up into purpose driven audit calls.
So far, the only audit calls made are when add/delete
an SA/policy. It had been discussed to give each
key manager it's own calls to do this, but I found
there to be much redundnacy since they did the exact
same things, except for how they got auid and sid, so I
combined them. The below audit calls can be made by any
key manager. Hopefully, this is ok.
Signed-off-by: Joy Latten <latten@austin.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Although an ipsec SA was established, kernel couldn't seem to find it.
I think since we are now using "x->sel.family" instead of "family" in
the xfrm_selector_match() called in xfrm_state_find(), af_key needs to
set this field too, just as xfrm_user.
In af_key.c, x->sel.family only gets set when there's an
ext_hdrs[SADB_EXT_ADDRESS_PROXY-1] which I think is for tunnel.
I think pfkey needs to also set the x->sel.family field when it is 0.
Tested with below patch, and ipsec worked when using pfkey.
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we check for permission before deleting entries from SAD and
SPD, (see security_xfrm_policy_delete() security_xfrm_state_delete())
However we are not checking for authorization when flushing the SPD and
the SAD completely. It was perhaps missed in the original security hooks
patch.
This patch adds a security check when flushing entries from the SAD and
SPD. It runs the entire database and checks each entry for a denial.
If the process attempting the flush is unable to remove all of the
entries a denial is logged the the flush function returns an error
without removing anything.
This is particularly useful when a process may need to create or delete
its own xfrm entries used for things like labeled networking but that
same process should not be able to delete other entries or flush the
entire database.
Signed-off-by: Joy Latten<latten@austin.ibm.com>
Signed-off-by: Eric Paris <eparis@parisplace.org>
Signed-off-by: James Morris <jmorris@namei.org>
This is a natural extension of the changeset
[XFRM]: Probe selected algorithm only.
which only removed the probe call for xfrm_user. This patch does exactly
the same thing for af_key. In other words, we load the algorithm requested
by the user rather than everything when adding xfrm states in af_key.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Spring cleaning time...
There seems to be a lot of places in the network code that have
extra bogus semicolons after conditionals. Most commonly is a
bogus semicolon after: switch() { }
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the common, open coded 'skb->h.raw = skb->data' operation, so that we can
later turn skb->h.raw into a offset, reducing the size of struct sk_buff in
64bit land while possibly keeping it as a pointer on 32bit.
This one touches just the most simple cases:
skb->h.raw = skb->data;
skb->h.raw = {skb_push|[__]skb_pull}()
The next ones will handle the slightly more "complex" cases.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure to actually assign the determined mode to
rq->sadb_x_ipsecrequest_mode.
Noticed by Joe Perches.
Signed-off-by: David S. Miller <davem@davemloft.net>
We should not blindly convert between IPSEC_MODE_xxx and XFRM_MODE_xxx just
by incrementing / decrementing because the assumption is not true any longer.
Signed-off-by: Kazunori MIYAZAWA <miyazawa@linux-ipv6.org>
Singed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Inside pfkey_delete and xfrm_del_sa the audit hooks were not called if
there was any permission/security failures in attempting to do the del
operation (such as permission denied from security_xfrm_state_delete).
This patch moves the audit hook to the exit path such that all failures
(and successes) will actually get audited.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Venkat Yekkirala <vyekkirala@trustedcs.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
pfkey_spdget neither had an LSM security hook nor auditing for the
removal of xfrm_policy structs. The security hook was added when it was
moved into xfrm_policy_byid instead of the callers to that function by
my earlier patch and this patch adds the auditing hooks as well.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Venkat Yekkirala <vyekkirala@trustedcs.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The security hooks to check permissions to remove an xfrm_policy were
actually done after the policy was removed. Since the unlinking and
deletion are done in xfrm_policy_by* functions this moves the hooks
inside those 2 functions. There we have all the information needed to
do the security check and it can be done before the deletion. Since
auditing requires the result of that security check err has to be passed
back and forth from the xfrm_policy_by* functions.
This patch also fixes a bug where a deletion that failed the security
check could cause improper accounting on the xfrm_policy
(xfrm_get_policy didn't have a put on the exit path for the hold taken
by xfrm_policy_by*)
It also fixes the return code when no policy is found in
xfrm_add_pol_expire. In old code (at least back in the 2.6.18 days) err
wasn't used before the return when no policy is found and so the
initialization would cause err to be ENOENT. But since err has since
been used above when we don't get a policy back from the xfrm_policy_by*
function we would always return 0 instead of the intended ENOENT. Also
fixed some white space damage in the same area.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Venkat Yekkirala <vyekkirala@trustedcs.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure that this function is called correctly, and
add BUG() checking to ensure the arguments are sane.
Based upon a patch by Joy Latten.
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend PF_KEYv2 framework so that user application can take advantage
of MIGRATE feature via PF_KEYv2 interface. User application can either
send or receive an MIGRATE message to/from PF_KEY socket.
Detail information can be found in the internet-draft
<draft-sugimoto-mip6-pfkey-migrate>.
Signed-off-by: Shinta Sugimoto <shinta.sugimoto@ericsson.com>
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
An audit message occurs when an ipsec SA
or ipsec policy is created/deleted.
Signed-off-by: Joy Latten <latten@austin.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently when an IPSec policy rule doesn't specify a security
context, it is assumed to be "unlabeled" by SELinux, and so
the IPSec policy rule fails to match to a flow that it would
otherwise match to, unless one has explicitly added an SELinux
policy rule allowing the flow to "polmatch" to the "unlabeled"
IPSec policy rules. In the absence of such an explicitly added
SELinux policy rule, the IPSec policy rule fails to match and
so the packet(s) flow in clear text without the otherwise applicable
xfrm(s) applied.
The above SELinux behavior violates the SELinux security notion of
"deny by default" which should actually translate to "encrypt by
default" in the above case.
This was first reported by Evgeniy Polyakov and the way James Morris
was seeing the problem was when connecting via IPsec to a
confined service on an SELinux box (vsftpd), which did not have the
appropriate SELinux policy permissions to send packets via IPsec.
With this patch applied, SELinux "polmatching" of flows Vs. IPSec
policy rules will only come into play when there's a explicit context
specified for the IPSec policy rule (which also means there's corresponding
SELinux policy allowing appropriate domains/flows to polmatch to this context).
Secondly, when a security module is loaded (in this case, SELinux), the
security_xfrm_policy_lookup() hook can return errors other than access denied,
such as -EINVAL. We were not handling that correctly, and in fact
inverting the return logic and propagating a false "ok" back up to
xfrm_lookup(), which then allowed packets to pass as if they were not
associated with an xfrm policy.
The solution for this is to first ensure that errno values are
correctly propagated all the way back up through the various call chains
from security_xfrm_policy_lookup(), and handled correctly.
Then, flow_cache_lookup() is modified, so that if the policy resolver
fails (typically a permission denied via the security module), the flow
cache entry is killed rather than having a null policy assigned (which
indicates that the packet can pass freely). This also forces any future
lookups for the same flow to consult the security module (e.g. SELinux)
for current security policy (rather than, say, caching the error on the
flow cache entry).
This patch: Fix the selinux side of things.
This makes sure SELinux polmatching of flow contexts to IPSec policy
rules comes into play only when an explicit context is associated
with the IPSec policy rule.
Also, this no longer defaults the context of a socket policy to
the context of the socket since the "no explicit context" case
is now handled properly.
Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: James Morris <jmorris@namei.org>
Sub policy can be used through netlink socket.
PF_KEY uses main only and it is TODO to support sub.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Transformation mode is used as either IPsec transport or tunnel.
It is required to add two more items, route optimization and inbound trigger
for Mobile IPv6.
Based on MIPL2 kernel patch.
This patch was also written by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This defaults the label of socket-specific IPSec policies to be the
same as the socket they are set on.
Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This includes the security context of a security association created
for use by IKE in the acquire messages sent to IKE daemons using
PF_KEY. This would allow the daemons to include the security context
in the negotiation, so that the resultant association is unique to
that security context.
Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch contains a fix for the previous patch that adds security
contexts to IPsec policies and security associations. In the previous
patch, no authorization (besides the check for write permissions to
SAD and SPD) is required to delete IPsec policies and security
assocations with security contexts. Thus a user authorized to change
SAD and SPD can bypass the IPsec policy authorization by simply
deleteing policies with security contexts. To fix this security hole,
an additional authorization check is added for removing security
policies and security associations with security contexts.
Note that if no security context is supplied on add or present on
policy to be deleted, the SELinux module allows the change
unconditionally. The hook is called on deletion when no context is
present, which we may want to change. At present, I left it up to the
module.
LSM changes:
The patch adds two new LSM hooks: xfrm_policy_delete and
xfrm_state_delete. The new hooks are necessary to authorize deletion
of IPsec policies that have security contexts. The existing hooks
xfrm_policy_free and xfrm_state_free lack the context to do the
authorization, so I decided to split authorization of deletion and
memory management of security data, as is typical in the LSM
interface.
Use:
The new delete hooks are checked when xfrm_policy or xfrm_state are
deleted by either the xfrm_user interface (xfrm_get_policy,
xfrm_del_sa) or the pfkey interface (pfkey_spddelete, pfkey_delete).
SELinux changes:
The new policy_delete and state_delete functions are added.
Signed-off-by: Catherine Zhang <cxzhang@watson.ibm.com>
Signed-off-by: Trent Jaeger <tjaeger@cse.psu.edu>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We often just do an atomic_dec(&x->refcnt) on an xfrm_state object
because we know there is more than 1 reference remaining and thus
we can elide the heavier xfrm_state_put() call.
Do this behind an inline function called __xfrm_state_put() so that is
more obvious and also to allow us to more cleanly add refcount
debugging later.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
When returning a message to userspace in reply to a SADB_FLUSH or
SADB_X_SPDFLUSH message, the type was not set for the returned PFKEY
message. The patch below corrects this problem.
Signed-off-by: Jerome Borsboom <j.borsboom@erasmusmc.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
net: Use <linux/capability.h> where capable() is used.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This changes some simple "if (x) BUG();" statements to "BUG_ON(x);"
Signed-off-by: Kris Katterjohn <kjak@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
I noticed that some of 'struct proto_ops' used in the kernel may share
a cache line used by locks or other heavily modified data. (default
linker alignement is 32 bytes, and L1_CACHE_LINE is 64 or 128 at
least)
This patch makes sure a 'struct proto_ops' can be declared as const,
so that all cpus can share all parts of it without false sharing.
This is not mandatory : a driver can still use a read/write structure
if it needs to (and eventually a __read_mostly)
I made a global stubstitute to change all existing occurences to make
them const.
This should reduce the possibility of false sharing on SMP, and
speedup some socket system calls.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch series implements per packet access control via the
extension of the Linux Security Modules (LSM) interface by hooks in
the XFRM and pfkey subsystems that leverage IPSec security
associations to label packets. Extensions to the SELinux LSM are
included that leverage the patch for this purpose.
This patch implements the changes necessary to the XFRM subsystem,
pfkey interface, ipv4/ipv6, and xfrm_user interface to restrict a
socket to use only authorized security associations (or no security
association) to send/receive network packets.
Patch purpose:
The patch is designed to enable access control per packets based on
the strongly authenticated IPSec security association. Such access
controls augment the existing ones based on network interface and IP
address. The former are very coarse-grained, and the latter can be
spoofed. By using IPSec, the system can control access to remote
hosts based on cryptographic keys generated using the IPSec mechanism.
This enables access control on a per-machine basis or per-application
if the remote machine is running the same mechanism and trusted to
enforce the access control policy.
Patch design approach:
The overall approach is that policy (xfrm_policy) entries set by
user-level programs (e.g., setkey for ipsec-tools) are extended with a
security context that is used at policy selection time in the XFRM
subsystem to restrict the sockets that can send/receive packets via
security associations (xfrm_states) that are built from those
policies.
A presentation available at
www.selinux-symposium.org/2005/presentations/session2/2-3-jaeger.pdf
from the SELinux symposium describes the overall approach.
Patch implementation details:
On output, the policy retrieved (via xfrm_policy_lookup or
xfrm_sk_policy_lookup) must be authorized for the security context of
the socket and the same security context is required for resultant
security association (retrieved or negotiated via racoon in
ipsec-tools). This is enforced in xfrm_state_find.
On input, the policy retrieved must also be authorized for the socket
(at __xfrm_policy_check), and the security context of the policy must
also match the security association being used.
The patch has virtually no impact on packets that do not use IPSec.
The existing Netfilter (outgoing) and LSM rcv_skb hooks are used as
before.
Also, if IPSec is used without security contexts, the impact is
minimal. The LSM must allow such policies to be selected for the
combination of socket and remote machine, but subsequent IPSec
processing proceeds as in the original case.
Testing:
The pfkey interface is tested using the ipsec-tools. ipsec-tools have
been modified (a separate ipsec-tools patch is available for version
0.5) that supports assignment of xfrm_policy entries and security
associations with security contexts via setkey and the negotiation
using the security contexts via racoon.
The xfrm_user interface is tested via ad hoc programs that set
security contexts. These programs are also available from me, and
contain programs for setting, getting, and deleting policy for testing
this interface. Testing of sa functions was done by tracing kernel
behavior.
Signed-off-by: Trent Jaeger <tjaeger@cse.psu.edu>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
- added typedef unsigned int __nocast gfp_t;
- replaced __nocast uses for gfp flags with gfp_t - it gives exactly
the same warnings as far as sparse is concerned, doesn't change
generated code (from gcc point of view we replaced unsigned int with
typedef) and documents what's going on far better.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Here is a patch that adds a helper called xfrm_policy_id2dir to
document the fact that the policy direction can be and is derived
from the index.
This is based on a patch by YOSHIFUJI Hideaki and 210313105@suda.edu.cn.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix implicit nocast warnings in net/key code:
net/key/af_key.c:195:27: warning: implicit cast to nocast type
net/key/af_key.c:1439:28: warning: implicit cast to nocast type
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the flag XFRM_STATE_NOPMTUDISC for xfrm states. It is
similar to the nopmtudisc on IPIP/GRE tunnels. It only has an effect
on IPv4 tunnel mode states. For these states, it will ensure that the
DF flag is always cleared.
This is primarily useful to work around ICMP blackholes.
In future this flag could also allow a larger MTU to be set within the
tunnel just like IPIP/GRE tunnels. This could be useful for short haul
tunnels where temporary fragmentation outside the tunnel is desired over
smaller fragments inside the tunnel.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: James Morris <jmorris@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds xfrm_init_state which is simply a wrapper that calls
xfrm_get_type and subsequently x->type->init_state. It also gets rid
of the unused args argument.
Abstracting it out allows us to add common initialisation code, e.g.,
to set family-specific flags.
The add_time setting in xfrm_user.c was deleted because it's already
set by xfrm_state_alloc.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: James Morris <jmorris@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu wrote:
> @@ -1254,6 +1326,7 @@ static int pfkey_add(struct sock *sk, st
> if (IS_ERR(x))
> return PTR_ERR(x);
>
> + xfrm_state_hold(x);
This introduces a leak when xfrm_state_add()/xfrm_state_update()
fail. We hold two references (one from xfrm_state_alloc(), one
from xfrm_state_hold()), but only drop one. We need to take the
reference because the reference from xfrm_state_alloc() can
be dropped by __xfrm_state_delete(), so the fix is to drop both
references on error. Same problem in xfrm_user.c.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes XFRM_SAP_* and converts them over to XFRM_MSG_*.
The netlink interface is meant to map directly onto the underlying
xfrm subsystem. Therefore rather than using a new independent
representation for the events we can simply use the existing ones
from xfrm_user.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch adjusts the SA state conversion in af_key such that
XFRM_STATE_ERROR/XFRM_STATE_DEAD will be converted to SADB_STATE_DEAD
instead of SADB_STATE_DYING.
According to RFC 2367, SADB_STATE_DYING SAs can be turned into
mature ones through updating their lifetime settings. Since SAs
which are in the states XFRM_STATE_ERROR/XFRM_STATE_DEAD cannot
be resurrected, this value is unsuitable.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Heres the final patch.
What this patch provides
- netlink xfrm events
- ability to have events generated by netlink propagated to pfkey
and vice versa.
- fixes the acquire lets-be-happy-with-one-success issue
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!