Add an RCU payload dereference macro as this seems to be a common piece of code
amongst key types that use RCU referenced payloads.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Cleanups based on comments from Sam Ravnborg,
* remove references to the currently unused af_names.h
* add rlim_names.h to clean-files:
* rework cmd_make-XXX to make them more readable by adding comments,
reworking the expressions to put logical components on individual lines,
and keep lines < 80 characters.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Netlink message processing in the kernel is synchronous these days,
capabilities can be checked directly in security_netlink_recv() from
the current process.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Reviewed-by: James Morris <jmorris@namei.org>
[chrisw: update to include pohmelfs and uvesafb]
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
For SELinux we do not allow security information to change during a remount
operation. Thus this hook simply strips the security module options from
the data and verifies that those are the same options as exist on the
current superblock.
Signed-off-by: Eric Paris <eparis@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
The VFS mount code passes the mount options to the LSM. The LSM will remove
options it understands from the data and the VFS will then pass the remaining
options onto the underlying filesystem. This is how options like the
SELinux context= work. The problem comes in that -o remount never calls
into LSM code. So if you include an LSM specific option it will get passed
to the filesystem and will cause the remount to fail. An example of where
this is a problem is the 'seclabel' option. The SELinux LSM hook will
print this word in /proc/mounts if the filesystem is being labeled using
xattrs. If you pass this word on mount it will be silently stripped and
ignored. But if you pass this word on remount the LSM never gets called
and it will be passed to the FS. The FS doesn't know what seclabel means
and thus should fail the mount. For example an ext3 fs mounted over loop
# mount -o loop /tmp/fs /mnt/tmp
# cat /proc/mounts | grep /mnt/tmp
/dev/loop0 /mnt/tmp ext3 rw,seclabel,relatime,errors=continue,barrier=0,data=ordered 0 0
# mount -o remount /mnt/tmp
mount: /mnt/tmp not mounted already, or bad option
# dmesg
EXT3-fs (loop0): error: unrecognized mount option "seclabel" or missing value
This patch passes the remount mount options to an new LSM hook.
Signed-off-by: Eric Paris <eparis@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
The security context for the newly created socket shares the same
user, role and MLS attribute as its creator but may have a different
type, which could be specified by a type_transition rule in the relevant
policy package.
Signed-off-by: Harry Ciao <qingtao.cao@windriver.com>
[fix call to security_transition_sid to include qstr, Eric Paris]
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
The socket SID would be computed on creation and no longer inherit
its creator's SID by default. Socket may have a different type but
needs to retain the creator's role and MLS attribute in order not
to break labeled networking and network access control.
The kernel value for a class would be used to determine if the class
if one of socket classes. If security_compute_sid is called from
userspace the policy value for a class would be mapped to the relevant
kernel value first.
Signed-off-by: Harry Ciao <qingtao.cao@windriver.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
The security_is_socket_class() is auto-generated by genheaders based
on classmap.h to reduce maintenance effort when a new class is defined
in SELinux kernel. The name for any socket class should be suffixed by
"socket" and doesn't contain more than one substr of "socket".
Signed-off-by: Harry Ciao <qingtao.cao@windriver.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Netlink message processing in the kernel is synchronous these days, the
session information can be collected when needed.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
In tomoyo_check_open_permission() since 2.6.36, TOMOYO was by error
recalculating already calculated pathname when checking allow_rewrite
permission. As a result, memory will leak whenever a file is opened for writing
without O_APPEND flag. Also, performance will degrade because TOMOYO is
calculating pathname regardless of profile configuration.
This patch fixes the leak and performance degrade.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: James Morris <jmorris@namei.org>
This reverts commit 242631c49d.
Conflicts:
security/selinux/hooks.c
SELinux used to recognize certain individual ioctls and check
permissions based on the knowledge of the individual ioctl. In commit
242631c49d the SELinux code stopped trying to understand
individual ioctls and to instead looked at the ioctl access bits to
determine in we should check read or write for that operation. This
same suggestion was made to SMACK (and I believe copied into TOMOYO).
But this suggestion is total rubbish. The ioctl access bits are
actually the access requirements for the structure being passed into the
ioctl, and are completely unrelated to the operation of the ioctl or the
object the ioctl is being performed upon.
Take FS_IOC_FIEMAP as an example. FS_IOC_FIEMAP is defined as:
FS_IOC_FIEMAP _IOWR('f', 11, struct fiemap)
So it has access bits R and W. What this really means is that the
kernel is going to both read and write to the struct fiemap. It has
nothing at all to do with the operations that this ioctl might perform
on the file itself!
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
These permissions are not used and can be dropped in the kernel
definitions.
Suggested-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
The IPSKB_FORWARDED and IP6SKB_FORWARDED flags are used only in the
multicast forwarding case to indicate that a packet looped back after
forward. So these flags are not a good indicator for packet forwarding.
A better indicator is the incoming interface. If we have no socket context,
but an incoming interface and we see the packet in the ip postroute hook,
the packet is going to be forwarded.
With this patch we use the incoming interface as an indicator on packet
forwarding.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
selinux_sock_rcv_skb_compat and selinux_ip_postroute_compat are just
called if selinux_policycap_netpeer is not set. However in these
functions we check if selinux_policycap_netpeer is set. This leads
to some dead code and to the fact that selinux_xfrm_postroute_last
is never executed. This patch removes the dead code and the checks
for selinux_policycap_netpeer in the compatibility functions.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
selinux_xfrm_sec_ctx_alloc accidentally checks the xfrm domain of
interpretation against the selinux context algorithm. This patch
fixes this by checking ctx_alg against the selinux context algorithm.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
The original ima_must_measure() function based its results on cached
iint information, which required an iint be allocated for all files.
Currently, an iint is allocated only for files in policy. As a result,
for those files in policy, ima_must_measure() is now called twice: once
to determine if the inode is in the measurement policy and, the second
time, to determine if it needs to be measured/re-measured.
The second call to ima_must_measure() unnecessarily checks to see if
the file is in policy. As we already know the file is in policy, this
patch removes the second unnecessary call to ima_must_measure(), removes
the vestige iint parameter, and just checks the iint directly to determine
if the inode has been measured or needs to be measured/re-measured.
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Acked-by: Eric Paris <eparis@redhat.com>
Expand security_capable() to include cred, so that it can be usable in a
wider range of call sites.
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: James Morris <jmorris@namei.org>
Now that i_readcount is maintained by the VFS layer, remove the
imbalance checking in IMA. Cleans up the IMA code nicely.
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Acked-by: Eric Paris <eparis@redhat.com>
ima_counts_get() updated the readcount and invalidated the PCR,
as necessary. Only update the i_readcount in the VFS layer.
Move the PCR invalidation checks to ima_file_check(), where it
belongs.
Maintaining the i_readcount in the VFS layer, will allow other
subsystems to use i_readcount.
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Acked-by: Eric Paris <eparis@redhat.com>
The mmap policy enforcement checks the access of the
SMACK64MMAP subject against the current subject incorrectly.
The check as written works correctly only if the access
rules involved have the same access. This is the common
case, so initial testing did not find a problem.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
The mmap policy enforcement was not properly handling the
interaction between the global and local rule lists.
Instead of going through one and then the other, which
missed the important case where a rule specified that
there should be no access, combine the access limitations
where there is a rule in each list.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: James Morris <jmorris@namei.org>
In cred_alloc_blank() since 2.6.32, abort_creds(new) is called with
new->security == NULL and new->magic == 0 when security_cred_alloc_blank()
returns an error. As a result, BUG() will be triggered if SELinux is enabled
or CONFIG_DEBUG_CREDENTIALS=y.
If CONFIG_DEBUG_CREDENTIALS=y, BUG() is called from __invalid_creds() because
cred->magic == 0. Failing that, BUG() is called from selinux_cred_free()
because selinux_cred_free() is not expecting cred->security == NULL. This does
not affect smack_cred_free(), tomoyo_cred_free() or apparmor_cred_free().
Fix these bugs by
(1) Set new->magic before calling security_cred_alloc_blank().
(2) Handle null cred->security in creds_are_invalid() and selinux_cred_free().
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Both settimeofday() and clock_settime() promise with a 'const'
attribute not to alter the arguments passed in. This patch adds the
missing 'const' attribute into the various kernel functions
implementing these calls.
Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Acked-by: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <20110201134417.545698637@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The only user for this hook was selinux. sysctl routes every call
through /proc/sys/. Selinux and other security modules use the file
system checks for sysctl too, so no need for this hook any more.
Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
This fixes an old (2007) selinux regression: filesystem labeling for
/proc/sys returned
-r--r--r-- unknown /proc/sys/fs/file-nr
instead of
-r--r--r-- system_u:object_r:sysctl_fs_t:s0 /proc/sys/fs/file-nr
Events that lead to breaking of /proc/sys/ selinux labeling:
1) sysctl was reimplemented to route all calls through /proc/sys/
commit 77b14db502
[PATCH] sysctl: reimplement the sysctl proc support
2) proc_dir_entry was removed from ctl_table:
commit 3fbfa98112
[PATCH] sysctl: remove the proc_dir_entry member for the sysctl tables
3) selinux still walked the proc_dir_entry tree to apply
labeling. Because ctl_tables don't have a proc_dir_entry, we did
not label /proc/sys/ inodes any more. To achieve this the /proc/sys/
inodes were marked private and private inodes were ignored by
selinux.
commit bbaca6c2e7
[PATCH] selinux: enhance selinux to always ignore private inodes
commit 86a71dbd3e
[PATCH] sysctl: hide the sysctl proc inodes from selinux
Access control checks have been done by means of a special sysctl hook
that was called for read/write accesses to any /proc/sys/ entry.
We don't have to do this because, instead of walking the
proc_dir_entry tree we can walk the dentry tree (as done in this
patch). With this patch:
* we don't mark /proc/sys/ inodes as private
* we don't need the sysclt security hook
* we walk the dentry tree to find the path to the inode.
We have to strip the PID in /proc/PID/ entries that have a
proc_dir_entry because selinux does not know how to label paths like
'/1/net/rpc/nfsd.fh' (and defaults to 'proc_t' labeling). Selinux does
know of '/net/rpc/nfsd.fh' (and applies the 'sysctl_rpc_t' label).
PID stripping from the path was done implicitly in the previous code
because the proc_dir_entry tree had the root in '/net' in the example
from above. The dentry tree has the root in '/1'.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Currently SELinux has rules which label new objects according to 3 criteria.
The label of the process creating the object, the label of the parent
directory, and the type of object (reg, dir, char, block, etc.) This patch
adds a 4th criteria, the dentry name, thus we can distinguish between
creating a file in an etc_t directory called shadow and one called motd.
There is no file globbing, regex parsing, or anything mystical. Either the
policy exactly (strcmp) matches the dentry name of the object or it doesn't.
This patch has no changes from today if policy does not implement the new
rules.
Signed-off-by: Eric Paris <eparis@redhat.com>
SELinux would like to implement a new labeling behavior of newly created
inodes. We currently label new inodes based on the parent and the creating
process. This new behavior would also take into account the name of the
new object when deciding the new label. This is not the (supposed) full path,
just the last component of the path.
This is very useful because creating /etc/shadow is different than creating
/etc/passwd but the kernel hooks are unable to differentiate these
operations. We currently require that userspace realize it is doing some
difficult operation like that and than userspace jumps through SELinux hoops
to get things set up correctly. This patch does not implement new
behavior, that is obviously contained in a seperate SELinux patch, but it
does pass the needed name down to the correct LSM hook. If no such name
exists it is fine to pass NULL.
Signed-off-by: Eric Paris <eparis@redhat.com>
Fix __key_link_end()'s attempt to fix up the quota if an error occurs.
There are two erroneous cases: Firstly, we always decrease the quota if
the preallocated replacement keyring needs cleaning up, irrespective of
whether or not we should (we may have replaced a pointer rather than
adding another pointer).
Secondly, we never clean up the quota if we added a pointer without the
keyring storage being extended (we allocate multiple pointers at a time,
even if we're not going to use them all immediately).
We handle this by setting the bottom bit of the preallocation pointer in
__key_link_begin() to indicate that the quota needs fixing up, which is
then passed to __key_link() (which clears the whole thing) and
__key_link_end().
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
One failure path in security/keys/trusted.c::trusted_update() does
not free 'new_p' while the others do. This patch makes sure we also free
it in the remaining path (if datablob_parse() returns different from
Opt_update).
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: James Morris <jmorris@namei.org>
Add calls to path-based security hooks into CacheFiles as, unlike inode-based
security, these aren't implicit in the vfs_mkdir() and similar calls.
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Rename encrypted_defined.c and encrypted_defined.h files to encrypted.c and
encrypted.h, respectively. Based on request from David Howells.
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Rename trusted_defined.c and trusted_defined.h files to trusted.c and
trusted.h, respectively. Based on request from David Howells.
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Fix up comments in the key management code. No functional changes.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Do a bit of a style clean up in the key management code. No functional
changes.
Done using:
perl -p -i -e 's!^/[*]*/\n!!' security/keys/*.c
perl -p -i -e 's!} /[*] end [a-z0-9_]*[(][)] [*]/\n!}\n!' security/keys/*.c
sed -i -s -e ": next" -e N -e 's/^\n[}]$/}/' -e t -e P -e 's/^.*\n//' -e "b next" security/keys/*.c
To remove /*****/ lines, remove comments on the closing brace of a
function to name the function and remove blank lines before the closing
brace of a function.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We can avoid scattering va_end() within the
va_start();
for (;;) {
}
va_end();
loop, assuming that crypto_shash_init()/crypto_shash_update() return 0 on
success and negative value otherwise.
Make TSS_authhmac()/TSS_checkhmac1()/TSS_checkhmac2() similar to TSS_rawhmac()
by removing "va_end()/goto" from the loop.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Jesper Juhl <jj@chaosbits.net>
Acked-by: Mimi Zohar <zohar@us.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
TSS_rawhmac() checks for data != NULL before using it.
We should do the same thing for TSS_authhmac().
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Jesper Juhl <jj@chaosbits.net>
Acked-by: Mimi Zohar <zohar@us.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
TSS_rawhmac() forgot to call va_end()/kfree() when data == NULL and
forgot to call va_end() when crypto_shash_update() < 0.
Fix these bugs by escaping from the loop using "break"
(rather than "return"/"goto") in order to make sure that
va_end()/kfree() are always called.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Jesper Juhl <jj@chaosbits.net>
Acked-by: Mimi Zohar <zohar@us.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
In the embedded world there are often situations
where libraries are updated from a variety of sources,
for a variety of reasons, and with any number of
security characteristics. These differences
might include privilege required for a given library
provided interface to function properly, as occurs
from time to time in graphics libraries. There are
also cases where it is important to limit use of
libraries based on the provider of the library and
the security aware application may make choices
based on that criteria.
These issues are addressed by providing an additional
Smack label that may optionally be assigned to an object,
the SMACK64MMAP attribute. An mmap operation is allowed
if there is no such attribute.
If there is a SMACK64MMAP attribute the mmap is permitted
only if a subject with that label has all of the access
permitted a subject with the current task label.
Security aware applications may from time to time
wish to reduce their "privilege" to avoid accidental use
of privilege. One case where this arises is the
environment in which multiple sources provide libraries
to perform the same functions. An application may know
that it should eschew services made available from a
particular vendor, or of a particular version.
In support of this a secondary list of Smack rules has
been added that is local to the task. This list is
consulted only in the case where the global list has
approved access. It can only further restrict access.
Unlike the global last, if no entry is found on the
local list access is granted. An application can add
entries to its own list by writing to /smack/load-self.
The changes appear large as they involve refactoring
the list handling to accomodate there being more
than one rule list.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Add missing kfree(td) in tpm_seal() before the return, freeing
td on error paths as well.
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Acked-by: David Safford <safford@watson.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: James Morris <jmorris@namei.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (30 commits)
MAINTAINERS: Add tomoyo-dev-en ML.
SELinux: define permissions for DCB netlink messages
encrypted-keys: style and other cleanup
encrypted-keys: verify datablob size before converting to binary
trusted-keys: kzalloc and other cleanup
trusted-keys: additional TSS return code and other error handling
syslog: check cap_syslog when dmesg_restrict
Smack: Transmute labels on specified directories
selinux: cache sidtab_context_to_sid results
SELinux: do not compute transition labels on mountpoint labeled filesystems
This patch adds a new security attribute to Smack called SMACK64EXEC. It defines label that is used while task is running.
SELinux: merge policydb_index_classes and policydb_index_others
selinux: convert part of the sym_val_to_name array to use flex_array
selinux: convert type_val_to_struct to flex_array
flex_array: fix flex_array_put_ptr macro to be valid C
SELinux: do not set automatic i_ino in selinuxfs
selinux: rework security_netlbl_secattr_to_sid
SELinux: standardize return code handling in selinuxfs.c
SELinux: standardize return code handling in selinuxfs.c
SELinux: standardize return code handling in policydb.c
...
Remove kobject.h from files which don't need it, notably,
sched.h and fs.h.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove path.h from sched.h and other files.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Conflicts:
security/smack/smack_lsm.c
Verified and added fix by Stephen Rothwell <sfr@canb.auug.org.au>
Ok'd by Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: James Morris <jmorris@namei.org>
Perform common cases of path lookups without any stores or locking in the
ancestor dentry elements. This is called rcu-walk, as opposed to the current
algorithm which is a refcount based walk, or ref-walk.
This results in far fewer atomic operations on every path element,
significantly improving path lookup performance. It also avoids cacheline
bouncing on common dentries, significantly improving scalability.
The overall design is like this:
* LOOKUP_RCU is set in nd->flags, which distinguishes rcu-walk from ref-walk.
* Take the RCU lock for the entire path walk, starting with the acquiring
of the starting path (eg. root/cwd/fd-path). So now dentry refcounts are
not required for dentry persistence.
* synchronize_rcu is called when unregistering a filesystem, so we can
access d_ops and i_ops during rcu-walk.
* Similarly take the vfsmount lock for the entire path walk. So now mnt
refcounts are not required for persistence. Also we are free to perform mount
lookups, and to assume dentry mount points and mount roots are stable up and
down the path.
* Have a per-dentry seqlock to protect the dentry name, parent, and inode,
so we can load this tuple atomically, and also check whether any of its
members have changed.
* Dentry lookups (based on parent, candidate string tuple) recheck the parent
sequence after the child is found in case anything changed in the parent
during the path walk.
* inode is also RCU protected so we can load d_inode and use the inode for
limited things.
* i_mode, i_uid, i_gid can be tested for exec permissions during path walk.
* i_op can be loaded.
When we reach the destination dentry, we lock it, recheck lookup sequence,
and increment its refcount and mountpoint refcount. RCU and vfsmount locks
are dropped. This is termed "dropping rcu-walk". If the dentry refcount does
not match, we can not drop rcu-walk gracefully at the current point in the
lokup, so instead return -ECHILD (for want of a better errno). This signals the
path walking code to re-do the entire lookup with a ref-walk.
Aside from the final dentry, there are other situations that may be encounted
where we cannot continue rcu-walk. In that case, we drop rcu-walk (ie. take
a reference on the last good dentry) and continue with a ref-walk. Again, if
we can drop rcu-walk gracefully, we return -ECHILD and do the whole lookup
using ref-walk. But it is very important that we can continue with ref-walk
for most cases, particularly to avoid the overhead of double lookups, and to
gain the scalability advantages on common path elements (like cwd and root).
The cases where rcu-walk cannot continue are:
* NULL dentry (ie. any uncached path element)
* parent with d_inode->i_op->permission or ACLs
* dentries with d_revalidate
* Following links
In future patches, permission checks and d_revalidate become rcu-walk aware. It
may be possible eventually to make following links rcu-walk aware.
Uncached path elements will always require dropping to ref-walk mode, at the
very least because i_mutex needs to be grabbed, and objects allocated.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
dget_locked was a shortcut to avoid the lazy lru manipulation when we already
held dcache_lock (lru manipulation was relatively cheap at that point).
However, how that the lru lock is an innermost one, we never hold it at any
caller, so the lock cost can now be avoided. We already have well working lazy
dcache LRU, so it should be fine to defer LRU manipulations to scan time.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Protect d_subdirs and d_child with d_lock, except in filesystems that aren't
using dcache_lock for these anyway (eg. using i_mutex).
Note: if we change the locking rule in future so that ->d_child protection is
provided only with ->d_parent->d_lock, it may allow us to reduce some locking.
But it would be an exception to an otherwise regular locking scheme, so we'd
have to see some good results. Probably not worthwhile.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Protect d_unhashed(dentry) condition with d_lock. This means keeping
DCACHE_UNHASHED bit in synch with hash manipulations.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1436 commits)
cassini: Use local-mac-address prom property for Cassini MAC address
net: remove the duplicate #ifdef __KERNEL__
net: bridge: check the length of skb after nf_bridge_maybe_copy_header()
netconsole: clarify stopping message
netconsole: don't announce stopping if nothing happened
cnic: Fix the type field in SPQ messages
netfilter: fix export secctx error handling
netfilter: fix the race when initializing nf_ct_expect_hash_rnd
ipv4: IP defragmentation must be ECN aware
net: r6040: Return proper error for r6040_init_one
dcb: use after free in dcb_flushapp()
dcb: unlock on error in dcbnl_ieee_get()
net: ixp4xx_eth: Return proper error for eth_init_one
include/linux/if_ether.h: Add #define ETH_P_LINK_CTL for HPNA and wlan local tunnel
net: add POLLPRI to sock_def_readable()
af_unix: Avoid socket->sk NULL OOPS in stream connect security hooks.
net_sched: pfifo_head_drop problem
mac80211: remove stray extern
mac80211: implement off-channel TX using hw r-o-c offload
mac80211: implement hardware offload for remain-on-channel
...
unix_release() can asynchornously set socket->sk to NULL, and
it does so without holding the unix_state_lock() on "other"
during stream connects.
However, the reverse mapping, sk->sk_socket, is only transitioned
to NULL under the unix_state_lock().
Therefore make the security hooks follow the reverse mapping instead
of the forward mapping.
Reported-by: Jeremy Fitzhardinge <jeremy@goop.org>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
If security_filter_rule_init() doesn't return a rule, then not everything
is as fine as the return code implies.
This bug only occurs when the LSM (eg. SELinux) is disabled at runtime.
Adding an empty LSM rule causes ima_match_rules() to always succeed,
ignoring any remaining rules.
default IMA TCB policy:
# PROC_SUPER_MAGIC
dont_measure fsmagic=0x9fa0
# SYSFS_MAGIC
dont_measure fsmagic=0x62656572
# DEBUGFS_MAGIC
dont_measure fsmagic=0x64626720
# TMPFS_MAGIC
dont_measure fsmagic=0x01021994
# SECURITYFS_MAGIC
dont_measure fsmagic=0x73636673
< LSM specific rule >
dont_measure obj_type=var_log_t
measure func=BPRM_CHECK
measure func=FILE_MMAP mask=MAY_EXEC
measure func=FILE_CHECK mask=MAY_READ uid=0
Thus without the patch, with the boot parameters 'tcb selinux=0', adding
the above 'dont_measure obj_type=var_log_t' rule to the default IMA TCB
measurement policy, would result in nothing being measured. The patch
prevents the default TCB policy from being replaced.
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Cc: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Cc: David Safford <safford@watson.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In construct_alloc_key(), up_write() is called in the error path if
__key_link_begin() fails, but this is incorrect as __key_link_begin() only
returns with the nominated keyring locked if it returns successfully.
Without this patch, you might see the following in dmesg:
=====================================
[ BUG: bad unlock balance detected! ]
-------------------------------------
mount.cifs/5769 is trying to release lock (&key->sem) at:
[<ffffffff81201159>] request_key_and_link+0x263/0x3fc
but there are no more locks to release!
other info that might help us debug this:
3 locks held by mount.cifs/5769:
#0: (&type->s_umount_key#41/1){+.+.+.}, at: [<ffffffff81131321>] sget+0x278/0x3e7
#1: (&ret_buf->session_mutex){+.+.+.}, at: [<ffffffffa0258e59>] cifs_get_smb_ses+0x35a/0x443 [cifs]
#2: (root_key_user.cons_lock){+.+.+.}, at: [<ffffffff81201000>] request_key_and_link+0x10a/0x3fc
stack backtrace:
Pid: 5769, comm: mount.cifs Not tainted 2.6.37-rc6+ #1
Call Trace:
[<ffffffff81201159>] ? request_key_and_link+0x263/0x3fc
[<ffffffff81081601>] print_unlock_inbalance_bug+0xca/0xd5
[<ffffffff81083248>] lock_release_non_nested+0xc1/0x263
[<ffffffff81201159>] ? request_key_and_link+0x263/0x3fc
[<ffffffff81201159>] ? request_key_and_link+0x263/0x3fc
[<ffffffff81083567>] lock_release+0x17d/0x1a4
[<ffffffff81073f45>] up_write+0x23/0x3b
[<ffffffff81201159>] request_key_and_link+0x263/0x3fc
[<ffffffffa026fe9e>] ? cifs_get_spnego_key+0x61/0x21f [cifs]
[<ffffffff812013c5>] request_key+0x41/0x74
[<ffffffffa027003d>] cifs_get_spnego_key+0x200/0x21f [cifs]
[<ffffffffa026e296>] CIFS_SessSetup+0x55d/0x1273 [cifs]
[<ffffffffa02589e1>] cifs_setup_session+0x90/0x1ae [cifs]
[<ffffffffa0258e7e>] cifs_get_smb_ses+0x37f/0x443 [cifs]
[<ffffffffa025a9e3>] cifs_mount+0x1aa1/0x23f3 [cifs]
[<ffffffff8111fd94>] ? alloc_debug_processing+0xdb/0x120
[<ffffffffa027002c>] ? cifs_get_spnego_key+0x1ef/0x21f [cifs]
[<ffffffffa024cc71>] cifs_do_mount+0x165/0x2b3 [cifs]
[<ffffffff81130e72>] vfs_kern_mount+0xaf/0x1dc
[<ffffffff81131007>] do_kern_mount+0x4d/0xef
[<ffffffff811483b9>] do_mount+0x6f4/0x733
[<ffffffff8114861f>] sys_mount+0x88/0xc2
[<ffffffff8100ac42>] system_call_fastpath+0x16/0x1b
Reported-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-and-Tested-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Conflicts:
MAINTAINERS
arch/arm/mach-omap2/pm24xx.c
drivers/scsi/bfa/bfa_fcpim.c
Needed to update to apply fixes for which the old branch was too
outdated.
Commit 2f90b865 added two new netlink message types to the netlink route
socket. SELinux has hooks to define if netlink messages are allowed to
be sent or received, but it did not know about these two new message
types. By default we allow such actions so noone likely noticed. This
patch adds the proper definitions and thus proper permissions
enforcement.
Signed-off-by: Eric Paris <eparis@redhat.com>
Cleanup based on David Howells suggestions:
- use static const char arrays instead of #define
- rename init_sdesc to alloc_sdesc
- convert 'unsigned int' definitions to 'size_t'
- revert remaining 'const unsigned int' definitions to 'unsigned int'
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Verify the hex ascii datablob length is correct before converting the IV,
encrypted data, and HMAC to binary.
Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Cleanup based on David Howells suggestions:
- replace kzalloc, where possible, with kmalloc
- revert 'const unsigned int' definitions to 'unsigned int'
Signed-off-by: David Safford <safford@watson.ibm.com>
Acked-by: Mimi Zohar <zohar@us.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Previously not all TSS return codes were tested, as they were all eventually
caught by the TPM. Now all returns are tested and handled immediately.
This patch also fixes memory leaks in error and non-error paths.
Signed-off-by: David Safford <safford@watson.ibm.com>
Acked-by: Mimi Zohar <zohar@us.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: James Morris <jmorris@namei.org>
In a situation where Smack access rules allow processes
with multiple labels to write to a directory it is easy
to get into a situation where the directory gets cluttered
with files that the owner can't deal with because while
they could be written to the directory a process at the
label of the directory can't write them. This is generally
the desired behavior, but when it isn't it is a real
issue.
This patch introduces a new attribute SMACK64TRANSMUTE that
instructs Smack to create the file with the label of the directory
under certain circumstances.
A new access mode, "t" for transmute, is made available to
Smack access rules, which are expanded from "rwxa" to "rwxat".
If a file is created in a directory marked as transmutable
and if access was granted to perform the operation by a rule
that included the transmute mode, then the file gets the
Smack label of the directory instead of the Smack label of the
creating process.
Note that this is equivalent to creating an empty file at the
label of the directory and then having the other process write
to it. The transmute scheme requires that both the access rule
allows transmutation and that the directory be explicitly marked.
Signed-off-by: Jarkko Sakkinen <ext-jarkko.2.sakkinen@nokia.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
sidtab_context_to_sid takes up a large share of time when creating large
numbers of new inodes (~30-40% in oprofile runs). This patch implements a
cache of 3 entries which is checked before we do a full context_to_sid lookup.
On one system this showed over a x3 improvement in the number of inodes that
could be created per second and around a 20% improvement on another system.
Any time we look up the same context string sucessivly (imagine ls -lZ) we
should hit this cache hot. A cache miss should have a relatively minor affect
on performance next to doing the full table search.
All operations on the cache are done COMPLETELY lockless. We know that all
struct sidtab_node objects created will never be deleted until a new policy is
loaded thus we never have to worry about a pointer being dereferenced. Since
we also know that pointer assignment is atomic we know that the cache will
always have valid pointers. Given this information we implement a FIFO cache
in an array of 3 pointers. Every result (whether a cache hit or table lookup)
will be places in the 0 spot of the cache and the rest of the entries moved
down one spot. The 3rd entry will be lost.
Races are possible and are even likely to happen. Lets assume that 4 tasks
are hitting sidtab_context_to_sid. The first task checks against the first
entry in the cache and it is a miss. Now lets assume a second task updates
the cache with a new entry. This will push the first entry back to the second
spot. Now the first task might check against the second entry (which it
already checked) and will miss again. Now say some third task updates the
cache and push the second entry to the third spot. The first task my check
the third entry (for the third time!) and again have a miss. At which point
it will just do a full table lookup. No big deal!
Signed-off-by: Eric Paris <eparis@redhat.com>
selinux_inode_init_security computes transitions sids even for filesystems
that use mount point labeling. It shouldn't do that. It should just use
the mount point label always and no matter what.
This causes 2 problems. 1) it makes file creation slower than it needs to be
since we calculate the transition sid and 2) it allows files to be created
with a different label than the mount point!
# id -Z
staff_u:sysadm_r:sysadm_t:s0-s0:c0.c1023
# sesearch --type --class file --source sysadm_t --target tmp_t
Found 1 semantic te rules:
type_transition sysadm_t tmp_t : file user_tmp_t;
# mount -o loop,context="system_u:object_r:tmp_t:s0" /tmp/fs /mnt/tmp
# ls -lZ /mnt/tmp
drwx------. root root system_u:object_r:tmp_t:s0 lost+found
# touch /mnt/tmp/file1
# ls -lZ /mnt/tmp
-rw-r--r--. root root staff_u:object_r:user_tmp_t:s0 file1
drwx------. root root system_u:object_r:tmp_t:s0 lost+found
Whoops, we have a mount point labeled filesystem tmp_t with a user_tmp_t
labeled file!
Signed-off-by: Eric Paris <eparis@redhat.com>
Reviewed-by: Reviewed-by: James Morris <jmorris@namei.org>
SMACK64EXEC. It defines label that is used while task is
running.
Exception: in smack_task_wait() child task is checked
for write access to parent task using label inherited
from the task that forked it.
Fixed issues from previous submit:
- SMACK64EXEC was not read when SMACK64 was not set.
- inode security blob was not updated after setting
SMACK64EXEC
- inode security blob was not updated when removing
SMACK64EXEC
We duplicate functionality in policydb_index_classes() and
policydb_index_others(). This patch merges those functions just to make it
clear there is nothing special happening here.
Signed-off-by: Eric Paris <eparis@redhat.com>
The sym_val_to_name type array can be quite large as it grows linearly with
the number of types. With known policies having over 5k types these
allocations are growing large enough that they are likely to fail. Convert
those to flex_array so no allocation is larger than PAGE_SIZE
Signed-off-by: Eric Paris <eparis@redhat.com>
In rawhide type_val_to_struct will allocate 26848 bytes, an order 3
allocations. While this hasn't been seen to fail it isn't outside the
realm of possibiliy on systems with severe memory fragmentation. Convert
to flex_array so no allocation will ever be bigger than PAGE_SIZE.
Signed-off-by: Eric Paris <eparis@redhat.com>
selinuxfs carefully uses i_ino to figure out what the inode refers to. The
VFS used to generically set this value and we would reset it to something
useable. After 85fe4025c6 each filesystem sets this value to a default
if needed. Since selinuxfs doesn't use the default value and it can only
lead to problems (I'd rather have 2 inodes with i_ino == 0 than one
pointing to the wrong data) lets just stop setting a default.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
security_netlbl_secattr_to_sid is difficult to follow, especially the
return codes. Try to make the function obvious.
Signed-off-by: Eric Paris <eparis@redhat.com>
selinuxfs.c has lots of different standards on how to handle return paths on
error. For the most part transition to
rc=errno
if (failure)
goto out;
[...]
out:
cleanup()
return rc;
Instead of doing cleanup mid function, or having multiple returns or other
options. This doesn't do that for every function, but most of the complex
functions which have cleanup routines on error.
Signed-off-by: Eric Paris <eparis@redhat.com>
selinuxfs.c has lots of different standards on how to handle return paths on
error. For the most part transition to
rc=errno
if (failure)
goto out;
[...]
out:
cleanup()
return rc;
Instead of doing cleanup mid function, or having multiple returns or other
options. This doesn't do that for every function, but most of the complex
functions which have cleanup routines on error.
Signed-off-by: Eric Paris <eparis@redhat.com>
policydb.c has lots of different standards on how to handle return paths on
error. For the most part transition to
rc=errno
if (failure)
goto out;
[...]
out:
cleanup()
return rc;
Instead of doing cleanup mid function, or having multiple returns or other
options. This doesn't do that for every function, but most of the complex
functions which have cleanup routines on error.
Signed-off-by: Eric Paris <eparis@redhat.com>
This patch fixes the linux-next powerpc build errors as reported by
Stephen Rothwell.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Tested-by: Rajiv Andrade <srajiv@linux.vnet.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
This patch addresses a number of long standing issues
with the way Smack treats UNIX domain sockets.
All access control was being done based on the label of
the file system object. This is inconsistant with the
internet domain, in which access is done based on the
IPIN and IPOUT attributes of the socket. As a result
of the inode label policy it was not possible to use
a UDS socket for label cognizant services, including
dbus and the X11 server.
Support for SCM_PEERSEC on UDS sockets is also provided.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: James Morris <jmorris@namei.org>
Define a new kernel key-type called 'encrypted'. Encrypted keys are kernel
generated random numbers, which are encrypted/decrypted with a 'trusted'
symmetric key. Encrypted keys are created/encrypted/decrypted in the kernel.
Userspace only ever sees/stores encrypted blobs.
Changelog:
- bug fix: replaced master-key rcu based locking with semaphore
(reported by David Howells)
- Removed memset of crypto_shash_digest() digest output
- Replaced verification of 'key-type:key-desc' using strcspn(), with
one based on string constants.
- Moved documentation to Documentation/keys-trusted-encrypted.txt
- Replace hash with shash (based on comments by David Howells)
- Make lengths/counts size_t where possible (based on comments by David Howells)
Could not convert most lengths, as crypto expects 'unsigned int'
(size_t: on 32 bit is defined as unsigned int, but on 64 bit is unsigned long)
- Add 'const' where possible (based on comments by David Howells)
- allocate derived_buf dynamically to support arbitrary length master key
(fixed by Roberto Sassu)
- wait until late_initcall for crypto libraries to be registered
- cleanup security/Kconfig
- Add missing 'update' keyword (reported/fixed by Roberto Sassu)
- Free epayload on failure to create key (reported/fixed by Roberto Sassu)
- Increase the data size limit (requested by Roberto Sassu)
- Crypto return codes are always 0 on success and negative on failure,
remove unnecessary tests.
- Replaced kzalloc() with kmalloc()
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: David Safford <safford@watson.ibm.com>
Reviewed-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: James Morris <jmorris@namei.org>
Define a new kernel key-type called 'trusted'. Trusted keys are random
number symmetric keys, generated and RSA-sealed by the TPM. The TPM
only unseals the keys, if the boot PCRs and other criteria match.
Userspace can only ever see encrypted blobs.
Based on suggestions by Jason Gunthorpe, several new options have been
added to support additional usages.
The new options are:
migratable= designates that the key may/may not ever be updated
(resealed under a new key, new pcrinfo or new auth.)
pcrlock=n extends the designated PCR 'n' with a random value,
so that a key sealed to that PCR may not be unsealed
again until after a reboot.
keyhandle= specifies the sealing/unsealing key handle.
keyauth= specifies the sealing/unsealing key auth.
blobauth= specifies the sealed data auth.
Implementation of a kernel reserved locality for trusted keys will be
investigated for a possible future extension.
Changelog:
- Updated and added examples to Documentation/keys-trusted-encrypted.txt
- Moved generic TPM constants to include/linux/tpm_command.h
(David Howell's suggestion.)
- trusted_defined.c: replaced kzalloc with kmalloc, added pcrlock failure
error handling, added const qualifiers where appropriate.
- moved to late_initcall
- updated from hash to shash (suggestion by David Howells)
- reduced worst stack usage (tpm_seal) from 530 to 312 bytes
- moved documentation to Documentation directory (suggestion by David Howells)
- all the other code cleanups suggested by David Howells
- Add pcrlock CAP_SYS_ADMIN dependency (based on comment by Jason Gunthorpe)
- New options: migratable, pcrlock, keyhandle, keyauth, blobauth (based on
discussions with Jason Gunthorpe)
- Free payload on failure to create key(reported/fixed by Roberto Sassu)
- Updated Kconfig and other descriptions (based on Serge Hallyn's suggestion)
- Replaced kzalloc() with kmalloc() (reported by Serge Hallyn)
Signed-off-by: David Safford <safford@watson.ibm.com>
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Privileged syslog operations currently require CAP_SYS_ADMIN. Split
this off into a new CAP_SYSLOG privilege which we can sanely take away
from a container through the capability bounding set.
With this patch, an lxc container can be prevented from messing with
the host's syslog (i.e. dmesg -c).
Changelog: mar 12 2010: add selinux capability2:cap_syslog perm
Changelog: nov 22 2010:
. port to new kernel
. add a WARN_ONCE if userspace isn't using CAP_SYSLOG
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Andrew G. Morgan <morgan@kernel.org>
Acked-By: Kees Cook <kees.cook@canonical.com>
Cc: James Morris <jmorris@namei.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: "Christopher J. PeBenito" <cpebenito@tresys.com>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: James Morris <jmorris@namei.org>
The SELinux ip postroute code indicates when policy rejected a packet and
passes the error back up the stack. The compat code does not. This patch
sends the same kind of error back up the stack in the compat code.
Based-on-patch-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Reviewed-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some of the SELinux netlink code returns a fatal error when the error might
actually be transient. This patch just silently drops packets on
potentially transient errors but continues to return a permanant error
indicator when the denial was because of policy.
Based-on-comments-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Reviewed-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SELinux netfilter hooks just return NF_DROP if they drop a packet. We
want to signal that a drop in this hook is a permanant fatal error and is not
transient. If we do this the error will be passed back up the stack in some
places and applications will get a faster interaction that something went
wrong.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The addition of CONFIG_SECURITY_DMESG_RESTRICT resulted in a build
failure when CONFIG_PRINTK=n. This is because the capabilities code
which used the new option was built even though the variable in question
didn't exist.
The patch here fixes this by moving the capabilities checks out of the
LSM and into the caller. All (known) LSMs should have been calling the
capabilities hook already so it actually makes the code organization
better to eliminate the hook altogether.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The kernel syslog contains debugging information that is often useful
during exploitation of other vulnerabilities, such as kernel heap
addresses. Rather than futilely attempt to sanitize hundreds (or
thousands) of printk statements and simultaneously cripple useful
debugging functionality, it is far simpler to create an option that
prevents unprivileged users from reading the syslog.
This patch, loosely based on grsecurity's GRKERNSEC_DMESG, creates the
dmesg_restrict sysctl. When set to "0", the default, no restrictions are
enforced. When set to "1", only users with CAP_SYS_ADMIN can read the
kernel syslog via dmesg(8) or other mechanisms.
[akpm@linux-foundation.org: explain the config option in kernel.txt]
Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Eugene Teo <eugeneteo@kernel.org>
Acked-by: Kees Cook <kees.cook@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
set_init_cxt() allocted sizeof(struct aa_task_cxt) bytes for cxt,
if register_security() failed, it will cause memory leak.
Signed-off-by: Zhitong Wang <zhitong.wangzt@alibaba-inc.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <jmorris@namei.org>
policy->name is a substring of policy->hname, if prefix is not NULL, it will
allocted strlen(prefix) + strlen(name) + 3 bytes to policy->hname in policy_init().
use kzfree(ns->base.name) will casue memory leak if alloc_namespace() failed.
Signed-off-by: Zhitong Wang <zhitong.wangzt@alibaba-inc.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <jmorris@namei.org>
Fix an incorrect error check that returns 1 for error instead of the
expected error code.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits)
split invalidate_inodes()
fs: skip I_FREEING inodes in writeback_sb_inodes
fs: fold invalidate_list into invalidate_inodes
fs: do not drop inode_lock in dispose_list
fs: inode split IO and LRU lists
fs: switch bdev inode bdi's correctly
fs: fix buffer invalidation in invalidate_list
fsnotify: use dget_parent
smbfs: use dget_parent
exportfs: use dget_parent
fs: use RCU read side protection in d_validate
fs: clean up dentry lru modification
fs: split __shrink_dcache_sb
fs: improve DCACHE_REFERENCED usage
fs: use percpu counter for nr_dentry and nr_dentry_unused
fs: simplify __d_free
fs: take dcache_lock inside __d_path
fs: do not assign default i_ino in new_inode
fs: introduce a per-cpu last_ino allocator
new helper: ihold()
...
* ima-memory-use-fixes:
IMA: fix the ToMToU logic
IMA: explicit IMA i_flag to remove global lock on inode_delete
IMA: drop refcnt from ima_iint_cache since it isn't needed
IMA: only allocate iint when needed
IMA: move read counter into struct inode
IMA: use i_writecount rather than a private counter
IMA: use inode->i_lock to protect read and write counters
IMA: convert internal flags from long to char
IMA: use unsigned int instead of long for counters
IMA: drop the inode opencount since it isn't needed for operation
IMA: use rbtree instead of radix tree for inode information cache
Current logic looks like this:
rc = ima_must_measure(NULL, inode, MAY_READ, FILE_CHECK);
if (rc < 0)
goto out;
if (mode & FMODE_WRITE) {
if (inode->i_readcount)
send_tomtou = true;
goto out;
}
if (atomic_read(&inode->i_writecount) > 0)
send_writers = true;
Lets assume we have a policy which states that all files opened for read
by root must be measured.
Lets assume the file has permissions 777.
Lets assume that root has the given file open for read.
Lets assume that a non-root process opens the file write.
The non-root process will get to ima_counts_get() and will check the
ima_must_measure(). Since it is not supposed to measure it will goto
out.
We should check the i_readcount no matter what since we might be causing
a ToMToU voilation!
This is close to correct, but still not quite perfect. The situation
could have been that root, which was interested in the mesurement opened
and closed the file and another process which is not interested in the
measurement is the one holding the i_readcount ATM. This is just overly
strict on ToMToU violations, which is better than not strict enough...
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently for every removed inode IMA must take a global lock and search
the IMA rbtree looking for an associated integrity structure. Instead
we explicitly mark an inode when we add an integrity structure so we
only have to take the global lock and do the removal if it exists.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since finding a struct ima_iint_cache requires a valid struct inode, and
the struct ima_iint_cache is supposed to have the same lifetime as a
struct inode (technically they die together but don't need to be created
at the same time) we don't have to worry about the ima_iint_cache
outliving or dieing before the inode. So the refcnt isn't useful. Just
get rid of it and free the structure when the inode is freed.
Signed-off-by: Eric Paris <eapris@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
IMA always allocates an integrity structure to hold information about
every inode, but only needed this structure to track the number of
readers and writers currently accessing a given inode. Since that
information was moved into struct inode instead of the integrity struct
this patch stops allocating the integrity stucture until it is needed.
Thus greatly reducing memory usage.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
IMA currently allocated an inode integrity structure for every inode in
core. This stucture is about 120 bytes long. Most files however
(especially on a system which doesn't make use of IMA) will never need
any of this space. The problem is that if IMA is enabled we need to
know information about the number of readers and the number of writers
for every inode on the box. At the moment we collect that information
in the per inode iint structure and waste the rest of the space. This
patch moves those counters into the struct inode so we can eventually
stop allocating an IMA integrity structure except when absolutely
needed.
This patch does the minimum needed to move the location of the data.
Further cleanups, especially the location of counter updates, may still
be possible.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
IMA tracks the number of struct files which are holding a given inode
readonly and the number which are holding the inode write or r/w. It
needs this information so when a new reader or writer comes in it can
tell if this new file will be able to invalidate results it already made
about existing files.
aka if a task is holding a struct file open RO, IMA measured the file
and recorded those measurements and then a task opens the file RW IMA
needs to note in the logs that the old measurement may not be correct.
It's called a "Time of Measure Time of Use" (ToMToU) issue. The same is
true is a RO file is opened to an inode which has an open writer. We
cannot, with any validity, measure the file in question since it could
be changing.
This patch attempts to use the i_writecount field to track writers. The
i_writecount field actually embeds more information in it's value than
IMA needs but it should work for our purposes and allow us to shrink the
struct inode even more.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently IMA used the iint->mutex to protect the i_readcount and
i_writecount. This patch uses the inode->i_lock since we are going to
start using in inode objects and that is the most appropriate lock.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The IMA flags is an unsigned long but there is only 1 flag defined.
Lets save a little space and make it a char. This packs nicely next to
the array of u8's.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently IMA uses 2 longs in struct inode. To save space (and as it
seems impossible to overflow 32 bits) we switch these to unsigned int.
The switch to unsigned does require slightly different checks for
underflow, but it isn't complex.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The opencount was used to help debugging to make sure that everything
which created a struct file also correctly made the IMA calls. Since we
moved all of that into the VFS this isn't as necessary. We should be
able to get the same amount of debugging out of just the reader and
write count.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The IMA code needs to store the number of tasks which have an open fd
granting permission to write a file even when IMA is not in use. It
needs this information in order to be enabled at a later point in time
without losing it's integrity garantees.
At the moment that means we store a little bit of data about every inode
in a cache. We use a radix tree key'd on the inode's memory address.
Dave Chinner pointed out that a radix tree is a terrible data structure
for such a sparse key space. This patch switches to using an rbtree
which should be more efficient.
Bug report from Dave:
"I just noticed that slabtop was reporting an awfully high usage of
radix tree nodes:
OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
4200331 2778082 66% 0.55K 144839 29 2317424K radix_tree_node
2321500 2060290 88% 1.00K 72581 32 2322592K xfs_inode
2235648 2069791 92% 0.12K 69864 32 279456K iint_cache
That is, 2.7M radix tree nodes are allocated, and the cache itself is
consuming 2.3GB of RAM. I know that the XFS inodei caches are indexed
by radix tree node, but for 2 million cached inodes that would mean a
density of 1 inode per radix tree node, which for a system with 16M
inodes in the filsystems is an impossibly low density. The worst I've
seen in a production system like kernel.org is about 20-25% density,
which would mean about 150-200k radix tree nodes for that many inodes.
So it's not the inode cache.
So I looked up what the iint_cache was. It appears to used for
storing per-inode IMA information, and uses a radix tree for indexing.
It uses the *address* of the struct inode as the indexing key. That
means the key space is extremely sparse - for XFS the struct inode
addresses are approximately 1000 bytes apart, which means the closest
the radix tree index keys get is ~1000. Which means that there is a
single entry per radix tree leaf node, so the radix tree is using
roughly 550 bytes for every 120byte structure being cached. For the
above example, it's probably wasting close to 1GB of RAM...."
Reported-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All callers take dcache_lock just around the call to __d_path, so
take the lock into it in preparation of getting rid of dcache_lock.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Instead of always assigning an increasing inode number in new_inode
move the call to assign it into those callers that actually need it.
For now callers that need it is estimated conservatively, that is
the call is added to all filesystems that do not assign an i_ino
by themselves. For a few more filesystems we can avoid assigning
any inode number given that they aren't user visible, and for others
it could be done lazily when an inode number is actually needed,
but that's left for later patches.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl:
vfs: make no_llseek the default
vfs: don't use BKL in default_llseek
llseek: automatically add .llseek fop
libfs: use generic_file_llseek for simple_attr
mac80211: disallow seeks in minstrel debug code
lirc: make chardev nonseekable
viotape: use noop_llseek
raw: use explicit llseek file operations
ibmasmfs: use generic_file_llseek
spufs: use llseek in all file operations
arm/omap: use generic_file_llseek in iommu_debug
lkdtm: use generic_file_llseek in debugfs
net/wireless: use generic_file_llseek in debugfs
drm: use noop_llseek
/selinux/policy allows a user to copy the policy back out of the kernel.
This patch allows userspace to actually mmap that file and use it directly.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
There is interest in being able to see what the actual policy is that was
loaded into the kernel. The patch creates a new selinuxfs file
/selinux/policy which can be read by userspace. The actual policy that is
loaded into the kernel will be written back out to userspace.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
AVTAB_MAX_SIZE was a define which was supposed to be used in userspace to
define a maximally sized avtab when userspace wasn't sure how big of a table
it needed. It doesn't make sense in the kernel since we always know our table
sizes. The only place it is used we have a more appropiately named define
called AVTAB_MAX_HASH_BUCKETS, use that instead.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Range transition rules are placed in the hash table in an (almost)
arbitrary order. This patch inserts them in a fixed order to make policy
retrival more predictable.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
With the (long ago) interface change to have the secid_to_secctx functions
do the string allocation instead of having the caller do the allocation we
lost the ability to query the security server for the length of the
upcoming string. The SECMARK code would like to allocate a netlink skb
with enough length to hold the string but it is just too unclean to do the
string allocation twice or to do the allocation the first time and hold
onto the string and slen. This patch adds the ability to call
security_secid_to_secctx() with a NULL data pointer and it will just set
the slen pointer.
Signed-off-by: Eric Paris <eparis@redhat.com>
Reviewed-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
Right now secmark has lots of direct selinux calls. Use all LSM calls and
remove all SELinux specific knowledge. The only SELinux specific knowledge
we leave is the mode. The only point is to make sure that other LSMs at
least test this generic code before they assume it works. (They may also
have to make changes if they do not represent labels as strings)
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Paul Moore <paul.moore@hp.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: James Morris <jmorris@namei.org>
Actually I think in this case the appropriate thing to do is to BUG as there
is currently a case (remove) where the alloc_size needs to be larger than
the copy_size, and if copy_size is ever greater than alloc_size there is
a mistake in the caller code.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Acked-by: Kees Cook <kees.cook@canonical.com>
Signed-off-by: James Morris <jmorris@namei.org>
Configuration files for TOMOYO 2.3 are not compatible with TOMOYO 2.2.
But current panic() message is too unfriendly and is confusing users.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: James Morris <jmorris@namei.org>
All security modules shouldn't change sched_param parameter of
security_task_setscheduler(). This is not only meaningless, but also
make a harmful result if caller pass a static variable.
This patch remove policy and sched_param parameter from
security_task_setscheduler() becuase none of security module is
using it.
Cc: James Morris <jmorris@namei.org>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: James Morris <jmorris@namei.org>
This patch fixes up coding-style problem at this commit:
4f27a7d49789b04404eca26ccde5f527231d01d5
selinux: fast status update interface (/selinux/status)
Signed-off-by: KaiGai Kohei <kaigai@ak.jp.nec.com>
Signed-off-by: James Morris <jmorris@namei.org>
While the previous change to the selinux Makefile reduced the window
significantly for this failure, it is still possible to see a compile
failure where cpp starts processing selinux files before the auto
generated flask.h file is completed. This is easily reproduced by
adding the following temporary change to expose the issue everytime:
- cmd_flask = scripts/selinux/genheaders/genheaders ...
+ cmd_flask = sleep 30 ; scripts/selinux/genheaders/genheaders ...
This failure happens because the creation of the object files in the ss
subdir also depends on flask.h. So simply incorporate them into the
parent Makefile, as the ss/Makefile really doesn't do anything unique.
With this change, compiling of all selinux files is dependent on
completion of the header file generation, and this test case with
the "sleep 30" now confirms it is functioning as expected.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: James Morris <jmorris@namei.org>
Selinux has an autogenerated file, "flask.h" which is included by
two other selinux files. The current makefile has a single dependency
on the first object file in the selinux-y list, assuming that will get
flask.h generated before anyone looks for it, but that assumption breaks
down in a "make -jN" situation and you get:
selinux/selinuxfs.c:35: fatal error: flask.h: No such file or directory
compilation terminated.
remake[9]: *** [security/selinux/selinuxfs.o] Error 1
Since flask.h is included by security.h which in turn is included
nearly everywhere, make the dependency apply to all of the selinux-y
list of objs.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: James Morris <jmorris@namei.org>
This patch provides a new /selinux/status entry which allows applications
read-only mmap(2).
This region reflects selinux_kernel_status structure in kernel space.
struct selinux_kernel_status
{
u32 length; /* length of this structure */
u32 sequence; /* sequence number of seqlock logic */
u32 enforcing; /* current setting of enforcing mode */
u32 policyload; /* times of policy reloaded */
u32 deny_unknown; /* current setting of deny_unknown */
};
When userspace object manager caches access control decisions provided
by SELinux, it needs to invalidate the cache on policy reload and setenforce
to keep consistency.
However, the applications need to check the kernel state for each accesses
on userspace avc, or launch a background worker process.
In heuristic, frequency of invalidation is much less than frequency of
making access control decision, so it is annoying to invoke a system call
to check we don't need to invalidate the userspace cache.
If we can use a background worker thread, it allows to receive invalidation
messages from the kernel. But it requires us an invasive coding toward the
base application in some cases; E.g, when we provide a feature performing
with SELinux as a plugin module, it is unwelcome manner to launch its own
worker thread from the module.
If we could map /selinux/status to process memory space, application can
know updates of selinux status; policy reload or setenforce.
A typical application checks selinux_kernel_status::sequence when it tries
to reference userspace avc. If it was changed from the last time when it
checked userspace avc, it means something was updated in the kernel space.
Then, the application can reset userspace avc or update current enforcing
mode, without any system call invocations.
This sequence number is updated according to the seqlock logic, so we need
to wait for a while if it is odd number.
Signed-off-by: KaiGai Kohei <kaigai@ak.jp.nec.com>
Acked-by: Eric Paris <eparis@redhat.com>
--
security/selinux/include/security.h | 21 ++++++
security/selinux/selinuxfs.c | 56 +++++++++++++++
security/selinux/ss/Makefile | 2 +-
security/selinux/ss/services.c | 3 +
security/selinux/ss/status.c | 129 +++++++++++++++++++++++++++++++++++
5 files changed, 210 insertions(+), 1 deletions(-)
Signed-off-by: James Morris <jmorris@namei.org>
We can set default LSM module to DAC (which means "enable no LSM module").
If default LSM module was set to DAC, security_module_enable() must return 0
unless overridden via boot time parameter.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: James Morris <jmorris@namei.org>
type is not used at all, stop declaring and assigning it.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
If domain is NULL then &domain->list is a bogus address. Let's leave
head->r.domain NULL instead of saving an unusable pointer.
This is just a cleanup. The current code always checks head->r.eof
before dereferencing head->r.domain.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
All file_operations should get a .llseek operation so we can make
nonseekable_open the default for future file operations without a
.llseek pointer.
The three cases that we can automatically detect are no_llseek, seq_lseek
and default_llseek. For cases where we can we can automatically prove that
the file offset is always ignored, we use noop_llseek, which maintains
the current behavior of not returning an error from a seek.
New drivers should normally not use noop_llseek but instead use no_llseek
and call nonseekable_open at open time. Existing drivers can be converted
to do the same when the maintainer knows for certain that no user code
relies on calling seek on the device file.
The generated code is often incorrectly indented and right now contains
comments that clarify for each added line why a specific variant was
chosen. In the version that gets submitted upstream, the comments will
be gone and I will manually fix the indentation, because there does not
seem to be a way to do that using coccinelle.
Some amount of new code is currently sitting in linux-next that should get
the same modifications, which I will do at the end of the merge window.
Many thanks to Julia Lawall for helping me learn to write a semantic
patch that does all this.
===== begin semantic patch =====
// This adds an llseek= method to all file operations,
// as a preparation for making no_llseek the default.
//
// The rules are
// - use no_llseek explicitly if we do nonseekable_open
// - use seq_lseek for sequential files
// - use default_llseek if we know we access f_pos
// - use noop_llseek if we know we don't access f_pos,
// but we still want to allow users to call lseek
//
@ open1 exists @
identifier nested_open;
@@
nested_open(...)
{
<+...
nonseekable_open(...)
...+>
}
@ open exists@
identifier open_f;
identifier i, f;
identifier open1.nested_open;
@@
int open_f(struct inode *i, struct file *f)
{
<+...
(
nonseekable_open(...)
|
nested_open(...)
)
...+>
}
@ read disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
<+...
(
*off = E
|
*off += E
|
func(..., off, ...)
|
E = *off
)
...+>
}
@ read_no_fpos disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
... when != off
}
@ write @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
<+...
(
*off = E
|
*off += E
|
func(..., off, ...)
|
E = *off
)
...+>
}
@ write_no_fpos @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
... when != off
}
@ fops0 @
identifier fops;
@@
struct file_operations fops = {
...
};
@ has_llseek depends on fops0 @
identifier fops0.fops;
identifier llseek_f;
@@
struct file_operations fops = {
...
.llseek = llseek_f,
...
};
@ has_read depends on fops0 @
identifier fops0.fops;
identifier read_f;
@@
struct file_operations fops = {
...
.read = read_f,
...
};
@ has_write depends on fops0 @
identifier fops0.fops;
identifier write_f;
@@
struct file_operations fops = {
...
.write = write_f,
...
};
@ has_open depends on fops0 @
identifier fops0.fops;
identifier open_f;
@@
struct file_operations fops = {
...
.open = open_f,
...
};
// use no_llseek if we call nonseekable_open
////////////////////////////////////////////
@ nonseekable1 depends on !has_llseek && has_open @
identifier fops0.fops;
identifier nso ~= "nonseekable_open";
@@
struct file_operations fops = {
... .open = nso, ...
+.llseek = no_llseek, /* nonseekable */
};
@ nonseekable2 depends on !has_llseek @
identifier fops0.fops;
identifier open.open_f;
@@
struct file_operations fops = {
... .open = open_f, ...
+.llseek = no_llseek, /* open uses nonseekable */
};
// use seq_lseek for sequential files
/////////////////////////////////////
@ seq depends on !has_llseek @
identifier fops0.fops;
identifier sr ~= "seq_read";
@@
struct file_operations fops = {
... .read = sr, ...
+.llseek = seq_lseek, /* we have seq_read */
};
// use default_llseek if there is a readdir
///////////////////////////////////////////
@ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier readdir_e;
@@
// any other fop is used that changes pos
struct file_operations fops = {
... .readdir = readdir_e, ...
+.llseek = default_llseek, /* readdir is present */
};
// use default_llseek if at least one of read/write touches f_pos
/////////////////////////////////////////////////////////////////
@ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read.read_f;
@@
// read fops use offset
struct file_operations fops = {
... .read = read_f, ...
+.llseek = default_llseek, /* read accesses f_pos */
};
@ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write.write_f;
@@
// write fops use offset
struct file_operations fops = {
... .write = write_f, ...
+ .llseek = default_llseek, /* write accesses f_pos */
};
// Use noop_llseek if neither read nor write accesses f_pos
///////////////////////////////////////////////////////////
@ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
identifier write_no_fpos.write_f;
@@
// write fops use offset
struct file_operations fops = {
...
.write = write_f,
.read = read_f,
...
+.llseek = noop_llseek, /* read and write both use no f_pos */
};
@ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write_no_fpos.write_f;
@@
struct file_operations fops = {
... .write = write_f, ...
+.llseek = noop_llseek, /* write uses no f_pos */
};
@ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
@@
struct file_operations fops = {
... .read = read_f, ...
+.llseek = noop_llseek, /* read uses no f_pos */
};
@ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
@@
struct file_operations fops = {
...
+.llseek = noop_llseek, /* no read or write fn */
};
===== End semantic patch =====
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Julia Lawall <julia@diku.dk>
Cc: Christoph Hellwig <hch@infradead.org>
System call entry functions sys_*() are never to be called from
general kernel code. The fact that they aren't declared in header
files should have been a clue. These functions also don't exist on
Alpha since it has sys_getxpid() instead.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: James Morris <jmorris@namei.org>
Fix a bug in keyctl_session_to_parent() whereby it tries to check the ownership
of the parent process's session keyring whether or not the parent has a session
keyring [CVE-2010-2960].
This results in the following oops:
BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0
IP: [<ffffffff811ae4dd>] keyctl_session_to_parent+0x251/0x443
...
Call Trace:
[<ffffffff811ae2f3>] ? keyctl_session_to_parent+0x67/0x443
[<ffffffff8109d286>] ? __do_fault+0x24b/0x3d0
[<ffffffff811af98c>] sys_keyctl+0xb4/0xb8
[<ffffffff81001eab>] system_call_fastpath+0x16/0x1b
if the parent process has no session keyring.
If the system is using pam_keyinit then it mostly protected against this as all
processes derived from a login will have inherited the session keyring created
by pam_keyinit during the log in procedure.
To test this, pam_keyinit calls need to be commented out in /etc/pam.d/.
Reported-by: Tavis Ormandy <taviso@cmpxchg8b.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Tavis Ormandy <taviso@cmpxchg8b.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There's an protected access to the parent process's credentials in the middle
of keyctl_session_to_parent(). This results in the following RCU warning:
===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
security/keys/keyctl.c:1291 invoked rcu_dereference_check() without protection!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 0
1 lock held by keyctl-session-/2137:
#0: (tasklist_lock){.+.+..}, at: [<ffffffff811ae2ec>] keyctl_session_to_parent+0x60/0x236
stack backtrace:
Pid: 2137, comm: keyctl-session- Not tainted 2.6.36-rc2-cachefs+ #1
Call Trace:
[<ffffffff8105606a>] lockdep_rcu_dereference+0xaa/0xb3
[<ffffffff811ae379>] keyctl_session_to_parent+0xed/0x236
[<ffffffff811af77e>] sys_keyctl+0xb4/0xb6
[<ffffffff81001eab>] system_call_fastpath+0x16/0x1b
The code should take the RCU read lock to make sure the parents credentials
don't go away, even though it's holding a spinlock and has IRQ disabled.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
commit 8262bb85da allocated the inode integrity struct (iint) before any
inodes were created. Only after IMA was initialized in late_initcall were
the counters updated. This patch updates the counters, whether or not IMA
has been initialized, to resolve 'imbalance' messages.
This patch fixes the bug as reported in bugzilla: 15673. When the i915
is builtin, the ring_buffer is initialized before IMA, causing the
imbalance message on suspend.
Reported-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Tested-by: Thomas Meyer <thomas@m3y3r.de>
Tested-by: David Safford<safford@watson.ibm.com>
Cc: Stable Kernel <stable@kernel.org>
Signed-off-by: James Morris <jmorris@namei.org>
The locking for profile namespace removal is wrong, when removing a
profile namespace, it needs to be removed from its parent's list.
Lock the parent of namespace list instead of the namespace being removed.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <jmorris@namei.org>
As per Dan Carpenter <error27@gmail.com>
If we have a ns name without a following profile then in the original
code it did "*ns_name = &name[1];". "name" is NULL so "*ns_name" is
0x1. That isn't useful and could cause an oops when this function is
called from aa_remove_profiles().
Beyond this the assignment of the namespace name was wrong in the case
where the profile name was provided as it was being set to &name[1]
after name = skip_spaces(split + 1);
Move the ns_name assignment before updating name for the split and
also add skip_spaces, making the interface more robust.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <jmorris@namei.org>
2.6.36 introduced the abilitiy to specify the task that is having its
rlimits set. Update mediation to ensure that confined tasks can only
set their own group_leader as expected by current policy.
Add TODO note about extending policy to support setting other tasks
rlimits.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <jmorris@namei.org>
The 2.6.36 kernel has refactored __d_path() so that it no longer appends
" (deleted)" to unlinked paths. So drop the hack that was used to detect
and remove the appended string.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <jmorris@namei.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
fs: brlock vfsmount_lock
fs: scale files_lock
lglock: introduce special lglock and brlock spin locks
tty: fix fu_list abuse
fs: cleanup files_lock locking
fs: remove extra lookup in __lookup_hash
fs: fs_struct rwlock to spinlock
apparmor: use task path helpers
fs: dentry allocation consolidation
fs: fix do_lookup false negative
mbcache: Limit the maximum number of cache entries
hostfs ->follow_link() braino
hostfs: dumb (and usually harmless) tpyo - strncpy instead of strlcpy
remove SWRITE* I/O types
kill BH_Ordered flag
vfs: update ctime when changing the file's permission by setfacl
cramfs: only unlock new inodes
fix reiserfs_evict_inode end_writeback second call
tty: fix fu_list abuse
tty code abuses fu_list, which causes a bug in remount,ro handling.
If a tty device node is opened on a filesystem, then the last link to the inode
removed, the filesystem will be allowed to be remounted readonly. This is
because fs_may_remount_ro does not find the 0 link tty inode on the file sb
list (because the tty code incorrectly removed it to use for its own purpose).
This can result in a filesystem with errors after it is marked "clean".
Taking idea from Christoph's initial patch, allocate a tty private struct
at file->private_data and put our required list fields in there, linking
file and tty. This makes tty nodes behave the same way as other device nodes
and avoid meddling with the vfs, and avoids this bug.
The error handling is not trivial in the tty code, so for this bugfix, I take
the simple approach of using __GFP_NOFAIL and don't worry about memory errors.
This is not a problem because our allocator doesn't fail small allocs as a rule
anyway. So proper error handling is left as an exercise for tty hackers.
[ Arguably filesystem's device inode would ideally be divorced from the
driver's pseudo inode when it is opened, but in practice it's not clear whether
that will ever be worth implementing. ]
Cc: linux-kernel@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
fs: cleanup files_lock locking
Lock tty_files with a new spinlock, tty_files_lock; provide helpers to
manipulate the per-sb files list; unexport the files_lock spinlock.
Cc: linux-kernel@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Make do_execve() take a const filename pointer so that kernel_execve() compiles
correctly on ARM:
arch/arm/kernel/sys_arm.c:88: warning: passing argument 1 of 'do_execve' discards qualifiers from pointer target type
This also requires the argv and envp arguments to be consted twice, once for
the pointer array and once for the strings the array points to. This is
because do_execve() passes a pointer to the filename (now const) to
copy_strings_kernel(). A simpler alternative would be to cast the filename
pointer in do_execve() when it's passed to copy_strings_kernel().
do_execve() may not change any of the strings it is passed as part of the argv
or envp lists as they are some of them in .rodata, so marking these strings as
const should be fine.
Further kernel_execve() and sys_execve() need to be changed to match.
This has been test built on x86_64, frv, arm and mips.
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After rlimits tree was merged we get the following errors:
security/apparmor/lsm.c:663:2: warning: initialization from incompatible pointer type
It is because AppArmor was merged in the meantime, but uses the old
prototype. So fix it by adding struct task_struct as a first parameter
of apparmor_task_setrlimit.
NOTE that this is ONLY a compilation warning fix (and crashes caused
by that). It needs proper handling in AppArmor depending on who is the
'task'.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <jmorris@namei.org>
* 'params' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: (22 commits)
param: don't deref arg in __same_type() checks
param: update drivers/acpi/debug.c to new scheme
param: use module_param in drivers/message/fusion/mptbase.c
ide: use module_param_named rather than module_param_call
param: update drivers/char/ipmi/ipmi_watchdog.c to new scheme
param: lock if_sdio's lbs_helper_name and lbs_fw_name against sysfs changes.
param: lock myri10ge_fw_name against sysfs changes.
param: simple locking for sysfs-writable charp parameters
param: remove unnecessary writable charp
param: add kerneldoc to moduleparam.h
param: locking for kernel parameters
param: make param sections const.
param: use free hook for charp (fix leak of charp parameters)
param: add a free hook to kernel_param_ops.
param: silence .init.text references from param ops
Add param ops struct for hvc_iucv driver.
nfs: update for module_param_named API change
AppArmor: update for module_param_named API change
param: use ops in struct kernel_param, rather than get and set fns directly
param: move the EXPORT_SYMBOL to after the definitions.
...
Add a dummy printk function for the maintenance of unused printks through gcc
format checking, and also so that side-effect checking is maintained too.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes these build errors:
security/apparmor/lsm.c:701: error: 'param_ops_aabool' undeclared here (not in a function)
security/apparmor/lsm.c:721: error: 'param_ops_aalockpolicy' undeclared here (not in a function)
security/apparmor/lsm.c:729: error: 'param_ops_aauint' undeclared here (not in a function)
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>