Pull vfs update from Al Viro:
- big one - consolidation of descriptor-related logics; almost all of
that is moved to fs/file.c
(BTW, I'm seriously tempted to rename the result to fd.c. As it is,
we have a situation when file_table.c is about handling of struct
file and file.c is about handling of descriptor tables; the reasons
are historical - file_table.c used to be about a static array of
struct file we used to have way back).
A lot of stray ends got cleaned up and converted to saner primitives,
disgusting mess in android/binder.c is still disgusting, but at least
doesn't poke so much in descriptor table guts anymore. A bunch of
relatively minor races got fixed in process, plus an ext4 struct file
leak.
- related thing - fget_light() partially unuglified; see fdget() in
there (and yes, it generates the code as good as we used to have).
- also related - bits of Cyrill's procfs stuff that got entangled into
that work; _not_ all of it, just the initial move to fs/proc/fd.c and
switch of fdinfo to seq_file.
- Alex's fs/coredump.c spiltoff - the same story, had been easier to
take that commit than mess with conflicts. The rest is a separate
pile, this was just a mechanical code movement.
- a few misc patches all over the place. Not all for this cycle,
there'll be more (and quite a few currently sit in akpm's tree)."
Fix up trivial conflicts in the android binder driver, and some fairly
simple conflicts due to two different changes to the sock_alloc_file()
interface ("take descriptor handling from sock_alloc_file() to callers"
vs "net: Providing protocol type via system.sockprotoname xattr of
/proc/PID/fd entries" adding a dentry name to the socket)
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits)
MAX_LFS_FILESIZE should be a loff_t
compat: fs: Generic compat_sys_sendfile implementation
fs: push rcu_barrier() from deactivate_locked_super() to filesystems
btrfs: reada_extent doesn't need kref for refcount
coredump: move core dump functionality into its own file
coredump: prevent double-free on an error path in core dumper
usb/gadget: fix misannotations
fcntl: fix misannotations
ceph: don't abuse d_delete() on failure exits
hypfs: ->d_parent is never NULL or negative
vfs: delete surplus inode NULL check
switch simple cases of fget_light to fdget
new helpers: fdget()/fdput()
switch o2hb_region_dev_write() to fget_light()
proc_map_files_readdir(): don't bother with grabbing files
make get_file() return its argument
vhost_set_vring(): turn pollstart/pollstop into bool
switch prctl_set_mm_exe_file() to fget_light()
switch xfs_find_handle() to fget_light()
switch xfs_swapext() to fget_light()
...
Pull user namespace changes from Eric Biederman:
"This is a mostly modest set of changes to enable basic user namespace
support. This allows the code to code to compile with user namespaces
enabled and removes the assumption there is only the initial user
namespace. Everything is converted except for the most complex of the
filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs,
nfs, ocfs2 and xfs as those patches need a bit more review.
The strategy is to push kuid_t and kgid_t values are far down into
subsystems and filesystems as reasonable. Leaving the make_kuid and
from_kuid operations to happen at the edge of userspace, as the values
come off the disk, and as the values come in from the network.
Letting compile type incompatible compile errors (present when user
namespaces are enabled) guide me to find the issues.
The most tricky areas have been the places where we had an implicit
union of uid and gid values and were storing them in an unsigned int.
Those places were converted into explicit unions. I made certain to
handle those places with simple trivial patches.
Out of that work I discovered we have generic interfaces for storing
quota by projid. I had never heard of the project identifiers before.
Adding full user namespace support for project identifiers accounts
for most of the code size growth in my git tree.
Ultimately there will be work to relax privlige checks from
"capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing
root in a user names to do those things that today we only forbid to
non-root users because it will confuse suid root applications.
While I was pushing kuid_t and kgid_t changes deep into the audit code
I made a few other cleanups. I capitalized on the fact we process
netlink messages in the context of the message sender. I removed
usage of NETLINK_CRED, and started directly using current->tty.
Some of these patches have also made it into maintainer trees, with no
problems from identical code from different trees showing up in
linux-next.
After reading through all of this code I feel like I might be able to
win a game of kernel trivial pursuit."
Fix up some fairly trivial conflicts in netfilter uid/git logging code.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits)
userns: Convert the ufs filesystem to use kuid/kgid where appropriate
userns: Convert the udf filesystem to use kuid/kgid where appropriate
userns: Convert ubifs to use kuid/kgid
userns: Convert squashfs to use kuid/kgid where appropriate
userns: Convert reiserfs to use kuid and kgid where appropriate
userns: Convert jfs to use kuid/kgid where appropriate
userns: Convert jffs2 to use kuid and kgid where appropriate
userns: Convert hpfs to use kuid and kgid where appropriate
userns: Convert btrfs to use kuid/kgid where appropriate
userns: Convert bfs to use kuid/kgid where appropriate
userns: Convert affs to use kuid/kgid wherwe appropriate
userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids
userns: On ia64 deal with current_uid and current_gid being kuid and kgid
userns: On ppc convert current_uid from a kuid before printing.
userns: Convert s390 getting uid and gid system calls to use kuid and kgid
userns: Convert s390 hypfs to use kuid and kgid where appropriate
userns: Convert binder ipc to use kuids
userns: Teach security_path_chown to take kuids and kgids
userns: Add user namespace support to IMA
userns: Convert EVM to deal with kuids and kgids in it's hmac computation
...
- Pass the user namespace the uid and gid values in the xattr are stored
in into posix_acl_from_xattr.
- Pass the user namespace kuid and kgid values should be converted into
when storing uid and gid values in an xattr in posix_acl_to_xattr.
- Modify all callers of posix_acl_from_xattr and posix_acl_to_xattr to
pass in &init_user_ns.
In the short term this change is not strictly needed but it makes the
code clearer. In the longer term this change is necessary to be able to
mount filesystems outside of the initial user namespace that natively
store posix acls in the linux xattr format.
Cc: Theodore Tso <tytso@mit.edu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Commit d5497fc693 "nfsd4: move rq_flavor
into svc_cred" forgot to remove cl_flavor from the client, leaving two
places (cl_flavor and cl_cred.cr_flavor) for the flavor to be stored.
After that patch, the latter was the one that was updated, but the
former was the one that the callback used.
Symptoms were a long delay on utime(). This is because the utime()
generated a setattr which recalled a delegation, but the cb_recall was
ignored by the client because it had the wrong security flavor.
Cc: stable@vger.kernel.org
Tested-by: Jamie Heilman <jamie@audible.transient.net>
Reported-by: Jamie Heilman <jamie@audible.transient.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pull second vfs pile from Al Viro:
"The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the
deadlock reproduced by xfstests 068), symlink and hardlink restriction
patches, plus assorted cleanups and fixes.
Note that another fsfreeze deadlock (emergency thaw one) is *not*
dealt with - the series by Fernando conflicts a lot with Jan's, breaks
userland ABI (FIFREEZE semantics gets changed) and trades the deadlock
for massive vfsmount leak; this is going to be handled next cycle.
There probably will be another pull request, but that stuff won't be
in it."
Fix up trivial conflicts due to unrelated changes next to each other in
drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c}
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
delousing target_core_file a bit
Documentation: Correct s_umount state for freeze_fs/unfreeze_fs
fs: Remove old freezing mechanism
ext2: Implement freezing
btrfs: Convert to new freezing mechanism
nilfs2: Convert to new freezing mechanism
ntfs: Convert to new freezing mechanism
fuse: Convert to new freezing mechanism
gfs2: Convert to new freezing mechanism
ocfs2: Convert to new freezing mechanism
xfs: Convert to new freezing code
ext4: Convert to new freezing mechanism
fs: Protect write paths by sb_start_write - sb_end_write
fs: Skip atime update on frozen filesystem
fs: Add freezing handling to mnt_want_write() / mnt_drop_write()
fs: Improve filesystem freezing handling
switch the protection of percpu_counter list to spinlock
nfsd: Push mnt_want_write() outside of i_mutex
btrfs: Push mnt_want_write() outside of i_mutex
fat: Push mnt_want_write() outside of i_mutex
...
Pull nfsd changes from J. Bruce Fields:
"This has been an unusually quiet cycle--mostly bugfixes and cleanup.
The one large piece is Stanislav's work to containerize the server's
grace period--but that in itself is just one more step in a
not-yet-complete project to allow fully containerized nfs service.
There are a number of outstanding delegation, container, v4 state, and
gss patches that aren't quite ready yet; 3.7 may be wilder."
* 'nfsd-next' of git://linux-nfs.org/~bfields/linux: (35 commits)
NFSd: make boot_time variable per network namespace
NFSd: make grace end flag per network namespace
Lockd: move grace period management from lockd() to per-net functions
LockD: pass actual network namespace to grace period management functions
LockD: manage grace list per network namespace
SUNRPC: service request network namespace helper introduced
NFSd: make nfsd4_manager allocated per network namespace context.
LockD: make lockd manager allocated per network namespace
LockD: manage grace period per network namespace
Lockd: add more debug to host shutdown functions
Lockd: host complaining function introduced
LockD: manage used host count per networks namespace
LockD: manage garbage collection timeout per networks namespace
LockD: make garbage collector network namespace aware.
LockD: mark host per network namespace on garbage collect
nfsd4: fix missing fault_inject.h include
locks: move lease-specific code out of locks_delete_lock
locks: prevent side-effects of locks_release_private before file_lock is initialized
NFSd: set nfsd_serv to NULL after service destruction
NFSd: introduce nfsd_destroy() helper
...
When mnt_want_write() starts to handle freezing it will get a full lock
semantics requiring proper lock ordering. So push mnt_want_write() call
consistently outside of i_mutex.
CC: linux-nfs@vger.kernel.org
CC: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
NFSd's boot_time represents grace period start point in time.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Passed network namespace replaced hard-coded init_net
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This is a cleanup patch - makes code looks simplier.
It replaces widely used rqstp->rq_xprt->xpt_net by introduced SVC_NET(rqstp).
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
In nfsd_destroy():
if (destroy)
svc_shutdown_net(nfsd_serv, net);
svc_destroy(nfsd_server);
svc_shutdown_net(nfsd_serv, net) calls nfsd_last_thread(), which sets
nfsd_serv to NULL, causing a NULL dereference on the following line.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
I don't think there's a practical difference for the range of values
these interfaces should see, but it would be safer to be unambiguous.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This patch adds recall_lock hold to nfsd_forget_delegations() to protect
nfsd_process_n_delegations() call.
Also, looks like it would be better to collect delegations to some local
on-stack list, and then unhash collected list. This split allows to
simplify locking, because delegation traversing is protected by recall_lock,
when delegation unhash is protected by client_mutex.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This fixes a wrong check for same cr_principal in same_creds
Introduced by 8fbba96e5b "nfsd4: stricter
cred comparison for setclientid/exchange_id".
Cc: stable@vger.kernel.org
Signed-off-by: Vivek Trivedi <vtrivedi018@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
We normally allow the owner of a file to override permissions checks on
IO operations, since:
- the client will take responsibility for doing an access check
on open;
- the permission checks offer no protection against malicious
clients--if they can authenticate as the file's owner then
they can always just change its permissions;
- checking permission on each IO operation breaks the usual
posix rule that permission is checked only on open.
However, we've never allowed the owner to override permissions on
readdir operations, even though the above logic would also apply to
directories. I've never heard of this causing a problem, probably
because a) simultaneously opening and creating a directory (with
restricted mode) isn't possible, and b) opening a directory, then
chmod'ing it, is rare.
Our disallowal of owner-override on directories appears to be an
accident, though--the readdir itself succeeds, and then we fail just
because lookup_one_len() calls in our filldir methods fail.
I'm not sure what the easiest fix for that would be. For now, just make
this behavior obvious by denying the override right at the start.
This also fixes some odd v4 behavior: with the rdattr_error attribute
requested, it would perform the readdir but return an ACCES error with
each entry.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
We don't need to keep openowners around in the >=4.1 case, because they
aren't needed to handle CLOSE replays any more (that's a problem for
sessions). And doing so causes unexpected failures on a subsequent
destroy_clientid to fail.
We probably also need something comparable for lock owners on last
unlock.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Actually, xfs and jfs can optionally be case insensitive; we'll handle
that case in later patches.
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Share a little common logic. And note the comments here are a little
out of date (e.g. we don't always create new state in the "new" case any
more.)
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
For the most part readers of cl_cb_state only need a value that is
"eventually" right. And the value is set only either 1) in response to
some change of state, in which case it's set to UNKNOWN and then a
callback rpc is sent to probe the real state, or b) in the handling of a
response to such a callback. UNKNOWN is therefore always a "temporary"
state, and for the other states we're happy to accept last writer wins.
So I think we're OK here.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
According to RFC 5661, the TEST_STATEID operation is not allowed to
return NFS4ERR_STALE_STATEID. In addition, RFC 5661 says:
15.1.16.5. NFS4ERR_STALE_STATEID (Error Code 10023)
A stateid generated by an earlier server instance was used. This
error is moot in NFSv4.1 because all operations that take a stateid
MUST be preceded by the SEQUENCE operation, and the earlier server
instance is detected by the session infrastructure that supports
SEQUENCE.
I triggered NFS4ERR_STALE_STATEID while testing the Linux client's
NOGRACE recovery. Bruce suggested an additional test that could be
useful to client developers.
Lastly, RFC 5661, section 18.48.3 has this:
o Special stateids are always considered invalid (they result in the
error code NFS4ERR_BAD_STATEID).
An explicit check is made for those state IDs to avoid printk noise.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Initiate a CB probe when a new connection with the correct direction is added
to a session (IFF backchannel is marked as down). Without this a
BIND_CONN_TO_SESSION has no effect on the internal backchannel state, which
causes the server to reply to every SEQUENCE op with the
SEQ4_STATUS_CB_PATH_DOWN flag set until DESTROY_SESSION.
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Most frequent symptom was a BUG triggering in expire_client, with the
server locking up shortly thereafter.
Introduced by 508dc6e110 "nfsd41:
free_session/free_client must be called under the client_lock".
Cc: stable@kernel.org
Cc: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pull the rest of the nfsd commits from Bruce Fields:
"... and then I cherry-picked the remainder of the patches from the
head of my previous branch"
This is the rest of the original nfsd branch, rebased without the
delegation stuff that I thought really needed to be redone.
I don't like rebasing things like this in general, but in this situation
this was the lesser of two evils.
* 'for-3.5' of git://linux-nfs.org/~bfields/linux: (50 commits)
nfsd4: fix, consolidate client_has_state
nfsd4: don't remove rebooted client record until confirmation
nfsd4: remove some dprintk's and a comment
nfsd4: return "real" sequence id in confirmed case
nfsd4: fix exchange_id to return confirm flag
nfsd4: clarify that renewing expired client is a bug
nfsd4: simpler ordering of setclientid_confirm checks
nfsd4: setclientid: remove pointless assignment
nfsd4: fix error return in non-matching-creds case
nfsd4: fix setclientid_confirm same_cred check
nfsd4: merge 3 setclientid cases to 2
nfsd4: pull out common code from setclientid cases
nfsd4: merge last two setclientid cases
nfsd4: setclientid/confirm comment cleanup
nfsd4: setclientid remove unnecessary terms from a logical expression
nfsd4: move rq_flavor into svc_cred
nfsd4: stricter cred comparison for setclientid/exchange_id
nfsd4: move principal name into svc_cred
nfsd4: allow removing clients not holding state
nfsd4: rearrange exchange_id logic to simplify
...
Pull nfsd update from Bruce Fields.
* 'for-3.5-take-2' of git://linux-nfs.org/~bfields/linux: (23 commits)
nfsd: trivial: use SEEK_SET instead of 0 in vfs_llseek
SUNRPC: split upcall function to extract reusable parts
nfsd: allocate id-to-name and name-to-id caches in per-net operations.
nfsd: make name-to-id cache allocated per network namespace context
nfsd: make id-to-name cache allocated per network namespace context
nfsd: pass network context to idmap init/exit functions
nfsd: allocate export and expkey caches in per-net operations.
nfsd: make expkey cache allocated per network namespace context
nfsd: make export cache allocated per network namespace context
nfsd: pass pointer to export cache down to stack wherever possible.
nfsd: pass network context to export caches init/shutdown routines
Lockd: pass network namespace to creation and destruction routines
NFSd: remove hard-coded dereferences to name-to-id and id-to-name caches
nfsd: pass pointer to expkey cache down to stack wherever possible.
nfsd: use hash table from cache detail in nfsd export seq ops
nfsd: pass svc_export_cache pointer as private data to "exports" seq file ops
nfsd: use exp_put() for svc_export_cache put
nfsd: use cache detail pointer from svc_export structure on cache put
nfsd: add link to owner cache detail to svc_export structure
nfsd: use passed cache_detail pointer expkey_parse()
...
Whoops: first, I reimplemented the already-existing has_resources
without noticing; second, I got the test backwards. I did pick a better
name, though. Combine the two....
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
In the NFSv4.1 client-reboot case we're currently removing the client's
previous state in exchange_id. That's wrong--we should be waiting till
the confirming create_session.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The comment is redundant, and if we really want dprintk's here they'd
probably be better in the common (check-slot_seqid) code.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The client should ignore the returned sequence_id in the case where the
CONFIRMED flag is set on an exchange_id reply--and in the unconfirmed
case "1" is always the right response. So it shouldn't actually matter
what we return here.
We could continue returning 1 just to catch clients ignoring the spec
here, but I'd rather be generous. Other things equal, returning the
existing sequence_id seems more informative.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Otherwise nfsd4_set_ex_flags writes over the return flags.
Reported-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This can't happen:
- cl_time is zeroed only by unhash_client_locked, which is only
ever called under both the state lock and the client lock.
- every caller of renew_client() should have looked up a
(non-expired) client and then called renew_client() all
without dropping the state lock.
- the only other caller of renew_client_locked() is
release_session_client(), which first checks under the
client_lock that the cl_time is nonzero.
So make it clear that this is a bug, not something we handle. I can't
quite bring myself to make this a BUG(), though, as there are a lot of
renew_client() callers, and returning here is probably safer than a
BUG().
We'll consider making it a BUG() after some more cleanup.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The cases here divide into two main categories:
- if there's an uncomfirmed record with a matching verifier,
then this is a "normal", succesful case: we're either creating
a new client, or updating an existing one.
- otherwise, this is a weird case: a replay, or a server reboot.
Reordering to reflect that makes the code a bit more concise and the
logic a lot easier to understand.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Note CLID_INUSE is for the case where two clients are trying to use the
same client-provided long-form client identifiers. But what we're
looking at here is the server-returned shorthand client id--if those
clash there's a bug somewhere.
Fix the error return, pull the check out into common code, and do the
check unconditionally in all cases.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
New clients are created only by nfsd4_setclientid(), which always gives
any new client a unique clientid. The only exception is in the
"callback update" case, in which case it may create an unconfirmed
client with the same clientid as a confirmed client. In that case it
also checks that the confirmed client has the same credential.
Therefore, it is pointless for setclientid_confirm to check whether a
confirmed and unconfirmed client with the same clientid have matching
credentials--they're guaranteed to.
Instead, it should be checking whether the credential on the
setclientid_confirm matches either of those. Otherwise, it could be
anyone sending the setclientid_confirm. Granted, I can't see why anyone
would, but still it's probalby safer to check.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Move the rq_flavor into struct svc_cred, and use it in setclientid and
exchange_id comparisons as well.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The typical setclientid or exchange_id will probably be performed with a
credential that maps to either root or nobody, so comparing just uid's
is unlikely to be useful. So, use everything else we can get our hands
on.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>