Move most of this into helper functions. Also move the non-CONFIRM case
into caller, providing a helper function for that purpose.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The stateowner has some fields that only make sense for openowners, and
some that only make sense for lockowners, and I find it a lot clearer if
those are separated out.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Move the CLOSE_STATE case into the unique caller that cares about it
rather than putting it in preprocess_seqid_op.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
I don't see the point of having this check in nfs4_preprocess_seqid_op()
when it's only needed by the one caller.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
If open fails with any error other than nfserr_replay_me, then the main
nfsd4_proc_compound() loop continues unconditionally to
nfsd4_encode_operation(), which will always call encode_seqid_op_tail.
Thus the condition we check for here does not occur.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
There are currently a couple races in the seqid replay code: a
retransmission could come while we're still encoding the original reply,
or a new seqid-mutating call could come as we're encoding a replay.
So, extend the state lock over the encoding (both encoding of a replayed
reply and caching of the original encoded reply).
I really hate doing this, and previously added the stateowner
reference-counting code to avoid it (which was insufficient)--but I
don't see a less complicated alternative at the moment.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Now that the replay owner is in the cstate we can remove it from a lot
of other individual operations and further simplify
nfs4_preprocess_seqid_op().
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Set the stateowner associated with a replay in one spot in
nfs4_preprocess_seqid_op() and keep it in cstate. This allows removing
a few lines of boilerplate from all the nfs4_preprocess_seqid_op()
callers.
Also turn ENCODE_SEQID_OP_TAIL into a function while we're here.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Thanks to Casey for reminding me that 5661 gives a special meaning to a
value of 0 in the stateid's seqid field, so all stateid's should start
out with si_generation 1. We were doing that in the open and lock
cases for minorversion 1, but not for the delegation stateid, and not
for openstateid's with v4.0.
It doesn't *really* matter much for v4.0 or for delegation stateid's
(which never get the seqid field incremented), but we may as well do the
same for all of them.
Reported-by: Casey Bodley <cbodley@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Follow the recommendation from rfc3530bis for stateid generation number
wraparound, simplify some code, and fix or remove incorrect comments.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
When called with OPEN_STATE, preprocess_seqid_op only returns an open
stateid, hence only an open owner.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
We've got some lock-specific code here in nfs4_preprocess_seqid_op which
is only used by nfsd4_lock(). Move it to the caller.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Note that the special handling for the lock stateid case is already done
by nfs4_check_openmode() (as of 0292191417
"nfsd4: fix openmode checking on IO using lock stateid") so we no longer
need these two cases in the caller.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Share some common code, stop doing silly things like initializing a list
head immediately before adding it to a list, etc.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
These appear to be generic (for both open and lock owners), but they're
actually just for open owners. This has confused me more than once.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The server is returning nfserr_resource for both permanent errors and
for errors (like allocation failures) that might be resolved by retrying
later. Save nfserr_resource for the former and use delay/jukebox for
the latter.
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The nfsd4 code has a bunch of special exceptions for error returns which
map nfserr_symlink to other errors.
In fact, the spec makes it clear that nfserr_symlink is to be preferred
over less specific errors where possible.
The patch that introduced it back in 2.6.4 is "kNFSd: correct symlink
related error returns.", which claims that these special exceptions are
represent an NFSv4 break from v2/v3 tradition--when in fact the symlink
error was introduced with v4.
I suspect what happened was pynfs tests were written that were overly
faithful to the (known-incomplete) rfc3530 error return lists, and then
code was fixed up mindlessly to make the tests pass.
Delete these unnecessary exceptions.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
CLAIM_DELEGATE_CUR is used in response to a broken lease; allowing it
to break the lease and return EAGAIN leaves the client unable to make
progress in returning the delegation
nfs4_get_vfs_file() now takes struct nfsd4_open for access to the
claim type, and calls nfsd_open() with NFSD_MAY_NOT_BREAK_LEASE when
claim type is CLAIM_DELEGATE_CUR
Signed-off-by: Casey Bodley <cbodley@citi.umich.edu>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Both the filesystem and the lock manager can associate operations with a
lock. Confusingly, one of them (fl_release_private) actually has the
same name in both operation structures.
It would save some confusion to give the lock-manager ops different
names.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Check in SEQUENCE that the request doesn't exceed maxreq_sz for the
given session.
Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
According to RFC5661, 18.36.3,
"if the client selects a value for ca_maxresponsesize such that
a replier on a channel could never send a response,the server
SHOULD return NFS4ERR_TOOSMALL in the CREATE_SESSION reply."
So, error out when the client sets a maxreq_sz less than the minimum
possible SEQUENCE request size, or sets a maxresp_sz less than the
minimum possible SEQUENCE reply size.
Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Stateid's hold a read reference for a read open, a write reference for a
write open, and an additional one of each for each read+write open. The
latter wasn't getting put on a downgrade, so something like:
open RW
open R
downgrade to R
was resulting in a file leak.
Also fix an imbalance in an error path.
Regression from 7d94784293 "nfsd4: fix
downgrade/lock logic".
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Without this, for example,
open read
open read+write
close
will result in a struct file leak.
Regression from 7d94784293 "nfsd4: fix
downgrade/lock logic".
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This operation is used by the client to check the validity of a list of
stateids.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This operation is used by the client to tell the server to free a
stateid.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* 'for-2.6.40' of git://linux-nfs.org/~bfields/linux: (22 commits)
nfsd: make local functions static
NFSD: Remove unused variable from nfsd4_decode_bind_conn_to_session()
NFSD: Check status from nfsd4_map_bcts_dir()
NFSD: Remove setting unused variable in nfsd_vfs_read()
nfsd41: error out on repeated RECLAIM_COMPLETE
nfsd41: compare request's opcnt with session's maxops at nfsd4_sequence
nfsd v4.1 lOCKT clientid field must be ignored
nfsd41: add flag checking for create_session
nfsd41: make sure nfs server process OPEN with EXCLUSIVE4_1 correctly
nfsd4: fix wrongsec handling for PUTFH + op cases
nfsd4: make fh_verify responsibility of nfsd_lookup_dentry caller
nfsd4: introduce OPDESC helper
nfsd4: allow fh_verify caller to skip pseudoflavor checks
nfsd: distinguish functions of NFSD_MAY_* flags
svcrpc: complete svsk processing on cb receive failure
svcrpc: take advantage of tcp autotuning
SUNRPC: Don't wait for full record to receive tcp data
svcrpc: copy cb reply instead of pages
svcrpc: close connection if client sends short packet
svcrpc: note network-order types in svc_process_calldir
...
This also fixes a number of sparse warnings.
Signed-off-by: Daniel Mack <zonque@gmail.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Compiling gave me this warning:
fs/nfsd/nfs4state.c: In function ‘nfsd4_bind_conn_to_session’:
fs/nfsd/nfs4state.c:1623:9: warning: variable ‘status’ set but not used
[-Wunused-but-set-variable]
The local variable "status" was being set by nfsd4_map_bcts_dir() and
then ignored before calling nfsd4_new_conn().
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Servers are supposed to return nfserr_complete_already to clients that
attempt to send multiple RECLAIM_COMPLETEs.
Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Make sure nfs server errors out if request contains more ops
than channel allows.
Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
[bfields@redhat.com: use helper function]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Teach the NFS server to reject invalid create_session flags.
Also do some minor formatting adjustments.
Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
23fcf2ec93 (nfsd4: fix oops on lock failure)
The above patch breaks free path for stp->st_file. If stp was inserted
into sop->so_stateids, we have to free stp->st_file refcount. Because
stp->st_file refcount itself is taken whether or not any refcounts are
taken on the stp->st_file->fi_fds[].
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Introduced by acfdf5c383.
Cc: stable@kernel.org
Reported-by: Gerhard Heift <ml-nfs-linux-20110412-ef47@gheift.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
According to rfc5661,
ca_maxresponsesize_cached:
Like ca_maxresponsesize, but the maximum size of a reply that
will be stored in the reply cache (Section 2.10.6.1). For each
channel, the server MAY decrease this value, but MUST NOT
increase it.
the latest kernel(2.6.38-rc8) may increase the value for ignoring
request's ca_maxresponsesize_cached value. We should not ignore it.
Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Make sure we properly reference count the struct files that a lock
depends on, and release them when the lock stateid is released.
This fixes a major leak of struct files when using locking over nfsv4.
Cc: stable@kernel.org
Reported-by: Rick Koshi <nfs-bug-report@more-right-rudder.com>
Tested-by: Ivo Přikryl <prikryl@eurosat.cz>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Minor cleanup in preparation for a bugfix--moving some code to avoid
forward references, etc. No change in functionality.
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
These macros had never been used for several years.
So, remove them.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
In case of a nonempty list, the return on error here is obviously bogus;
it ends up being a pointer to the list head instead of to any valid
delegation on the list.
In particular, if nfsd4_delegreturn() hits this case, and you're quite unlucky,
then renew_client may oops, and it may take an embarassingly long time to
figure out why. Facepalm.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000090
IP: [<ffffffff81292965>] nfsd4_delegreturn+0x125/0x200
...
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Instead of acquiring one lease each time another client opens a file,
nfsd can acquire just one lease to represent all of them, and reference
count it to determine when to release it.
This fixes a regression introduced by
c45821d263 "locks: eliminate fl_mylease
callback": after that patch, only the struct file * is used to determine
who owns a given lease. But since we recently converted the server to
share a single struct file per open, if we acquire multiple leases on
the same file from nfsd, it then becomes impossible on unlocking a lease
to determine which of those leases (all of whom share the same struct
file *) we meant to remove.
Thanks to Takashi Iwai <tiwai@suse.de> for catching a bug in a previous
version of this patch.
Tested-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Modify fi_delegations only under the recall_lock, allowing us to use
that list on lease breaks.
Also some trivial cleanup to simplify later changes.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* 'for-2.6.38' of git://linux-nfs.org/~bfields/linux: (62 commits)
nfsd4: fix callback restarting
nfsd: break lease on unlink, link, and rename
nfsd4: break lease on nfsd setattr
nfsd: don't support msnfs export option
nfsd4: initialize cb_per_client
nfsd4: allow restarting callbacks
nfsd4: simplify nfsd4_cb_prepare
nfsd4: give out delegations more quickly in 4.1 case
nfsd4: add helper function to run callbacks
nfsd4: make sure sequence flags are set after destroy_session
nfsd4: re-probe callback on connection loss
nfsd4: set sequence flag when backchannel is down
nfsd4: keep finer-grained callback status
rpc: allow xprt_class->setup to return a preexisting xprt
rpc: keep backchannel xprt as long as server connection
rpc: move sk_bc_xprt to svc_xprt
nfsd4: allow backchannel recovery
nfsd4: support BIND_CONN_TO_SESSION
nfsd4: modify session list under cl_lock
Documentation: fl_mylease no longer exists
...
Fix up conflicts in fs/nfsd/vfs.c with the vfs-scale work. The
vfs-scale work touched some msnfs cases, and this merge removes support
for that entirely, so the conflict was trivial to resolve.
If we lose the backchannel and then the client repairs the problem,
resend any callbacks.
We use a new cb_done flag to track whether there is still work to be
done for the callback or whether it can be destroyed with the rpc.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
If this loses any backchannel, make sure we have a chance to notice that
and set the sequence flags.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Distinguish between when the callback channel is known to be down, and
when it is not yet confirmed. This will be useful in the 4.1 case.
Also, we don't seem to be using the fact that this field is atomic.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Now that we have a list of connections to choose from, we can teach the
callback code to just pick a suitable connection and use that, instead
of insisting on forever using the connection that the first
create_session was sent with.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Basic xdr and processing for BIND_CONN_TO_SESSION. This adds a
connection to the list of connections associated with a session.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
when callback is generated in NFSv4 server, it doesn't set the source
address. When an alias IP is utilized on NFSv4 server and suppose the
client is accessing via that alias IP (e.g. eth0:0), the client invokes
the callback to the IP address that is set on the original device (e.g.
eth0). This behavior results in timeout of xprt.
The patch sets the IP address that the client should invoke callback to.
Signed-off-by: Takuma Umeya <tumeya@redhat.com>
[bfields@redhat.com: Simplify gen_callback arguments, use helper function]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The nfs server only supports read delegations for now, so we don't care
how conflicts are determined. All we care is that unlocks are
recognized as matching the leases they are meant to remove. After the
last patch, a comparison of struct files will work for that purpose. So
we no longer need this callback.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
When we converted to sharing struct filess between nfs4 opens I went too
far and also used the same mechanism for delegations. But keeping
a reference to the struct file ensures it will outlast the lease, and
allows us to remove the lease with the same file as we added it.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
nfsd controls the lifetime of the lease, not the lock code, so there's
no need for this callback on lease destruction.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Instead of failing to find client entries which don't match the
minorversion, we should be finding them, then either erroring out or
expiring them as appropriate.
This also fixes a problem which would cause the 4.1 server to fail to
recognize clients after a second reboot.
Reported-by: Casey Bodley <cbodley@citi.umich.edu>
Reviewed-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
cancel_rearming_delayed_work[queue]() has been superceded by
cancel_delayed_work_sync() quite some time ago. Convert all the
in-kernel users. The conversions are completely equivalent and
trivial.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: "David S. Miller" <davem@davemloft.net>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: netdev@vger.kernel.org
Cc: Anton Vorontsov <cbou@mail.ru>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Alex Elder <aelder@sgi.com>
Cc: xfs-masters@oss.sgi.com
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: netfilter-devel@vger.kernel.org
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: linux-nfs@vger.kernel.org
At the latest kernel(2.6.37-rc1), server just initialize the forechannel
at init_forechannel_attrs, but don't reflect it to reply.
After initialize the session success, we should copy the forechannel info
to nfsd4_create_session struct.
Reviewed-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
When server gets drc mem fail, it should reply error to client.
Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The original code would oops if this were called from nfsd4_setattr()
because "filpp" is NULL.
(Note this case is currently impossible, as long as we only give out
read delegations.)
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Lock_kernel is gone from the code, so the comments should be updated,
too. nfsd now uses lock_flocks instead of lock_kernel to protect
against posix file locks.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Cc: linux-nfs@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If a connection is closed just after a sequence or create_session
is sent over it, we could end up trying to register a callback that will
never get called since the xprt is already marked dead.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The caller allocated it, the caller should free it.
The only issue so far is that we could change the flp pointer even on an
error return if the fl_change callback failed. But we can simply move
the flp assignment after the fl_change invocation, as the callers don't
care about the flp return value if the setlease call failed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The NFSv4 server was initializing the dp->dl_flock pointer by the
somewhat ridiculous method of a locks_copy_lock callback.
Now that setlease uses the passed-in lock instead of doing a copy,
dl_flock no longer gets set, resulting in the lock leaking on delegation
release, and later possible hangs (among other problems).
So, initialize dl_flock and get rid of the callback.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As suggested by Christoph Hellwig, this moves allocation
of new file locks out of generic_setlease into the
callers, nfs4_open_delegation and fcntl_setlease in order
to allow GFP_KERNEL allocations when lock_flocks has
become a spinlock.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: J. Bruce Fields <bfields@redhat.com>
* 'for-2.6.37' of git://linux-nfs.org/~bfields/linux: (99 commits)
svcrpc: svc_tcp_sendto XPT_DEAD check is redundant
svcrpc: no need for XPT_DEAD check in svc_xprt_enqueue
svcrpc: assume svc_delete_xprt() called only once
svcrpc: never clear XPT_BUSY on dead xprt
nfsd4: fix connection allocation in sequence()
nfsd4: only require krb5 principal for NFSv4.0 callbacks
nfsd4: move minorversion to client
nfsd4: delay session removal till free_client
nfsd4: separate callback change and callback probe
nfsd4: callback program number is per-session
nfsd4: track backchannel connections
nfsd4: confirm only on succesful create_session
nfsd4: make backchannel sequence number per-session
nfsd4: use client pointer to backchannel session
nfsd4: move callback setup into session init code
nfsd4: don't cache seq_misordered replies
SUNRPC: Properly initialize sock_xprt.srcaddr in all cases
SUNRPC: Use conventional switch statement when reclassifying sockets
sunrpc/xprtrdma: clean up workqueue usage
sunrpc: Turn list_for_each-s into the ..._entry-s
...
Fix up trivial conflicts (two different deprecation notices added in
separate branches) in Documentation/feature-removal-schedule.txt
We're doing an allocation under a spinlock, and ignoring the
possibility of allocation failure.
A better fix wouldn't require an unnecessary allocation in the common
case, but we'll leave that for later.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The minorversion seems more a property of the client than the callback
channel.
Some time we should probably also enforce consistent minorversion usage
from the client; for now, this is just a cosmetic change.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Have unhash_client_locked() remove client and associated sessions from
global hashes, but delay further dismantling till free_client().
(After unhash_client_locked(), the only remaining references outside the
destroying thread are from any connections which have xpt_user callbacks
registered.)
This will simplify locking on session destruction.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Only one of the nfsd4_callback_probe callers actually cares about
changing the callback information.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The callback program is allowed to depend on the session which the
callback is going over.
No change in behavior yet, while we still only do callbacks over a
single session for the lifetime of the client.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
We need to keep track of which connections are available for use with
the backchannel, which for the forechannel, and which for both.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Following rfc 5661, section 18.36.4: "If the session is not successfully
created, then no changes are made to any client records on the server."
We shouldn't be confirming or incrementing the sequence id in this case.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Currently we don't deal well with a client that has multiple sessions
associated with it (even simultaneously, or serially over the lifetime
of the client).
In particular, we don't attempt to keep the backchannel running after
the original session diseappears.
We will fix that soon.
Once we do that, we need the slot sequence number to be per-session;
otherwise, for example, we cannot correctly handle a case like this:
- All session 1 connections are lost.
- The client creates session 2. We use it for the backchannel
(since it's the only working choice).
- The client gives us a new connection to use with session 1.
- The client destroys session 2.
At this point our only choice is to go back to using session 1. When we
do so we must use the sequence number that is next for session 1. We
therefore need to maintain multiple sequence number streams.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Instead of copying the sessionid, use the new cl_cb_session pointer,
which indicates which session we're using for the backchannel.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The backchannel should be associated with a session, it isn't really
global to the client.
We do, however, want a pointer global to the client which tracks which
session we're currently using for client-based callbacks.
This is a first step in that direction; for now, just reshuffling of
code with no significant change in behavior.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This prepares the removal of the big kernel lock from the
file locking code. We still use the BKL as long as fs/lockd
uses it and ceph might sleep, but we can flip the definition
to a private spinlock as soon as that's done.
All users outside of fs/lockd get converted to use
lock_flocks() instead of lock_kernel() where appropriate.
Based on an earlier patch to use a spinlock from Matthew
Wilcox, who has attempted this a few times before, the
earliest patch from over 10 years ago turned it into
a semaphore, which ended up being slower than the BKL
and was subsequently reverted.
Someone should do some serious performance testing when
this becomes a spinlock, since this has caused problems
before. Using a spinlock should be at least as good
as the BKL in theory, but who knows...
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Miklos Szeredi <mszeredi@suse.cz>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Sage Weil <sage@newdream.net>
Cc: linux-kernel@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Commit 78155ed75f "nfsd4: distinguish
expired from stale stateids" attempted to distinguish expired and stale
stateid's using time information that may not have been completely
reliable, so I reverted it.
That was throwing out the baby with the bathwater; we still do want to
return expired, but let's do that using the simpler approach of just
assuming any stateid is expired if it looks like it was given out by the
current server instance, but we can't find it any more.
This may help clients that are recovering from network partitions.
Reported-by: Bian Naimeng <biannm@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
As long as we're not implementing any session security, we should just
automatically add any new connections that come along to the list of
sessions associated with the session.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The spec requires us in various places to keep track of the connections
associated with each session.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Changes:
- make sure session memory reservation is released on failure
path.
- use min_t()/min() for more compact code in several places.
- break alloc_init_session into smaller pieces.
- miscellaneous other cleanup.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Instead of creating the new rpc client from a regular server thread,
set a flag, kick off a null call, and allow the null call to do the work
of setting up the client on the callback workqueue.
Use a spinlock to ensure the callback work gets a consistent view of the
callback parameters.
This allows, for example, changing the callback from contexts where
sleeping is not allowed. I hope it will also keep the locking simple as
we add more session and trunking features, by serializing most of the
callback-specific work.
This also closes a small race where the the new cb_ident could be used
with an old connection (or vice-versa).
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This will eventually allow us, for example, to kick off null callback
from contexts where we can't sleep.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Now that we have both nfsd4_callback and nfsd4_cb_conn structures, I get
confused if variables of both types are always named cb....
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
If we already had a RW open for a file, and get a readonly open, we were
piggybacking on the existing RW open. That's inconsistent with the
downgrade logic which blows away the RW open assuming you'll still have
a readonly open.
Also, make sure there is a readonly or writeonly open available for
locking, again to prevent bad behavior in downgrade cases when any RW
open may be lost.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
It's OK for this function to return without setting filp--we do it in
the special-stateid case.
And there's a legitimate case where we can hit this, since we do permit
reads on write-only stateid's.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* 'for-2.6.36' of git://linux-nfs.org/~bfields/linux: (34 commits)
nfsd4: fix file open accounting for RDWR opens
nfsd: don't allow setting maxblksize after svc created
nfsd: initialize nfsd versions before creating svc
net: sunrpc: removed duplicated #include
nfsd41: Fix a crash when a callback is retried
nfsd: fix startup/shutdown order bug
nfsd: minor nfsd read api cleanup
gcc-4.6: nfsd: fix initialized but not read warnings
nfsd4: share file descriptors between stateid's
nfsd4: fix openmode checking on IO using lock stateid
nfsd4: miscellaneous process_open2 cleanup
nfsd4: don't pretend to support write delegations
nfsd: bypass readahead cache when have struct file
nfsd: minor nfsd_svc() cleanup
nfsd: move more into nfsd_startup()
nfsd: just keep single lockd reference for nfsd
nfsd: clean up nfsd_create_serv error handling
nfsd: fix error handling in __write_ports_addxprt
nfsd: fix error handling when starting nfsd with rpcbind down
nfsd4: fix v4 state shutdown error paths
...
Commit f9d7562fdb "nfsd4: share file
descriptors between stateid's" didn't correctly account for O_RDWR opens.
Symptoms include leaked files, resulting in failures to unmount and/or
warnings about orphaned inodes on reboot.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Fixes at least one real minor bug: the nfs4 recovery dir sysctl
would not return its status properly.
Also I finished Al's 1e41568d73 ("Take ima_path_check() in nfsd
past dentry_open() in nfsd_open()") commit, it moved the IMA
code, but left the old path initializer in there.
The rest is just dead code removed I think, although I was not
fully sure about the "is_borc" stuff. Some more review
would be still good.
Found by gcc 4.6's new warnings.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The vfs doesn't really allow us to "upgrade" a file descriptor from
read-only to read-write, and our attempt to do so in nfs4_upgrade_open
is ugly and incomplete.
Move to a different scheme where we keep multiple opens, shared between
open stateid's, in the nfs4_file struct. Each file will be opened at
most 3 times (for read, write, and read-write), and those opens will be
shared between all clients and openers. On upgrade we will do another
open if necessary instead of attempting to upgrade an existing open.
We keep count of the number of readers and writers so we know when to
close the shared files.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
It is legal to perform a write using the lock stateid that was
originally associated with a read lock, or with a file that was
originally opened for read, but has since been upgraded.
So, when checking the openmode, check the mode associated with the
open stateid from which the lock was derived.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The delegation code mostly pretends to support either read or write
delegations. However, correct support for write delegations would
require, for example, breaking of delegations (and/or implementation of
cb_getattr) on stat. Currently all that stops us from handing out
delegations is a subtle reference-counting issue.
Avoid confusion by adding an earlier check that explicitly refuses write
delegations.
For now, though, I'm not going so far as to rip out existing
half-support for write delegations, in case we get around to using that
soon.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
If someone tries to shut down the laundry_wq while it isn't up it'll
cause an oops.
This can happen because write_ports can create a nfsd_svc before we
really start the nfs server, and we may fail before the server is ever
started.
Also make sure state is shutdown on error paths in nfsd_svc().
Use a common global nfsd_up flag instead of nfs4_init, and create common
helper functions for nfsd start/shutdown, as there will be other work
that we want done only when we the number of nfsd threads transitions
between zero and nonzero.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
If the server is out of memory is better for clients to back off and
retry than to just error out.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This reportedly causes a lockdep warning on nfsd shutdown. That looks
like a false positive to me, but there's no reason why this needs the
state lock anyway.
Reported-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
NFSv4.1 adds additional flags to the share_access argument of the open
call. These flags need to be masked out in some of the existing code,
but current code does that inconsistently.
Tested-by: Michael Groshans <groshans@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This reverts commit 78155ed75f.
We're depending here on the boot time that we use to generate the
stateid being monotonic, but get_seconds() is not necessarily.
We still depend at least on boot_time being different every time, but
that is a safer bet.
We have a few reports of errors that might be explained by this problem,
though we haven't been able to confirm any of them.
But the minor gain of distinguishing expired from stale errors seems not
worth the risk.
Conflicts:
fs/nfsd/nfs4state.c
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The alloc_init_file() first adds a file to the hash and then
initializes its fi_inode, fi_id and fi_had_conflict.
The uninitialized fi_inode could thus be erroneously checked by
the find_file(), so move the hash insertion lower.
The client_mutex should prevent this race in practice; however, we
eventually hope to make less use of the client_mutex, so the ordering
here is an accident waiting to happen.
I didn't find whether the same can be true for two other fields,
but the common sense tells me it's better to initialize an object
before putting it into a global hash table :)
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This is a mandatory operation. Also, here (not in open) is where we
should be committing the reboot recovery information.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
nfsd4_set_callback_client must be called under the state lock to atomically
set or unset the callback client and shutting down the previous one.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Get a refcount on the client on SEQUENCE,
Release the refcount and renew the client when all respective compounds completed.
Do not expire the client by the laundromat while in use.
If the client was expired via another path, free it when the compounds
complete and the refcount reaches 0.
Note that unhash_client_locked must call list_del_init on cl_lru as
it may be called twice for the same client (once from nfs4_laundromat
and then from expire_client)
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Mark the client as expired under the client_lock so it won't be renewed
when an nfsv4.1 session is done, after it was explicitly expired
during processing of the compound.
Do not renew a client mark as expired (in particular, it is not
on the lru list anymore)
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Currently just initialize the cl_refcount to 1
and decrement in expire_client(), conditionally freeing the
client when the refcount reaches 0.
To be used later by nfsv4.1 compounds to keep the client from
timing out while in use.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Separate out unhashing of the client and session.
To be used later by the laundromat.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
To be used later on to hold a reference count on the client while in use by a
nfsv4.1 compound.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
and grab the client lock once for all the client's sessions.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
In preparation to share the lock's scope to both client
and session hash tables.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
It's legal to send a DESTROY_SESSION outside any session (as the only
operation in a compound), in which case cstate->session will be NULL;
check for that case.
While we're at it, move these checks into a separate helper function.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
In the replay case, the
renew_client(session->se_client);
happens after we've droppped the sessionid_lock, and without holding a
reference on the session; so there's nothing preventing the session
being freed before we get here.
Thanks to Benny Halevy for catching a bug in an earlier version of this
patch.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by: Benny Halevy <bhalevy@panasas.com>
Enforce the rules about compound op ordering.
Motivated by implementing RECLAIM_COMPLETE, for which the client is
implicit in the current session, so it is important to ensure a
succesful SEQUENCE proceeds the RECLAIM_COMPLETE.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The rfc allows a client to change the callback parameters, but we didn't
previously implement it.
Teach the callbacks to rerun themselves (by placing themselves on a
workqueue) when they recognize that their rpc task has been killed and
that the callback connection has changed.
Then we can change the callback connection by setting up a new rpc
client, modifying the nfs4 client to point at it, waiting for any work
in progress to complete, and then shutting down the old client.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Now that the shutdown sequence guarantees callbacks are shut down before
the client is destroyed, we no longer have a use for cl_count.
We'll probably reinstate a reference count on the client some day, but
it will be held by users other than callbacks.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The NFSv4 server's fl_break callback can sleep (dropping the BKL), in
order to allocate a new rpc task to send a recall to the client.
As far as I can tell this doesn't cause any races in the current code,
but the analysis is difficult. Also, the sleep here may complicate the
move away from the BKL.
So, just schedule some work to do the job for us instead. The work will
later also prove useful for restarting a call after the callback
information is changed.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Once we've expired the client, there's no further purpose to the
callbacks; go ahead and shut down the callback client rather than
waiting for the last reference to go.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
The original code here assumed we'd allow the user to change the lease
any time, but only allow the change to take effect on restart. Since
then we modified the code to allow setting the lease on when the server
is down. Update the rest of the code to reflect that fact, clarify
variable names, and add document.
Also, the code insisted that the grace period always be the longer of
the old and new lease periods, but that's overly conservative--as long
as it lasts at least the old lease period, old clients should still know
to recover in time.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Instead of accessing the lease time directly, some users call
nfs4_lease_time(), and some a macro, NFSD_LEASE_TIME, defined as
nfs4_lease_time(). Neither layer of indirection serves any purpose.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* 'for-2.6.34' of git://linux-nfs.org/~bfields/linux: (22 commits)
nfsd4: fix minor memory leak
svcrpc: treat uid's as unsigned
nfsd: ensure sockets are closed on error
Revert "sunrpc: move the close processing after do recvfrom method"
Revert "sunrpc: fix peername failed on closed listener"
sunrpc: remove unnecessary svc_xprt_put
NFSD: NFSv4 callback client should use RPC_TASK_SOFTCONN
xfs_export_operations.commit_metadata
commit_metadata export operation replacing nfsd_sync_dir
lockd: don't clear sm_monitored on nsm_reboot_lookup
lockd: release reference to nsm_handle in nlm_host_rebooted
nfsd: Use vfs_fsync_range() in nfsd_commit
NFSD: Create PF_INET6 listener in write_ports
SUNRPC: NFS kernel APIs shouldn't return ENOENT for "transport not found"
SUNRPC: Bury "#ifdef IPV6" in svc_create_xprt()
NFSD: Support AF_INET6 in svc_addsock() function
SUNRPC: Use rpc_pton() in ip_map_parse()
nfsd: 4.1 has an rfc number
nfsd41: Create the recovery entry for the NFSv4.1 client
nfsd: use vfs_fsync for non-directories
...
We'll introduce FMODE_RANDOM which will be runtime modified. So protect
all runtime modification to f_mode with f_lock to avoid races.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: <stable@kernel.org> [2.6.33.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The new .h files have paths at the top that are now out of date. While
we're here, just remove all of those from fs/nfsd; they never served any
purpose.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Lots of include/linux/nfsd/* headers are only used by
nfsd module. Move them to the source directory
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Now that the headers are fixed and carry their own wait, all fs/nfsd/
source files can include a minimal set of headers. and still compile just
fine.
This patch should improve the compilation speed of the nfsd module.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
None of this stuff is used outside nfsd, so move it out of the common
linux include directory.
Actually, probably none of the stuff in include/linux/nfsd/nfsd.h really
belongs there, so later we may remove that file entirely.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
We do the same calculation in a couple places; use a helper function,
and add a little documentation, in the hopes of preventing bugs like
that fixed in the last patch.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Unbalanced calculations on creation and destruction of sessions could
cause our estimate of cache memory used to become negative, sometimes
resulting in spurious SERVERFAULT returns to client CREATE_SESSION
requests.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
ca_maxresponsesize and ca_maxrequest size include the RPC header.
sv_max_mesg is sv_max_payolad plus a page for overhead and is used in
svc_init_buffer to allocate server buffer space for both the request and reply.
Note that this means we can service an RPC compound that requires
ca_maxrequestsize (MAXWRITE) or ca_max_responsesize (MAXREAD) but that we do
not support an RPC compound that requires both ca_maxrequestsize and
ca_maxresponsesize.
Signed-off-by: Andy Adamson <andros@netapp.com>
[bfields@citi.umich.edu: more documentation updates]
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* 'for-2.6.32' of git://linux-nfs.org/~bfields/linux: (68 commits)
nfsd4: nfsv4 clients should cross mountpoints
nfsd: revise 4.1 status documentation
sunrpc/cache: avoid variable over-loading in cache_defer_req
sunrpc/cache: use list_del_init for the list_head entries in cache_deferred_req
nfsd: return success for non-NFS4 nfs4_state_start
nfsd41: Refactor create_client()
nfsd41: modify nfsd4.1 backchannel to use new xprt class
nfsd41: Backchannel: Implement cb_recall over NFSv4.1
nfsd41: Backchannel: cb_sequence callback
nfsd41: Backchannel: Setup sequence information
nfsd41: Backchannel: Server backchannel RPC wait queue
nfsd41: Backchannel: Add sequence arguments to callback RPC arguments
nfsd41: Backchannel: callback infrastructure
nfsd4: use common rpc_cred for all callbacks
nfsd4: allow nfs4 state startup to fail
SUNRPC: Defer the auth_gss upcall when the RPC call is asynchronous
nfsd4: fix null dereference creating nfsv4 callback client
nfsd4: fix whitespace in NFSPROC4_CLNT_CB_NULL definition
nfsd41: sunrpc: add new xprt class for nfsv4.1 backchannel
sunrpc/cache: simplify cache_fresh_locked and cache_fresh_unlocked.
...
Move common initialization of 'struct nfs4_client' inside create_client().
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
[nfsd41: Remember the auth flavor to use for callbacks]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Follows the model used by the NFS client. Setup the RPC prepare and done
function pointers so that we can populate the sequence information if
minorversion == 1. rpc_run_task() is then invoked directly just like
existing NFS client operations do.
nfsd4_cb_prepare() determines if the sequence information needs to be setup.
If the slot is in use, it adds itself to the wait queue.
nfsd4_cb_done() wakes anyone sleeping on the callback channel wait queue
after our RPC reply has been received. It also sets the task message
result pointer to NULL to clearly indicate we're done using it.
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
[define and initialize cl_cb_seq_nr here]
[pulled out unused defintion of nfsd4_cb_done]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
RPC callback requests will wait on this wait queue if the backchannel
is out of slots.
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Keep the xprt used for create_session in cl_cb_xprt.
Mark cl_callback.cb_minorversion = 1 and remember
the client provided cl_callback.cb_prog rpc program number.
Use it to probe the callback path.
Use the client's network address to initialize as the
callback's address as expected by the xprt creation
routines.
Define xdr sizes and code nfs4_cb_compound header to be able
to send a null callback rpc.
Signed-off-by: Andy Adamson<andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
[get callback minorversion from fore channel's]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: change bc_sock to bc_xprt]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[pulled definition for cl_cb_xprt]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: set up backchannel's cb_addr]
[moved rpc_create_args init to "nfsd: modify nfsd4.1 backchannel to use new xprt class"]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Callbacks are always made using the machine's identity, so we can use a
single auth_generic credential shared among callbacks to all clients and
let the rpc code take care of the rest.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Use NFSD_SLOT_CACHE_SIZE size buffers for sessions DRC instead of holding nfsd
pages in cache.
Connectathon testing has shown that 1024 bytes for encoded compound operation
responses past the sequence operation is sufficient, 512 bytes is a little too
small. Set NFSD_SLOT_CACHE_SIZE to 1024.
Allocate memory for the session DRC in the CREATE_SESSION operation
to guarantee that the memory resource is available for caching responses.
Allocate each slot individually in preparation for slot table size negotiation.
Remove struct nfsd4_cache_entry and helper functions for the old page-based
DRC.
The iov_len calculation in nfs4svc_encode_compoundres is now always
correct. Replay is now done in nfsd4_sequence under the state lock, so
the session ref count is only bumped on non-replay. Clean up the
nfs4svc_encode_compoundres session logic.
The nfsd4_compound_state statp pointer is also not used.
Remove nfsd4_set_statp().
Move useful nfsd4_cache_entry fields into nfsd4_slot.
Signed-off-by: Andy Adamson <andros@netapp.com
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
nfserr_resource is not a legal error for NFSv4.1. Replace it with
nfserr_serverfault for EXCHANGE_ID and CREATE_SESSION processing.
We will also need to map nfserr_resource to other errors in routines shared
by NFSv4.0 and NFSv4.1
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This fixes a bug in the sequence operation reply.
The sequence operation returns the highest slotid it will accept in the future
in sr_highest_slotid, and the highest slotid it prefers the client to use.
Since we do not re-negotiate the session slot table yet, these should both
always be set to the session ca_maxrequests.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
By using the requested ca_maxresponsesize_cached * ca_maxresponses to bound
a forechannel drc request size, clients can tailor a session to usage.
For example, an I/O session (READ/WRITE only) can have a much smaller
ca_maxresponsesize_cached (for only WRITE compound responses) and a lot larger
ca_maxresponses to service a large in-flight data window.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Compounds consisting of only a sequence operation don't need any
additional caching beyond the sequence information we store in the slot
entry. Fix nfsd4_is_solo_sequence to identify this case correctly.
The additional check for a failed sequence in nfsd4_store_cache_entry()
is redundant, since the nfsd4_is_solo_sequence call lower down catches
this case.
The final ce_cachethis set in nfsd4_sequence is also redundant.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Until we work out the state locking so we can use a spin lock to protect
the cl_lru, we need to take the state_lock to renew the client.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: Do not renew state on error]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: Simplify exit code]
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
When a SETCLIENTID call comes in, one of the args given is the svc_rqst.
This struct contains an rq_addr field which holds the address that sent
the call. If this is an IPv6 address, then we can use the sin6_scope_id
field in this address to populate the sin6_scope_id field in the
callback address.
AFAICT, the rq_addr.sin6_scope_id is non-zero if and only if the client
mounted the server's link-local address.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The framework to add this is all in place. Now, add the code to allow
support for establishing a callback channel on an IPv6 socket.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
...rather than as a separate address and port fields. This will be
necessary for implementing callbacks over IPv6. Also, convert
gen_callback to use the standard rpcuaddr2sockaddr routine rather than
its own private one.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
It's currently a __be32, which isn't big enough to hold an IPv6 address.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The sequence operation is not cached; always encode the sequence operation on
a replay from the slot table and session values. This simplifies the sessions
replay logic in nfsd4_proc_compound.
If this is a replay of a compound that was specified not to be cached, return
NFS4ERR_RETRY_UNCACHED_REP.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Instead of trying to share the generic 4.1 reply cache code for the
CREATE_SESSION reply cache, it's simpler to handle CREATE_SESSION
separately.
The nfs41 single slot clientid DRC holds the results of create session
processing. CREATE_SESSION can be preceeded by a SEQUENCE operation
(an embedded CREATE_SESSION) and the create session single slot cache must be
maintained. nfsd4_replay_cache_entry() and nfsd4_store_cache_entry() do not
implement the replay of an embedded CREATE_SESSION.
The clientid DRC slot does not need the inuse, cachethis or other fields that
the multiple slot session cache uses. Replace the clientid DRC cache struct
nfs4_slot cache with a new nfsd4_clid_slot cache. Save the xdr struct
nfsd4_create_session into the cache at the end of processing, and on a replay,
replace the struct for the replay request with the cached version all while
under the state lock.
nfsd4_proc_compound will handle both the solo and embedded CREATE_SESSION case
via the normal use of encode_operation.
Errors that do not change the create session cache:
A create session NFS4ERR_STALE_CLIENTID error means that a client record
(and associated create session slot) could not be found and therefore can't
be changed. NFSERR_SEQ_MISORDERED errors do not change the slot cache.
All other errors get cached.
Remove the clientid DRC specific check in nfs4svc_encode_compoundres to
put the session only if cstate.session is set which will now always be true.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
For separation of session slot and clientid slot processing.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
NFSD_SLOT_CACHE_SIZE is the size of all encoded operation responses
(excluding the sequence operation) that we want to cache.
For now, keep NFSD_SLOT_CACHE_SIZE at PAGE_SIZE. It will be reduced
when the DRC is changed from page based to memory based.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This fixes a leak which would eventually lock out new clients.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The version 4.1 DRC memory limit and tracking variables are server wide and
session specific. Replace struct svc_serv fields with globals.
Stop using the svc_serv sv_lock.
Add a spinlock to serialize access to the DRC limit management variables which
change on session creation and deletion (usage counter) or (future)
administrative action to adjust the total DRC memory limit.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Prepare to share backchannel code with NFSv4.1.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
[nfsd41: use nfsd4_cb_sequence for callback minorversion]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Verified that cthon and pynfs exchange id tests pass (except for the
two expected fails: EID8 and EID50)
Signed-off-by: Mike Sager <sager@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Ensure the client requested maximum requests are between 1 and
NFSD_MAX_SLOTS_PER_SESSION
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
the change is valid for both the forechannel and the backchannel (currently dummy)
Signed-off-by: Alexandros Batsakis <Alexandros.Batsakis@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The session and slots are allocated all in one piece.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
There's no point in keeping this field around--it's always zero.
(Background: the protocol allows you to tell the client that the file is
about to be truncated, as an optimization to save the client from
writing back dirty pages that will just be discarded. We don't
implement this hint. If we do some day, adding this field back in will
be the least of the work involved.)
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The nfs4_cb_recall struct is used only in nfs4_delegation, so its
pointer to the containing delegation is unnecessary--we could just use
container_of().
But there's no real reason to have this a separate struct at all--just
move these fields to nfs4_delegation.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
I want to use the name for a struct that actually does represent a
single callback.
(Actually, I've never been sure it helps to a separate struct for the
callback information. Some day maybe those fields could just be dumped
into struct nfs4_client. I don't know.)
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This setclientid_confirm case should allow the client to change
callbacks, but it currently has a dummy implementation that just turns
off callbacks completely. That dummy implementation isn't completely
correct either, though:
- There's no need to remove any client recovery directory in
this case.
- New clientid confirm verifiers should be generated (and
returned) in setclientid; there's no need to generate a new
one here.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Stephen Rothwell said:
"Today's linux-next build (powerpc ppc64_defconfig) produced this new
warning:
fs/nfsd/nfs4state.c: In function 'EXPIRED_STATEID':
fs/nfsd/nfs4state.c:2757: warning: comparison of distinct pointer types lacks a cast
Caused by commit 78155ed75f ("nfsd4:
distinguish expired from stale stateids")."
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
If we encode the time of client creation into the stateid instead of the
time of server boot, then we can determine whether that stateid is from
a previous instance of the a server, or from a client that has expired,
and return an appropriate error to the client.
Signed-off-by: Bian Naimeng <biannm@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Separate the access bits from the want bits and enable __set_bit to
work correctly with st_access_bmap.
Signed-off-by: Andy Adamson<andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
For nfs41, the open share flags are used also for
delegation "wants" and "signals". Check that they are valid.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Extract the clientid from sessionid to set the op_clientid on open.
Verify that the clid for other stateful ops is zero for minorversion != 0
Do all other checks for stateful ops without sessions.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
[fixed whitespace indent]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41 remove sl_session from nfsd4_open]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
When sessions are used, stateful operation sequenceid and stateid handling
are not used. When sessions are used, on the first open set the seqid to 1,
mark state confirmed and skip seqid processing.
When sessionas are used the stateid generation number is ignored when it is zero
whereas without sessions bad_stateid or stale stateid is returned.
Add flags to propagate session use to all stateful ops and down to
check_stateid_generation.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
[nfsd4_has_session should return a boolean, not u32]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: pass nfsd4_compoundres * to nfsd4_process_open1]
[nfsd41: calculate HAS_SESSION in nfs4_preprocess_stateid_op]
[nfsd41: calculate HAS_SESSION in nfs4_preprocess_seqid_op]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Currently we only use cstate->current_fh,
will also be used by nfsd41 code.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Implement the destory_session operation confoming to
http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-26
[use sessionid_lock spin lock]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
A session inactivity time compound (lease renewal) or a compound where the
sequence operation has sa_cachethis set to FALSE do not require any pages
to be held in the v4.1 DRC. This is because struct nfsd4_slot is already
caching the session information.
Add logic to the nfs41 server to not cache response pages for solo sequence
responses.
Return nfserr_replay_uncached_rep on the operation following the sequence
operation when sa_cachethis is FALSE.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: use cstate session in nfsd4_replay_cache_entry]
[nfsd41: rename nfsd4_no_page_in_cache]
[nfsd41 rename nfsd4_enc_no_page_replay]
[nfsd41 nfsd4_is_solo_sequence]
[nfsd41 change nfsd4_not_cached return]
Signed-off-by: Andy Adamson <andros@netapp.com>
[changed return type to bool]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41 drop parens in nfsd4_is_solo_sequence call]
Signed-off-by: Andy Adamson <andros@netapp.com>
[changed "== 0" to "!"]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Replace the nfs4_client cl_seqid field with a single struct nfs41_slot used
for the create session replay cache.
The CREATE_SESSION slot sets the sl_session pointer to NULL. Otherwise, the
slot and it's replay cache are used just like the session slots.
Fix unconfirmed create_session replay response by initializing the
create_session slot sequence id to 0.
A future patch will set the CREATE_SESSION cache when a SEQUENCE operation
preceeds the CREATE_SESSION operation. This compound is currently only cached
in the session slot table.
Signed-off-by: Andy Adamson<andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: use bool inuse for slot state]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: revert portion of nfsd4_set_cache_entry]
Signed-off-by: Andy Adamson <andros@netpp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Implement the create_session operation confoming to
http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-26
Look up the client id (generated by the server on exchange_id,
given by the client on create_session).
If neither a confirmed or unconfirmed client is found
then the client id is stale
If a confirmed cilent is found (i.e. we already received
create_session for it) then compare the sequence id
to determine if it's a replay or possibly a mis-ordered rpc.
If the seqid is in order, update the confirmed client seqid
and procedd with updating the session parameters.
If an unconfirmed client_id is found then verify the creds
and seqid. If both match move the client id to confirmed state
and proceed with processing the create_session.
Currently, we do not support persistent sessions, and RDMA.
alloc_init_session generates a new sessionid and creates
a session structure.
NFSD_PAGES_PER_SLOT is used for the max response cached calculation, and for
the counting of DRC pages using the hard limits set in struct srv_serv.
A note on NFSD_PAGES_PER_SLOT:
Other patches in this series allow for NFSD_PAGES_PER_SLOT + 1 pages to be
cached in a DRC slot when the response size is less than NFSD_PAGES_PER_SLOT *
PAGE_SIZE but xdr_buf pages are used. e.g. a READDIR operation will encode a
small amount of data in the xdr_buf head, and then the READDIR in the xdr_buf
pages. So, the hard limit calculation use of pages by a session is
underestimated by the number of cached operations using the xdr_buf pages.
Yet another patch caches no pages for the solo sequence operation, or any
compound where cache_this is False. So the hard limit calculation use of
pages by a session is overestimated by the number of these operations in the
cache.
TODO: improve resource pre-allocation and negotiate session
parameters accordingly. Respect and possibly adjust
backchannel attributes.
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com>
[nfsd41: remove headerpadsz from channel attributes]
Our client and server only support a headerpadsz of 0.
[nfsd41: use DRC limits in fore channel init]
[nfsd41: do not change CREATE_SESSION back channel attrs]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[use sessionid_lock spin lock]
[nfsd41: use bool inuse for slot state]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41 remove sl_session from alloc_init_session]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[simplify nfsd4_encode_create_session error handling]
[nfsd41: fix comment style in init_forechannel_attrs]
[nfsd41: allocate struct nfsd4_session and slot table in one piece]
[nfsd41: no need to INIT_LIST_HEAD in alloc_init_session just prior to list_add]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Andy Adamson<andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Replay a request in nfsd4_sequence.
Add a minorversion to struct nfsd4_compound_state.
Pass the current slot to nfs4svc_encode_compound res via struct
nfsd4_compoundres to set an NFSv4.1 DRC entry.
Signed-off-by: Andy Adamson<andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: use bool inuse for slot state]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: use cstate session in nfs4svc_encode_compoundres]
[nfsd41 replace nfsd4_set_cache_entry]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cache all the result pages, including the rpc header in rq_respages[0],
for a request in the slot table cache entry.
Cache the statp pointer from nfsd_dispatch which points into rq_respages[0]
just past the rpc header. When setting a cache entry, calculate and save the
length of the nfs data minus the rpc header for rq_respages[0].
When replaying a cache entry, replace the cached rpc header with the
replayed request rpc result header, unless there is not enough room in the
cached results first page. In that case, use the cached rpc header.
The sessions fore channel maxresponse size cached is set to NFSD_PAGES_PER_SLOT
* PAGE_SIZE. For compounds we are cacheing with operations such as READDIR
that use the xdr_buf->pages to hold data, we choose to cache the extra page of
data rather than copying data from xdr_buf->pages into the xdr_buf->head page.
[nfsd41: limit cache to maxresponsesize_cached]
[nfsd41: mv nfsd4_set_statp under CONFIG_NFSD_V4_1]
[nfsd41: rename nfsd4_move_pages]
[nfsd41: rename page_no variable]
[nfsd41: rename nfsd4_set_cache_entry]
[nfsd41: fix nfsd41_copy_replay_data comment]
[nfsd41: add to nfsd4_set_cache_entry]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Andy Adamson<andros@netapp.com>
[nfsd41: do not verify nfserr_sequence_pos for minorversion 0]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Implement the sequence operation conforming to
http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-26
Check for stale clientid (as derived from the sessionid).
Enforce slotid range and exactly-once semantics using
the slotid and seqid.
If everything went well renew the client lease and
mark the slot INPROGRESS.
Add a struct nfsd4_slot pointer to struct nfsd4_compound_state.
To be used for sessions DRC replay.
[nfsd41: rename sequence catchthis to cachethis]
Signed-off-by: Andy Adamson<andros@netapp.com>
[pulled some code to set cstate->slot from "nfsd DRC logic"]
[use sessionid_lock spin lock]
[nfsd41: use bool inuse for slot state]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd: add a struct nfsd4_slot pointer to struct nfsd4_compound_state]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: add nfsd4_session pointer to nfsd4_compound_state]
[nfsd41: set cstate session]
[nfsd41: use cstate session in nfsd4_sequence]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[simplify nfsd4_encode_sequence error handling]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
We need to distinguish between client names provided by NFSv4.0 clients
SETCLIENTID and those provided by NFSv4.1 via EXCHANGE_ID when looking
up the clientid by string.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
[nfsd41: use boolean values for use_exchange_id argument]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: simplify match_clientid_establishment logic]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Implement the exchange_id operation confoming to
http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-28
Based on the client provided name, hash a client id.
If a confirmed one is found, compare the op's creds and
verifier. If the creds match and the verifier is different
then expire the old client (client re-incarnated), otherwise,
if both match, assume it's a replay and ignore it.
If an unconfirmed client is found, then copy the new creds
and verifer if need update, otherwise assume replay.
The client is moved to a confirmed state on create_session.
In the nfs41 branch set the exchange_id flags to
EXCHGID4_FLAG_USE_NON_PNFS | EXCHGID4_FLAG_SUPP_MOVED_REFER
(pNFS is not supported, Referrals are supported,
Migration is not.).
Address various scenarios from section 18.35 of the spec:
1. Check for EXCHGID4_FLAG_UPD_CONFIRMED_REC_A and set
EXCHGID4_FLAG_CONFIRMED_R as appropriate.
2. Return error codes per 18.35.4 scenarios.
3. Update client records or generate new client ids depending on
scenario.
Note: 18.35.4 case 3 probably still needs revisiting. The handling
seems not quite right.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Andy Adamosn <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: use utsname for major_id (and copy to server_scope)]
[nfsd41: fix handling of various exchange id scenarios]
Signed-off-by: Mike Sager <sager@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: reverse use of EXCHGID4_INVAL_FLAG_MASK_A]
[simplify nfsd4_encode_exchange_id error handling]
[nfsd41: embed an xdr_netobj in nfsd4_exchange_id]
[nfsd41: return nfserr_serverfault for spa_how == SP4_MACH_CRED]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Simple sessionid hashing using its monotonically increasing sequence number.
Locking considerations:
sessionid_hashtbl access is controlled by the sessionid_lock spin lock.
It must be taken for insert, delete, and lookup.
nfsd4_sequence looks up the session id and if the session is found,
it calls nfsd4_get_session (still under the sessionid_lock).
nfsd4_destroy_session calls nfsd4_put_session after unhashing
it, so when the session's kref reaches zero it's going to get freed.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[we don't use a prime for sessionid hash table size]
[use sessionid_lock spin lock]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This patch provides basic data structures representing the nfs41
sessions and slots, plus helpers for keeping a reference count
on the session and freeing it.
Note that our server only support a headerpadsz of 0 and
it ignores backchannel attributes at the moment.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: remove headerpadsz from channel attributes]
[nfsd41: embed nfsd4_channel in nfsd4_session]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: use bool inuse for slot state]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41 remove sl_session from nfsd4_slot]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The spec allows clients to change ip address, so we shouldn't be
requiring that setclientid always come from the same address. For
example, a client could reboot and get a new dhcpd address, but still
present the same clientid to the server. In that case the server should
revoke the client's previous state and allow it to continue, instead of
(as it currently does) returning a CLID_INUSE error.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
As part of reducing the scope of the client_mutex, and in order to
remove the need for mutexes from the callback code (so that callbacks
can be done as asynchronous rpc calls), move manipulations of the
file_hashtable under the recall_lock.
Update the relevant comments while we're here.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Alexandros Batsakis <batsakis@netapp.com>
Reviewed-by: Benny Halevy <bhalevy@panasas.com>
Previous cleanup reveals an obvious (though harmless) bug: when
delegreturn gets a stateid that isn't for a delegation, it should return
an error rather than doing nothing.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Delegreturn is enough a special case for preprocess_stateid_op to
warrant just open-coding it in delegreturn.
There should be no change in behavior here; we're just reshuffling code.
Thanks to Yang Hongyang for catching a critical typo.
Reviewed-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
I can't recall ever seeing these printk's used to debug a problem. I'll
happily put them back if we see a case where they'd be useful. (Though
if we do that the find_XXX() errors would probably be better
reported in find_XXX() functions themselves.)
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Note that we exit this first big "if" with stp == NULL if and only if we
took the first branch; therefore, the second "if" is redundant, and we
can just combine the two, simplifying the logic.
Reviewed-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The caller always knows specifically whether it's releasing a lockowner
or an openowner, and the code is simpler if we use separate functions
(and the apparent recursion is gone).
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The flags here attempt to make the code more general, but I find it
actually just adds confusion.
I think it's clearer to separate the logic for the open and lock cases
entirely. And eventually we may want to separate the stateowner and
stateid types as well, as many of the fields aren't shared between the
lock and open cases.
Also move to eliminate forward references.
Start with the stateid's.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Reviewed-by: Benny Halevy <bhalevy@panasas.com>
nfsd4_lockt does a search for a lockstateowner when building the lock
struct to test. If one is found, it'll set fl_owner to it. Regardless of
whether that happens, it'll also set fl_lmops. Given that this lock is
basically a "lightweight" lock that's just used for checking conflicts,
setting fl_lmops is probably not appropriate for it.
This behavior exposed a bug in DLM's GETLK implementation where it
wasn't clearing out the fields in the file_lock before filling in
conflicting lock info. While we were able to fix this in DLM, it
still seems pointless and dangerous to set the fl_lmops this way
when we may have a NULL lockstateowner.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@pig.fieldses.org>
refactor the nfs4 server lock code to use last_byte_offset
to compute the last byte covered by the lock. Check for overflow
so that the last byte is set to NFS4_MAX_UINT64 if offset + len
wraps around.
Also, use NFS4_MAX_UINT64 for ~(u64)0 where appropriate.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Since nfsv4 allows LOCKT without an open, but the ->lock() method is a
file method, we fake up a struct file in the nfsv4 code with just the
fields we need initialized. But we forgot to initialize the file
operations, with the result that LOCKT never results in a call to the
filesystem's ->lock() method (if it exists).
We could just add that one more initialization. But this hack of faking
up a struct file with only some fields initialized seems the kind of
thing that might cause more problems in the future. We should either do
an open and get a real struct file, or make lock-testing an inode (not a
file) method.
This patch does the former.
Reported-by: Marc Eshel <eshel@almaden.ibm.com>
Tested-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Minor cleanup/rewrite of find_stateid. Compile tested.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
This patch adds server-side support for callbacks other than AUTH_SYS.
Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Two principals are involved in krb5 authentication: the target, who we
authenticate *to* (normally the name of the server, like
nfs/server.citi.umich.edu@CITI.UMICH.EDU), and the source, we we
authenticate *as* (normally a user, like bfields@UMICH.EDU)
In the case of NFSv4 callbacks, the target of the callback should be the
source of the client's setclientid call, and the source should be the
nfs server's own principal.
Therefore we allow svcgssd to pass down the name of the principal that
just authenticated, so that on setclientid we can store that principal
name with the new client, to be used later on callbacks.
Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If nfsd was shut down before the grace period ended, we could end up
with a freed object still on grace_list. Thanks to Jeff Moyer for
reporting the resulting list corruption warnings.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Tested-by: Jeff Moyer <jmoyer@redhat.com>
Using NIPQUAD() with NIPQUAD_FMT, %d.%d.%d.%d or %u.%u.%u.%u
can be replaced with %pI4
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rewrite grace period code to unify management of grace period across
lockd and nfsd. The current code has lockd and nfsd cooperate to
compute a grace period which is satisfactory to them both, and then
individually enforce it. This creates a slight race condition, since
the enforcement is not coordinated. It's also more complicated than
necessary.
Here instead we have lockd and nfsd each inform common code when they
enter the grace period, and when they're ready to leave the grace
period, and allow normal locking only after both of them are ready to
leave.
We also expect the locks_start_grace()/locks_end_grace() interface here
to be simpler to build on for future cluster/high-availability work,
which may require (for example) putting individual filesystems into
grace, or enforcing grace periods across multiple cluster nodes.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
It's not immediately obvious from the code why we're doing this.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Benny Halevy <bhalevy@panasas.com>
Rename nfsd_permission() specific MAY_* flags to NFSD_MAY_* to make it
clear, that these are not used outside nfsd, and to avoid name and
number space conflicts with the VFS.
[comment from hch: rename MAY_READ, MAY_WRITE and MAY_EXEC as well]
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Several of the nfsd filesystem interfaces allow changes to parameters
that don't have any effect on a running nfsd service. They are only ever
checked when nfsd is started. This patch fixes it so that changes to
those procfiles return -EBUSY if nfsd is already running to make it
clear that changes on the fly don't work.
The patch should also close some relatively harmless races between
changing the info in those interfaces and starting nfsd, since these
variables are being moved under the protection of the nfsd_mutex.
Finally, the nfsv4recoverydir file always returns -EINVAL if read. This
patch fixes it to return the recoverydir path as expected.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
These bit operations don't need to be atomic. They're all done under a
single big mutex anyway.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
The file_lock structure is used both as a heavy-weight representation of
an active lock, with pointers to reference-counted structures, etc., and
as a simple container for parameters that describe a file lock.
The conflicting lock returned from __posix_lock_file is an example of
the latter; so don't call the filesystem or lock manager callbacks when
copying to it. This also saves the need for an unnecessary
locks_init_lock in the nfsv4 server.
Thanks to Trond for pointing out the error.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
While lease is correctly checked by supplying the type argument to
vfs_setlease(), it's stored with fl_type uninitialized. This breaks the
logic when checking the type of the lease. The fix is to initialize
fl_type.
The old code still happened to function correctly since F_RDLCK is zero,
and we only implement read delegations currently (nor write
delegations). But that's no excuse for not fixing this.
Signed-off-by: Felix Blyakher <felixb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Add extern to nfsd/nfsd.h
fs/nfsd/nfssvc.c:146:5: warning: symbol 'nfsd_nrthreads' was not declared. Should it be static?
fs/nfsd/nfssvc.c:261:5: warning: symbol 'nfsd_nrpools' was not declared. Should it be static?
fs/nfsd/nfssvc.c:269:5: warning: symbol 'nfsd_get_nrthreads' was not declared. Should it be static?
fs/nfsd/nfssvc.c:281:5: warning: symbol 'nfsd_set_nrthreads' was not declared. Should it be static?
fs/nfsd/export.c:1534:23: warning: symbol 'nfs_exports_op' was not declared. Should it be static?
Add include of auth.h
fs/nfsd/auth.c:27:5: warning: symbol 'nfsd_setuser' was not declared. Should it be static?
Make static, move forward declaration closer to where it's needed.
fs/nfsd/nfs4state.c:1877:1: warning: symbol 'laundromat_main' was not declared. Should it be static?
Make static, forward declaration was already marked static.
fs/nfsd/nfs4idmap.c:206:1: warning: symbol 'idtoname_parse' was not declared. Should it be static?
fs/nfsd/vfs.c:1156:1: warning: symbol 'nfsd_create_setattr' was not declared. Should it be static?
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
If someone decides to demote a file from r/w to just
r/o, they can use this same code as __fput().
NFS does just that, and will use this in the next
patch.
AV: drop write access in __fput() only after we evict from file list.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Cc: Erez Zadok <ezk@cs.sunysb.edu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J Bruce Fields" <bfields@fieldses.org>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Add path_put() functions for releasing a reference to the dentry and
vfsmount of a struct path in the right order
* Switch from path_release(nd) to path_put(&nd->path)
* Rename dput_path() to path_put_conditional()
[akpm@linux-foundation.org: fix cifs]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: <linux-fsdevel@vger.kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the central patch of a cleanup series. In most cases there is no good
reason why someone would want to use a dentry for itself. This series reflects
that fact and embeds a struct path into nameidata.
Together with the other patches of this series
- it enforced the correct order of getting/releasing the reference count on
<dentry,vfsmount> pairs
- it prepares the VFS for stacking support since it is essential to have a
struct path in every place where the stack can be traversed
- it reduces the overall code size:
without patch series:
text data bss dec hex filename
5321639 858418 715768 6895825 6938d1 vmlinux
with patch series:
text data bss dec hex filename
5320026 858418 715768 6894212 693284 vmlinux
This patch:
Switch from nd->{dentry,mnt} to nd->path.{dentry,mnt} everywhere.
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix cifs]
[akpm@linux-foundation.org: fix smack]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Document these checks a little better and inline, as suggested by Neil
Brown (note both functions have two callers). Remove an obviously bogus
check while we're there (checking whether unsigned value is negative).
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Neil Brown <neilb@suse.de>
The failure to return a stateowner from nfs4_preprocess_seqid_op() means
in the case where a lock request is of a type incompatible with an open
(due to, e.g., an application attempting a write lock on a file open for
read), means that fs/nfsd/nfs4xdr.c:ENCODE_SEQID_OP_TAIL() never bumps
the seqid as it should. The client, attempting to close the file
afterwards, then gets an (incorrect) bad sequence id error. Worse, this
prevents the open file from ever being closed, so we leak state.
Thanks to Benny Halevy and Trond Myklebust for analysis, and to Steven
Wilton for the report and extensive data-gathering.
Cc: Benny Halevy <bhalevy@panasas.com>
Cc: Steven Wilton <steven.wilton@team.eftel.com.au>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
When the callback channel fails, we inform the client of that by
returning a cb_path_down error the next time it tries to renew its
lease.
If we wait most of a lease period before deciding that a callback has
failed and that the callback channel is down, then we decrease the
chances that the client will find out in time to do anything about it.
So, mark the channel down as soon as we recognize that an rpc has
failed. However, continue trying to recall delegations anyway, in hopes
it will come back up. This will prevent more delegations from being
given out, and ensure cb_path_down is returned to renew calls earlier,
while still making the best effort to deliver recalls of existing
delegations.
Also fix a couple comments and remove a dprink that doesn't seem likely
to be useful.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Declare this variable in the one function where it's used, and clean up
some minor style problems.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
We generate a unique cl_confirm for every new client; so if we've
already checked that this cl_confirm agrees with the cl_confirm of
unconf, then we already know that it does not agree with the cl_confirm
of conf.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Again, the only way conf and unconf can have the same clientid is if
they were created in the "probable callback update" case of setclientid,
in which case we already know that the cl_verifier fields must agree.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
If conf and unconf are both found in the lookup by cl_clientid, then
they share the same cl_clientid. We always create a unique new
cl_clientid field when creating a new client--the only exception is the
"probable callback update" case in setclientid, where we copy the old
cl_clientid from another clientid with the same name.
Therefore two clients with the same cl_client field also always share
the same cl_name field, and a couple of the checks here are redundant.
Thanks to Simon Holm Thøgersen for a compile fix.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Simon Holm Thøgersen <odie@cs.aau.dk>
Using a counter instead of the nanoseconds value seems more likely to
produce a unique cl_confirm.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
We're supposed to generate a different cl_confirm verifier for each new
client, so these to cl_confirm values should never be the same.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Most of these comments just summarize the code.
The matching of code to the cases described in the RFC may still be
useful, though; add specific section references to make that easier to
follow. Also update references to the outdated RFC 3010.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Our callback code doesn't actually handle concurrent attempts to probe
the callback channel. Some rethinking of the locking may be required.
However, we can also just move the callback probing to this case. Since
this is the only time a client is "confirmed" (and since that can only
happen once in the lifetime of a client), this ensures we only probe
once.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* 'locks' of git://linux-nfs.org/~bfields/linux:
nfsd: remove IS_ISMNDLCK macro
Rework /proc/locks via seq_files and seq_list helpers
fs/locks.c: use list_for_each_entry() instead of list_for_each()
NFS: clean up explicit check for mandatory locks
AFS: clean up explicit check for mandatory locks
9PFS: clean up explicit check for mandatory locks
GFS2: clean up explicit check for mandatory locks
Cleanup macros for distinguishing mandatory locks
Documentation: move locks.txt in filesystems/
locks: add warning about mandatory locking races
Documentation: move mandatory locking documentation to filesystems/
locks: Fix potential OOPS in generic_setlease()
Use list_first_entry in locks_wake_up_blocks
locks: fix flock_lock_file() comment
Memory shortage can result in inconsistent flocks state
locks: kill redundant local variable
locks: reverse order of posix_locks_conflict() arguments
The combination of S_ISGID bit set and S_IXGRP bit unset is used to mark the
inode as "mandatory lockable" and there's a macro for this check called
MANDATORY_LOCK(inode). However, fs/locks.c and some filesystems still perform
the explicit i_mode checking. Besides, Andrew pointed out, that this macro is
buggy itself, as it dereferences the inode arg twice.
Convert this macro into static inline function and switch its users to it,
making the code shorter and more readable.
The __mandatory_lock() helper is to be used in places where the IS_MANDLOCK()
for superblock is already known to be true.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
It's not enough to take a reference on the delegation object itself; we
need to ensure that the rpc_client won't go away just as we're about to
make an rpc call.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
If a callback still holds a reference on the client, then it may be
about to perform an rpc call, so it isn't safe to call rpc_shutdown().
(Though rpc_shutdown() does wait for any outstanding rpc's, it can't
know if a new rpc is about to be issued with that client.)
So, wait to shutdown the rpc_client until the reference count on the
client has gone to zero.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Currently there's a race that can cause an oops in generic_setlease.
(In detail: nfsd, when it removes a lease, does so by calling
vfs_setlease() with F_UNLCK and a pointer to the fl_flock field, which
in turn points to nfsd's existing lease; but the first thing the
setlease code does is call time_out_leases(). If the lease happens to
already be beyond the lease break time, that will free the lease and (in
nfsd's release_private callback) set fl_flock to NULL, leading to a NULL
deference soon after in vfs_setlease().)
There are probably other things to fix here too, but it seems inherently
racy to allow either locks.c or nfsd to time out this lease. Instead
just set the fl_break_time to 0 (preventing locks.c from ever timing out
this lock) and leave it up to nfsd's laundromat thread to deal with it.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Each branch of this if-then-else has a bunch of duplicated code that we
could just put at the end.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>
We have some slabs that the nfs4 server uses to store state objects.
We're currently creating and destroying those slabs whenever the server
is brought up or down. That seems excessive; may as well just do that
in module initialization and exit.
Also add some minor header cleanup. (Thanks to Andrew Morton for that
and a compile fix.)
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>
To quote a recent mail from Andrew Morton:
Look: if there's a way in which an unprivileged user can trigger
a printk we fix it, end of story.
OK. I assume that goes double for printk()s that might be triggered by
random hosts on the internet. So, disable some printk()s that look like
they could be triggered by malfunctioning or malicious clients. For
now, just downgrade them to dprintk()s.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>
Benny Halevy suggested renaming cmp_* to same_* to make the meaning of
the return value clearer.
Fix some nearby style deviations while we're at it, including a small
swath of creative indentation in nfs4_preprocess_seqid_op().
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>
Slab destructors were no longer supported after Christoph's
c59def9f22 change. They've been
BUGs for both slab and slub, and slob never supported them
either.
This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
We've been using the convention that vfs_foo is the function that calls
a filesystem-specific foo method if it exists, or falls back on a
generic method if it doesn't; thus vfs_foo is what is called when some
other part of the kernel (normally lockd or nfsd) wants to get a lock,
whereas foo is what filesystems call to use the underlying local
functionality as part of their lock implementation.
So rename setlease to vfs_setlease (which will call a
filesystem-specific setlease after a later patch) and __setlease to
setlease.
Also, vfs_setlease need only be GPL-exported as long as it's only needed
by lockd and nfsd.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
One more incremental delegation policy improvement: don't give out a
delegation on a file if conflicting access has previously required that a
delegation be revoked on that file. (In practice we'll forget about the
conflict when the struct nfs4_file is removed on close, so this is of limited
use for now, though it should at least solve a temporary problem with
self-conflicts on write opens from the same client.)
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Our original NFSv4 delegation policy was to give out a read delegation on any
open when it was possible to.
Since the lifetime of a delegation isn't limited to that of an open, a client
may quite reasonably hang on to a delegation as long as it has the inode
cached. This becomes an obvious problem the first time a client's inode cache
approaches the size of the server's total memory.
Our first quick solution was to add a hard-coded limit. This patch makes a
mild incremental improvement by varying that limit according to the server's
total memory size, allowing at most 4 delegations per megabyte of RAM.
My quick back-of-the-envelope calculation finds that in the worst case (where
every delegation is for a different inode), a delegation could take about
1.5K, which would make the worst case usage about 6% of memory. The new limit
works out to be about the same as the old on a 1-gig server.
[akpm@linux-foundation.org: Don't needlessly bloat vmlinux]
[akpm@linux-foundation.org: Make it right for highmem machines]
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>