Commit Graph

4609 Commits

Author SHA1 Message Date
Trond Myklebust
b72888cb0b NFSv4: Fix up nfs4_proc_lookup_mountpoint
Currently, we do not check the return value of client = rpc_clone_client(),
nor do we shut down the resulting cloned rpc_clnt in the case where a
NFS4ERR_WRONGSEC has caused nfs4_proc_lookup_common() to replace the
original value of 'client' (causing a memory leak).

Fix both issues and simplify the code by moving the call to
rpc_clone_client() until after nfs4_proc_lookup_common() has
done its business.

Reported-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-08-07 20:47:26 -04:00
Trond Myklebust
eddffa4084 NFS: Remove unnecessary call to nfs_setsecurity in nfs_fhget()
We only need to call it on the creation of the inode.

Reported-by: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Steve Dickson <SteveD@redhat.com>
Cc: Dave Quigley <dpquigl@davequigley.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-08-07 17:07:41 -04:00
Scott Mayhew
e890db0104 NFSv4: Fix the sync mount option for nfs4 mounts
The sync mount option stopped working for NFSv4 mounts after commit
c02d7adf8c (NFSv4: Replace nfs4_path_walk() with
FS path lookup in a private namespace).  If MS_SYNCHRONOUS is set in the
super_block that we're cloning from, then it should be set in the new
super_block as well.

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-08-07 17:07:41 -04:00
Trond Myklebust
f8806c843f NFS: Fix writeback performance issue on cache invalidation
If a cache invalidation is triggered, and we happen to have a lot of
writebacks cached at the time, then the call to invalidate_inode_pages2()
will end up calling ->launder_page() on each and every dirty page in order
to sync its contents to disk, thus defeating write coalescing.
The following patch ensures that we try to sync the inode to disk before
calling invalidate_inode_pages2() so that we do the writeback as efficiently
as possible.

Reported-by: William Dauchy <william@gandi.net>
Reported-by: Pascal Bouchareine <pascal@gandi.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: William Dauchy <william@gandi.net>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
2013-08-07 17:07:40 -04:00
Andy Adamson
bc4b2a86a5 NFSv4.1 Increase NFS4_DEF_SLOT_TABLE_SIZE
Increase NFS4_DEF_SLOT_TABLE_SIZE which is used as the client ca_maxreequests
value in CREATE_SESSION.  Current non-dynamic session slot server
implementations use the client ca_maxrequests as a maximum slot number: 64
session slots can handle most workloads.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-08-07 13:10:40 -04:00
Andy Adamson
f8407299f6 NFS Remove unused authflavour parameter from init_client
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-08-07 13:09:30 -04:00
Chuck Lever
73d8bde5e4 NFS: Never use user credentials for lease renewal
Never try to use a non-UID 0 user credential for lease management,
as that credential can change out from under us.  The server will
block NFSv4 lease recovery with NFS4ERR_CLID_INUSE.

Since the mechanism to acquire a credential for lease management
is now the same for all minor versions, replace the minor version-
specific callout with a single function.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-08-07 13:06:08 -04:00
Chuck Lever
d688f7b8f6 NFS: Use root's credential for lease management when keytab is missing
Commit 05f4c350 "NFS: Discover NFSv4 server trunking when mounting"
Fri Sep 14 17:24:32 2012 introduced Uniform Client String support,
which forces our NFS client to establish a client ID immediately
during a mount operation rather than waiting until a user wants to
open a file.

Normally machine credentials (eg. from a keytab) are used to perform
a mount operation that is protected by Kerberos.  Before 05fc350,
SETCLIENTID used a machine credential, or fell back to a regular
user's credential if no keytab is available.

On clients that don't have a keytab, performing SETCLIENTID early
means there's no user credential to fall back on, since no regular
user has kinit'd yet.  05f4c350 seems to have broken the ability
to mount with sec=krb5 on clients that don't have a keytab in
kernels 3.7 - 3.10.

To address this regression, commit 4edaa308 (NFS: Use "krb5i" to
establish NFSv4 state whenever possible), Sat Mar 16 15:56:20 2013,
was merged in 3.10.  This commit forces the NFS client to fall back
to AUTH_SYS for lease management operations if no keytab is
available.

Neil Brown noticed that, since root is required to kinit to do a
sec=krb5 mount when a client doesn't have a keytab, we can try to
use root's Kerberos credential before AUTH_SYS.

Now, when determining a principal and flavor to use for lease
management, the NFS client tries in this order:

  1.  Flavor: AUTH_GSS, krb5i
      Principal: service principal (via keytab)

  2.  Flavor: AUTH_GSS, krb5i
      Principal: user principal established for UID 0 (via kinit)

  3.  Flavor: AUTH_SYS
      Principal: UID 0 / GID 0

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-08-07 13:05:10 -04:00
Trond Myklebust
6da1a03436 NFSv4: Refuse mount attempts with proto=udp
RFC3530 disallows the use of udp as a transport protocol for NFSv4.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-08-07 12:37:04 -04:00
Jeff Layton
9597c13b2f nfs: verify open flags before allowing an atomic open
Currently, you can open a NFSv4 file with O_APPEND|O_DIRECT, but cannot
fcntl(F_SETFL,...) with those flags. This flag combination is explicitly
forbidden on NFSv3 opens, and it seems like it should also be on NFSv4.

Reported-by: Chao Ye <cye@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-08-07 12:16:22 -04:00
Kent Overstreet
73a7075e3f aio: Kill aio_rw_vect_retry()
This code doesn't serve any purpose anymore, since the aio retry
infrastructure has been removed.

This change should be safe because aio_read/write are also used for
synchronous IO, and called from do_sync_read()/do_sync_write() - and
there's no looping done in the sync case (the read and write syscalls).

Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
2013-07-30 11:53:12 -04:00
Trond Myklebust
55b592933b NFSv4: Fix nfs4_init_uniform_client_string for net namespaces
Commit 6f2ea7f2a (NFS: Add nfs4_unique_id boot parameter) introduces a
boot parameter that allows client administrators to set a string
identifier for use by the EXCHANGE_ID and SETCLIENTID arguments in order
to make them more globally unique.

Unfortunately, that uniquifier is no longer globally unique in the presence
of net namespaces, since each container expects to be able to set up their
own lease when mounting a new NFSv4/4.1 partition.
The fix is to add back in the container-specific hostname in addition to
the unique id.

Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-07-24 16:47:04 -04:00
Andy Adamson
1771c5774b NFSv4.1 Use the mount point rpc_clnt for layoutreturn
Should not use the clientid maintenance rpc_clnt.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-07-23 18:18:54 -04:00
Chuck Lever
b14b7979d7 NFS: Fix return type of nfs4_end_drain_session() stub
Clean up: when NFSv4.1 support is compiled out,
nfs4_end_drain_session() becomes a stub.  Make the synopsis of the
stub match the synopsis of the real version of the function.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-07-23 18:18:53 -04:00
Nadav Shemer
cc7936f9ad nfs: fix open(O_RDONLY|O_TRUNC) in NFS4.0
nfs4_proc_setattr removes ATTR_OPEN from sattr->ia_valid, but later
nfs4_do_setattr checks for it

Signed-off-by: Nadav Shemer <nadav@tonian.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-07-23 18:18:53 -04:00
Trond Myklebust
d7067b2db7 NFSv4: encode_attrs should not backfill the bitmap and attribute length
The attribute length is already calculated in advance. There is no
reason why we cannot calculate the bitmap in advance too so that
we don't have to play pointer games.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-07-23 18:18:53 -04:00
Trond Myklebust
4f3cc4809a NFSv4: Fix brainfart in attribute length calculation
The calculation of the attribute length was 4 bytes off.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Reported-and-tested-by: Henrik Rydberg <rydberg@euromail.se>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-23 14:24:59 -07:00
Trond Myklebust
b4a2cf76ab NFSv4: Fix a regression against the FreeBSD server
Technically, the Linux client is allowed by the NFSv4 spec to send
3 word bitmaps as part of an OPEN request. However, this causes the
current FreeBSD server to return NFS4ERR_ATTRNOTSUPP errors.

Fix the regression by making the Linux client use a 2 word bitmap unless
doing NFSv4.2 with labeled NFS.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-07-17 16:54:46 -04:00
Scott Mayhew
c7559663e4 NFS: Allow nfs_updatepage to extend a write under additional circumstances
Currently nfs_updatepage allows a write to be extended to cover a full
page only if we don't have a byte range lock lock on the file... but if
we have a write delegation on the file or if we have the whole file
locked for writing then we should be allowed to extend the write as
well.

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
[Trond: fix up call to nfs_have_delegation()]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-07-09 19:32:50 -04:00
Scott Mayhew
07b5ce8ef2 NFS: Make nfs_readdir revalidate less often
Make nfs_readdir revalidate only when we're at the beginning of the directory or
if the cached attributes have expired.

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-07-09 17:17:07 -04:00
Scott Mayhew
43f291cd07 NFS: Make nfs_attribute_cache_expired() non-static
NFS: Make nfs_attribute_cache_expired() non-static so we can call it from
nfs_readdir().

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-07-09 17:17:07 -04:00
Jeff Layton
cda57a1ef6 nfs: set verifier on existing dentries in nfs_prime_dcache
nfs_prime_dcache currently only sets the verifier when it doesn't
initially a matching dentry in the dcache. Set the verifier in the case
where we do find a dentry in the dcache. This ensures that we don't
have to look up the dentry again if we want to use it after a readdir.

Cc: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-07-09 17:16:39 -04:00
Linus Torvalds
be0c5d8c0b NFS client updates for Linux 3.11
Feature highlights include:
 - Add basic client support for NFSv4.2
 - Add basic client support for Labeled NFS (selinux for NFSv4.2)
 - Fix the use of credentials in NFSv4.1 stateful operations, and
   add support for NFSv4.1 state protection.
 
 Bugfix highlights:
 - Fix another NFSv4 open state recovery race
 - Fix an NFSv4.1 back channel session regression
 - Various rpc_pipefs races
 - Fix another issue with NFSv3 auth negotiation
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQIcBAABAgAGBQJR2vsSAAoJEGcL54qWCgDyWBIP/AqlpBBAblxbNQ1Bl/0m1Pdb
 iKH961qgM4U1BzK0svGtHTZqkovpm4o/VbkbKBT5mQ4g6SbbsJ/AsS1plCyfnIZi
 bdnKNJyj6zg0NsAkJ3vKWqd4BTaP+icdSfEIlRKQxAPESewN7b5B3OWgY4KdYmnk
 q5BP25anC1ryxVycSY67ux8S2IKXVSRZeCZv+RO21rvZ2G0bV5y7t8Om28ztxEnU
 RKrHgQHgaaktR7i8QVO0sbiWq3iqLa3GPkUvFLwWGr8PQJtTkYY0QwYSrsV3N4rY
 hYpMRUZFHpZ8UG5YvBT6xyOy/XaGwMGKSfZjB9/YG4QVju+tTy50U1JbTil5PEWY
 GHWYF68aurIeUkXrhSv8AVnOnhir0mISx5ou/SV7p0QoAZ92V6kq+LkPrW520qlc
 z8ILh3j28pN3ZUCIEArcaZhYCt48uO2hwBi5TqevQyyGRsXFGbN1moD5jvHkllft
 Fi0XGuCBdvhrzFRZcsEl+PDq7fT8lXUK2BHe8oR5jz9PhUp+jpEl9m/eg3RsjJjN
 DuxsHye2U4chScdnRtLBQvpFtdINvWX/Gy8Bi7kdE5tsQySvOa+rdwuBc7h88PHC
 +4xI2iX3z4O1+GpsAe/T9+pjW689jEilS+eVDRVEGl6yHGn9q8PYOayjPjwbJHxS
 R2mLTRhKu1DKguTzO13f
 =wGjn
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Feature highlights include:
   - Add basic client support for NFSv4.2
   - Add basic client support for Labeled NFS (selinux for NFSv4.2)
   - Fix the use of credentials in NFSv4.1 stateful operations, and add
     support for NFSv4.1 state protection.

  Bugfix highlights:
   - Fix another NFSv4 open state recovery race
   - Fix an NFSv4.1 back channel session regression
   - Various rpc_pipefs races
   - Fix another issue with NFSv3 auth negotiation

  Please note that Labeled NFS does require some additional support from
  the security subsystem.  The relevant changesets have all been
  reviewed and acked by James Morris."

* tag 'nfs-for-3.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (54 commits)
  NFS: Set NFS_CS_MIGRATION for NFSv4 mounts
  NFSv4.1 Refactor nfs4_init_session and nfs4_init_channel_attrs
  nfs: have NFSv3 try server-specified auth flavors in turn
  nfs: have nfs_mount fake up a auth_flavs list when the server didn't provide it
  nfs: move server_authlist into nfs_try_mount_request
  nfs: refactor "need_mount" code out of nfs_try_mount
  SUNRPC: PipeFS MOUNT notification optimization for dying clients
  SUNRPC: split client creation routine into setup and registration
  SUNRPC: fix races on PipeFS UMOUNT notifications
  SUNRPC: fix races on PipeFS MOUNT notifications
  NFSv4.1 use pnfs_device maxcount for the objectlayout gdia_maxcount
  NFSv4.1 use pnfs_device maxcount for the blocklayout gdia_maxcount
  NFSv4.1 Fix gdia_maxcount calculation to fit in ca_maxresponsesize
  NFS: Improve legacy idmapping fallback
  NFSv4.1 end back channel session draining
  NFS: Apply v4.1 capabilities to v4.2
  NFSv4.1: Clean up layout segment comparison helper names
  NFSv4.1: layout segment comparison helpers should take 'const' parameters
  NFSv4: Move the DNS resolver into the NFSv4 module
  rpc_pipefs: only set rpc_dentry_ops if d_op isn't already set
  ...
2013-07-09 12:09:43 -07:00
Al Viro
84d08fa888 helper for reading ->d_count
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-07-05 18:59:33 +04:00
Linus Torvalds
7f0ef0267e Merge branch 'akpm' (updates from Andrew Morton)
Merge first patch-bomb from Andrew Morton:
 - various misc bits
 - I'm been patchmonkeying ocfs2 for a while, as Joel and Mark have been
   distracted.  There has been quite a bit of activity.
 - About half the MM queue
 - Some backlight bits
 - Various lib/ updates
 - checkpatch updates
 - zillions more little rtc patches
 - ptrace
 - signals
 - exec
 - procfs
 - rapidio
 - nbd
 - aoe
 - pps
 - memstick
 - tools/testing/selftests updates

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (445 commits)
  tools/testing/selftests: don't assume the x bit is set on scripts
  selftests: add .gitignore for kcmp
  selftests: fix clean target in kcmp Makefile
  selftests: add .gitignore for vm
  selftests: add hugetlbfstest
  self-test: fix make clean
  selftests: exit 1 on failure
  kernel/resource.c: remove the unneeded assignment in function __find_resource
  aio: fix wrong comment in aio_complete()
  drivers/w1/slaves/w1_ds2408.c: add magic sequence to disable P0 test mode
  drivers/memstick/host/r592.c: convert to module_pci_driver
  drivers/memstick/host/jmb38x_ms: convert to module_pci_driver
  pps-gpio: add device-tree binding and support
  drivers/pps/clients/pps-gpio.c: convert to module_platform_driver
  drivers/pps/clients/pps-gpio.c: convert to devm_* helpers
  drivers/parport/share.c: use kzalloc
  Documentation/accounting/getdelays.c: avoid strncpy in accounting tool
  aoe: update internal version number to v83
  aoe: update copyright date
  aoe: perform I/O completions in parallel
  ...
2013-07-03 17:12:13 -07:00
Kees Cook
f170168b9a drivers: avoid parsing names as kthread_run() format strings
Calling kthread_run with a single name parameter causes it to be handled
as a format string. Many callers are passing potentially dynamic string
content, so use "%s" in those cases to avoid any potential accidents.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:07:41 -07:00
Mel Gorman
a0b8cab3b9 mm: remove lru parameter from __pagevec_lru_add and remove parts of pagevec API
Now that the LRU to add a page to is decided at LRU-add time, remove the
misleading lru parameter from __pagevec_lru_add.  A consequence of this
is that the pagevec_lru_add_file, pagevec_lru_add_anon and similar
helpers are misleading as the caller no longer has direct control over
what LRU the page is added to.  Unused helpers are removed by this patch
and existing users of pagevec_lru_add_file() are converted to use
lru_cache_add_file() directly and use the per-cpu pagevecs instead of
creating their own pagevec.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Alexey Lyahkov <alexey.lyashkov@gmail.com>
Cc: Andrew Perepechko <anserper@ya.ru>
Cc: Robin Dong <sanbai@taobao.com>
Cc: Theodore Tso <tytso@mit.edu>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Bernd Schubert <bernd.schubert@fastmail.fm>
Cc: David Howells <dhowells@redhat.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:07:31 -07:00
Mel Gorman
f919b19614 fs: nfs: inform the VM about pages being committed or unstable
VM page reclaim uses dirty and writeback page states to determine if
flushers are cleaning pages too slowly and that page reclaim should
stall waiting on flushers to catch up.  Page state in NFS is a bit more
complex and a clean page can be unreclaimable due to being unstable
which is effectively "dirty" from the perspective of the VM from reclaim
context.  Similarly, if the inode is currently being committed then it's
similar to being under writeback.

This patch adds a is_dirty_writeback() handled for NFS that checks if a
pages backing inode is being committed and should be accounted as
writeback and if a page has private state indicating that it is
effectively dirty.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Zlatko Calusic <zcalusic@bitsync.net>
Cc: dormando <dormando@rydia.net>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:07:29 -07:00
Linus Torvalds
f991fae5c6 Power management and ACPI updates for 3.11-rc1
- Hotplug changes allowing device hot-removal operations to fail
   gracefully (instead of crashing the kernel) if they cannot be
   carried out completely.  From Rafael J Wysocki and Toshi Kani.
 
 - Freezer update from Colin Cross and Mandeep Singh Baines targeted
   at making the freezing of tasks a bit less heavy weight operation.
 
 - cpufreq resume fix from Srivatsa S Bhat for a regression introduced
   during the 3.10 cycle causing some cpufreq sysfs attributes to
   return wrong values to user space after resume.
 
 - New freqdomain_cpus sysfs attribute for the acpi-cpufreq driver to
   provide information previously available via related_cpus from
   Lan Tianyu.
 
 - cpufreq fixes and cleanups from Viresh Kumar, Jacob Shin,
   Heiko Stübner, Xiaoguang Chen, Ezequiel Garcia, Arnd Bergmann, and
   Tang Yuantian.
 
 - Fix for an ACPICA regression causing suspend/resume issues to
   appear on some systems introduced during the 3.4 development cycle
   from Lv Zheng.
 
 - ACPICA fixes and cleanups from Bob Moore, Tomasz Nowicki, Lv Zheng,
   Chao Guan, and Zhang Rui.
 
 - New cupidle driver for Xilinx Zynq processors from Michal Simek.
 
 - cpuidle fixes and cleanups from Daniel Lezcano.
 
 - Changes to make suspend/resume work correctly in Xen guests from
   Konrad Rzeszutek Wilk.
 
 - ACPI device power management fixes and cleanups from Fengguang Wu
   and Rafael J Wysocki.
 
 - ACPI documentation updates from Lv Zheng, Aaron Lu and Hanjun Guo.
 
 - Fix for the IA-64 issue that was the reason for reverting commit
   9f29ab1 and updates of the ACPI scan code from Rafael J Wysocki.
 
 - Mechanism for adding CMOS RTC address space handlers from Lan Tianyu
   (to allow some EC-related breakage to be fixed on some systems).
 
 - Spec-compliant implementation of acpi_os_get_timer() from
   Mika Westerberg.
 
 - Modification of do_acpi_find_child() to execute _STA in order to
   to avoid situations in which a pointer to a disabled device object
   is returned instead of an enabled one with the same _ADR value.
   From Jeff Wu.
 
 - Intel BayTrail PCH (Platform Controller Hub) support for the ACPI
   Intel Low-Power Subsystems (LPSS) driver and modificaions of that
   driver to work around a couple of known BIOS issues from
   Mika Westerberg and Heikki Krogerus.
 
 - EC driver fix from Vasiliy Kulikov to make it use get_user() and
   put_user() instead of dereferencing user space pointers blindly.
 
 - Assorted ACPI code cleanups from Bjorn Helgaas, Nicholas Mazzuca and
   Toshi Kani.
 
 - Modification of the "runtime idle" helper routine to take the return
   values of the callbacks executed by it into account and to call
   rpm_suspend() if they return 0, which allows some code bloat
   reduction to be done, from Rafael J Wysocki and Alan Stern.
 
 - New trace points for PM QoS from Sahara <keun-o.park@windriver.com>.
 
 - PM QoS documentation update from Lan Tianyu.
 
 - Assorted core PM code cleanups and changes from Bernie Thompson,
   Bjorn Helgaas, Julius Werner, and Shuah Khan.
 
 - New devfreq driver for the Exynos5-bus device from Abhilash Kesavan.
 
 - Minor devfreq cleanups, fixes and MAINTAINERS update from
   MyungJoo Ham, Abhilash Kesavan, Paul Bolle, Rajagopal Venkat, and
   Wei Yongjun.
 
 - OMAP Adaptive Voltage Scaling (AVS) SmartReflex voltage control
   driver updates from Andrii Tseglytskyi and Nishanth Menon.
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQIcBAABAgAGBQJR0ZNOAAoJEKhOf7ml8uNsDLYP/0EU4rmvw0TWTITfp6RS1KDE
 9GwBn96ZR4Q5bJd9gBCTPSqhHOYMqxWEUp99sn/M2wehG1pk/jw5LO56+2IhM3UZ
 g1HDcJ7te2nVT/iXsKiAGTVhU9Rk0aYwoVSknwk27qpIBGxW9w/s5tLX8pY3Q3Zq
 wL/7aTPjyL+PFFFEaxgH7qLqsl3DhbtYW5AriUBTkXout/tJ4eO1b7MNBncLDh8X
 VQ/0DNCKE95VEJfkO4rk9RKUyVp9GDn0i+HXCD/FS4IA5oYzePdVdNDmXf7g+swe
 CGlTZq8pB+oBpDiHl4lxzbNrKQjRNbGnDUkoRcWqn0nAw56xK+vmYnWJhW99gQ/I
 fKnvxeLca5po1aiqmC4VSJxZIatFZqLrZAI4dzoCLWY+bGeTnCKmj0/F8ytFnZA2
 8IuLLs7/dFOaHXV/pKmpg6FAlFa9CPxoqRFoyqb4M0GjEarADyalXUWsPtG+6xCp
 R/p0CISpwk+guKZR/qPhL7M654S7SHrPwd2DPF0KgGsvk+G2GhoB8EzvD8BVp98Z
 9siCGCdgKQfJQVI6R0k9aFmn/4gRQIAgyPhkhv9tqULUUkiaXki+/t8kPfnb8O/d
 zep+CA57E2G8MYLkDJfpFeKS7GpPD6TIdgFdGmOUC0Y6sl9iTdiw4yTx8O2JM37z
 rHBZfYGkJBrbGRu+Q1gs
 =VBBq
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management and ACPI updates from Rafael Wysocki:
 "This time the total number of ACPI commits is slightly greater than
  the number of cpufreq commits, but Viresh Kumar (who works on cpufreq)
  remains the most active patch submitter.

  To me, the most significant change is the addition of offline/online
  device operations to the driver core (with the Greg's blessing) and
  the related modifications of the ACPI core hotplug code.  Next are the
  freezer updates from Colin Cross that should make the freezing of
  tasks a bit less heavy weight.

  We also have a couple of regression fixes, a number of fixes for
  issues that have not been identified as regressions, two new drivers
  and a bunch of cleanups all over.

  Highlights:

   - Hotplug changes to support graceful hot-removal failures.

     It sometimes is necessary to fail device hot-removal operations
     gracefully if they cannot be carried out completely.  For example,
     if memory from a memory module being hot-removed has been allocated
     for the kernel's own use and cannot be moved elsewhere, it's
     desirable to fail the hot-removal operation in a graceful way
     rather than to crash the kernel, but currenty a success or a kernel
     crash are the only possible outcomes of an attempted memory
     hot-removal.  Needless to say, that is not a very attractive
     alternative and it had to be addressed.

     However, in order to make it work for memory, I first had to make
     it work for CPUs and for this purpose I needed to modify the ACPI
     processor driver.  It's been split into two parts, a resident one
     handling the low-level initialization/cleanup and a modular one
     playing the actual driver's role (but it binds to the CPU system
     device objects rather than to the ACPI device objects representing
     processors).  That's been sort of like a live brain surgery on a
     patient who's riding a bike.

     So this is a little scary, but since we found and fixed a couple of
     regressions it caused to happen during the early linux-next testing
     (a month ago), nobody has complained.

     As a bonus we remove some duplicated ACPI hotplug code, because the
     ACPI-based CPU hotplug is now going to use the common ACPI hotplug
     code.

   - Lighter weight freezing of tasks.

     These changes from Colin Cross and Mandeep Singh Baines are
     targeted at making the freezing of tasks a bit less heavy weight
     operation.  They reduce the number of tasks woken up every time
     during the freezing, by using the observation that the freezer
     simply doesn't need to wake up some of them and wait for them all
     to call refrigerator().  The time needed for the freezer to decide
     to report a failure is reduced too.

     Also reintroduced is the check causing a lockdep warining to
     trigger when try_to_freeze() is called with locks held (which is
     generally unsafe and shouldn't happen).

   - cpufreq updates

     First off, a commit from Srivatsa S Bhat fixes a resume regression
     introduced during the 3.10 cycle causing some cpufreq sysfs
     attributes to return wrong values to user space after resume.  The
     fix is kind of fresh, but also it's pretty obvious once Srivatsa
     has identified the root cause.

     Second, we have a new freqdomain_cpus sysfs attribute for the
     acpi-cpufreq driver to provide information previously available via
     related_cpus.  From Lan Tianyu.

     Finally, we fix a number of issues, mostly related to the
     CPUFREQ_POSTCHANGE notifier and cpufreq Kconfig options and clean
     up some code.  The majority of changes from Viresh Kumar with bits
     from Jacob Shin, Heiko Stübner, Xiaoguang Chen, Ezequiel Garcia,
     Arnd Bergmann, and Tang Yuantian.

   - ACPICA update

     A usual bunch of updates from the ACPICA upstream.

     During the 3.4 cycle we introduced support for ACPI 5 extended
     sleep registers, but they are only supposed to be used if the
     HW-reduced mode bit is set in the FADT flags and the code attempted
     to use them without checking that bit.  That caused suspend/resume
     regressions to happen on some systems.  Fix from Lv Zheng causes
     those registers to be used only if the HW-reduced mode bit is set.

     Apart from this some other ACPICA bugs are fixed and code cleanups
     are made by Bob Moore, Tomasz Nowicki, Lv Zheng, Chao Guan, and
     Zhang Rui.

   - cpuidle updates

     New driver for Xilinx Zynq processors is added by Michal Simek.

     Multidriver support simplification, addition of some missing
     kerneldoc comments and Kconfig-related fixes come from Daniel
     Lezcano.

   - ACPI power management updates

     Changes to make suspend/resume work correctly in Xen guests from
     Konrad Rzeszutek Wilk, sparse warning fix from Fengguang Wu and
     cleanups and fixes of the ACPI device power state selection
     routine.

   - ACPI documentation updates

     Some previously missing pieces of ACPI documentation are added by
     Lv Zheng and Aaron Lu (hopefully, that will help people to
     uderstand how the ACPI subsystem works) and one outdated doc is
     updated by Hanjun Guo.

   - Assorted ACPI updates

     We finally nailed down the IA-64 issue that was the reason for
     reverting commit 9f29ab11dd ("ACPI / scan: do not match drivers
     against objects having scan handlers"), so we can fix it and move
     the ACPI scan handler check added to the ACPI video driver back to
     the core.

     A mechanism for adding CMOS RTC address space handlers is
     introduced by Lan Tianyu to allow some EC-related breakage to be
     fixed on some systems.

     A spec-compliant implementation of acpi_os_get_timer() is added by
     Mika Westerberg.

     The evaluation of _STA is added to do_acpi_find_child() to avoid
     situations in which a pointer to a disabled device object is
     returned instead of an enabled one with the same _ADR value.  From
     Jeff Wu.

     Intel BayTrail PCH (Platform Controller Hub) support is added to
     the ACPI driver for Intel Low-Power Subsystems (LPSS) and that
     driver is modified to work around a couple of known BIOS issues.
     Changes from Mika Westerberg and Heikki Krogerus.

     The EC driver is fixed by Vasiliy Kulikov to use get_user() and
     put_user() instead of dereferencing user space pointers blindly.

     Code cleanups are made by Bjorn Helgaas, Nicholas Mazzuca and Toshi
     Kani.

   - Assorted power management updates

     The "runtime idle" helper routine is changed to take the return
     values of the callbacks executed by it into account and to call
     rpm_suspend() if they return 0, which allows us to reduce the
     overall code bloat a bit (by dropping some code that's not
     necessary any more after that modification).

     The runtime PM documentation is updated by Alan Stern (to reflect
     the "runtime idle" behavior change).

     New trace points for PM QoS are added by Sahara
     (<keun-o.park@windriver.com>).

     PM QoS documentation is updated by Lan Tianyu.

     Code cleanups are made and minor issues are addressed by Bernie
     Thompson, Bjorn Helgaas, Julius Werner, and Shuah Khan.

   - devfreq updates

     New driver for the Exynos5-bus device from Abhilash Kesavan.

     Minor cleanups, fixes and MAINTAINERS update from MyungJoo Ham,
     Abhilash Kesavan, Paul Bolle, Rajagopal Venkat, and Wei Yongjun.

   - OMAP power management updates

     Adaptive Voltage Scaling (AVS) SmartReflex voltage control driver
     updates from Andrii Tseglytskyi and Nishanth Menon."

* tag 'pm+acpi-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (162 commits)
  cpufreq: Fix cpufreq regression after suspend/resume
  ACPI / PM: Fix possible NULL pointer deref in acpi_pm_device_sleep_state()
  PM / Sleep: Warn about system time after resume with pm_trace
  cpufreq: don't leave stale policy pointer in cdbs->cur_policy
  acpi-cpufreq: Add new sysfs attribute freqdomain_cpus
  cpufreq: make sure frequency transitions are serialized
  ACPI: implement acpi_os_get_timer() according the spec
  ACPI / EC: Add HP Folio 13 to ec_dmi_table in order to skip DSDT scan
  ACPI: Add CMOS RTC Operation Region handler support
  ACPI / processor: Drop unused variable from processor_perflib.c
  cpufreq: tegra: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: s3c64xx: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: omap: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: imx6q: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: exynos: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: dbx500: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: davinci: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: arm-big-little: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: powernow-k8: call CPUFREQ_POSTCHANGE notfier in error cases
  cpufreq: pcc: call CPUFREQ_POSTCHANGE notfier in error cases
  ...
2013-07-03 14:35:40 -07:00
Linus Torvalds
790eac5640 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull second set of VFS changes from Al Viro:
 "Assorted f_pos race fixes, making do_splice_direct() safe to call with
  i_mutex on parent, O_TMPFILE support, Jeff's locks.c series,
  ->d_hash/->d_compare calling conventions changes from Linus, misc
  stuff all over the place."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
  Document ->tmpfile()
  ext4: ->tmpfile() support
  vfs: export lseek_execute() to modules
  lseek_execute() doesn't need an inode passed to it
  block_dev: switch to fixed_size_llseek()
  cpqphp_sysfs: switch to fixed_size_llseek()
  tile-srom: switch to fixed_size_llseek()
  proc_powerpc: switch to fixed_size_llseek()
  ubi/cdev: switch to fixed_size_llseek()
  pci/proc: switch to fixed_size_llseek()
  isapnp: switch to fixed_size_llseek()
  lpfc: switch to fixed_size_llseek()
  locks: give the blocked_hash its own spinlock
  locks: add a new "lm_owner_key" lock operation
  locks: turn the blocked_list into a hashtable
  locks: convert fl_link to a hlist_node
  locks: avoid taking global lock if possible when waking up blocked waiters
  locks: protect most of the file_lock handling with i_lock
  locks: encapsulate the fl_link list handling
  locks: make "added" in __posix_lock_file a bool
  ...
2013-07-03 09:10:19 -07:00
Linus Torvalds
9e239bb939 Lots of bug fixes, cleanups and optimizations. In the bug fixes
category, of note is a fix for on-line resizing file systems where the
 block size is smaller than the page size (i.e., file systems 1k blocks
 on x86, or more interestingly file systems with 4k blocks on Power or
 ia64 systems.)
 
 In the cleanup category, the ext4's punch hole implementation was
 significantly improved by Lukas Czerner, and now supports bigalloc
 file systems.  In addition, Jan Kara significantly cleaned up the
 write submission code path.  We also improved error checking and added
 a few sanity checks.
 
 In the optimizations category, two major optimizations deserve
 mention.  The first is that ext4_writepages() is now used for
 nodelalloc and ext3 compatibility mode.  This allows writes to be
 submitted much more efficiently as a single bio request, instead of
 being sent as individual 4k writes into the block layer (which then
 relied on the elevator code to coalesce the requests in the block
 queue).  Secondly, the extent cache shrink mechanism, which was
 introduce in 3.9, no longer has a scalability bottleneck caused by the
 i_es_lru spinlock.  Other optimizations include some changes to reduce
 CPU usage and to avoid issuing empty commits unnecessarily.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABCAAGBQJR0XhgAAoJENNvdpvBGATwMXkQAJwTPk5XYLqtAwLziFLvM6wG
 0tWa1QAzTNo80tLyM9iGqI6x74X5nddLw5NMICUmPooOa9agMuA4tlYVSss5jWzV
 yyB7vLzsc/2eZJusuVqfTKrdGybE+M766OI6VO9WodOoIF1l51JXKjktKeaWegfv
 NkcLKlakD4V+ZASEDB/cOcR/lTwAs9dQ89AZzgPiW+G8Do922QbqkENJB8mhalbg
 rFGX+lu9W0f3fqdmT3Xi8KGn3EglETdVd6jU7kOZN4vb5LcF5BKHQnnUmMlpeWMT
 ksOVasb3RZgcsyf5ZOV5feXV601EsNtPBrHAmH22pWQy3rdTIvMv/il63XlVUXZ2
 AXT3cHEvNQP0/yVaOTCZ9xQVxT8sL4mI6kENP9PtNuntx7E90JBshiP5m24kzTZ/
 zkIeDa+FPhsDx1D5EKErinFLqPV8cPWONbIt/qAgo6663zeeIyMVhzxO4resTS9k
 U2QEztQH+hDDbjgABtz9M/GjSrohkTYNSkKXzhTjqr/m5huBrVMngjy/F4/7G7RD
 vSEx5aXqyagnrUcjsupx+biJ1QvbvZWOVxAE/6hNQNRGDt9gQtHAmKw1eG2mugHX
 +TFDxodNE4iWEURenkUxXW3mDx7hFbGZR0poHG3M/LVhKMAAAw0zoKrrUG5c70G7
 XrddRLGlk4Hf+2o7/D7B
 =SwaI
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 update from Ted Ts'o:
 "Lots of bug fixes, cleanups and optimizations.  In the bug fixes
  category, of note is a fix for on-line resizing file systems where the
  block size is smaller than the page size (i.e., file systems 1k blocks
  on x86, or more interestingly file systems with 4k blocks on Power or
  ia64 systems.)

  In the cleanup category, the ext4's punch hole implementation was
  significantly improved by Lukas Czerner, and now supports bigalloc
  file systems.  In addition, Jan Kara significantly cleaned up the
  write submission code path.  We also improved error checking and added
  a few sanity checks.

  In the optimizations category, two major optimizations deserve
  mention.  The first is that ext4_writepages() is now used for
  nodelalloc and ext3 compatibility mode.  This allows writes to be
  submitted much more efficiently as a single bio request, instead of
  being sent as individual 4k writes into the block layer (which then
  relied on the elevator code to coalesce the requests in the block
  queue).  Secondly, the extent cache shrink mechanism, which was
  introduce in 3.9, no longer has a scalability bottleneck caused by the
  i_es_lru spinlock.  Other optimizations include some changes to reduce
  CPU usage and to avoid issuing empty commits unnecessarily."

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (86 commits)
  ext4: optimize starting extent in ext4_ext_rm_leaf()
  jbd2: invalidate handle if jbd2_journal_restart() fails
  ext4: translate flag bits to strings in tracepoints
  ext4: fix up error handling for mpage_map_and_submit_extent()
  jbd2: fix theoretical race in jbd2__journal_restart
  ext4: only zero partial blocks in ext4_zero_partial_blocks()
  ext4: check error return from ext4_write_inline_data_end()
  ext4: delete unnecessary C statements
  ext3,ext4: don't mess with dir_file->f_pos in htree_dirblock_to_tree()
  jbd2: move superblock checksum calculation to jbd2_write_superblock()
  ext4: pass inode pointer instead of file pointer to punch hole
  ext4: improve free space calculation for inline_data
  ext4: reduce object size when !CONFIG_PRINTK
  ext4: improve extent cache shrink mechanism to avoid to burn CPU time
  ext4: implement error handling of ext4_mb_new_preallocation()
  ext4: fix corruption when online resizing a fs with 1K block size
  ext4: delete unused variables
  ext4: return FIEMAP_EXTENT_UNKNOWN for delalloc extents
  jbd2: remove debug dependency on debug_fs and update Kconfig help text
  jbd2: use a single printk for jbd_debug()
  ...
2013-07-02 09:39:34 -07:00
Jeff Layton
1c8c601a8c locks: protect most of the file_lock handling with i_lock
Having a global lock that protects all of this code is a clear
scalability problem. Instead of doing that, move most of the code to be
protected by the i_lock instead. The exceptions are the global lists
that the ->fl_link sits on, and the ->fl_block list.

->fl_link is what connects these structures to the
global lists, so we must ensure that we hold those locks when iterating
over or updating these lists.

Furthermore, sound deadlock detection requires that we hold the
blocked_list state steady while checking for loops. We also must ensure
that the search and update to the list are atomic.

For the checking and insertion side of the blocked_list, push the
acquisition of the global lock into __posix_lock_file and ensure that
checking and update of the  blocked_list is done without dropping the
lock in between.

On the removal side, when waking up blocked lock waiters, take the
global lock before walking the blocked list and dequeue the waiters from
the global list prior to removal from the fl_block list.

With this, deadlock detection should be race free while we minimize
excessive file_lock_lock thrashing.

Finally, in order to avoid a lock inversion problem when handling
/proc/locks output we must ensure that manipulations of the fl_block
list are also protected by the file_lock_lock.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:57:42 +04:00
Al Viro
23db862060 [readdir] convert nfs
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29 12:56:40 +04:00
Trond Myklebust
959d921f5e Merge branch 'labeled-nfs' into linux-next
* labeled-nfs:
  NFS: Apply v4.1 capabilities to v4.2
  NFS: Add in v4.2 callback operation
  NFS: Make callbacks minor version generic
  Kconfig: Add Kconfig entry for Labeled NFS V4 client
  NFS: Extend NFS xattr handlers to accept the security namespace
  NFS: Client implementation of Labeled-NFS
  NFS: Add label lifecycle management
  NFS:Add labels to client function prototypes
  NFSv4: Extend fattr bitmaps to support all 3 words
  NFSv4: Introduce new label structure
  NFSv4: Add label recommended attribute and NFSv4 flags
  NFSv4.2: Added NFS v4.2 support to the NFS client
  SELinux: Add new labeling type native labels
  LSM: Add flags field to security_sb_set_mnt_opts for in kernel mount data.
  Security: Add Hook to test if the particular xattr is part of a MAC model.
  Security: Add hook to calculate context based on a negative dentry.
  NFS: Add NFSv4.2 protocol constants

Conflicts:
	fs/nfs/nfs4proc.c
2013-06-28 16:29:51 -04:00
Chuck Lever
f112bb4899 NFS: Set NFS_CS_MIGRATION for NFSv4 mounts
NFS_CS_MIGRATION makes sense only for NFSv4 mounts.  Introduced by
commit 89652617 (NFS: Introduce "migration" mount option) Fri Sep 14
17:24:11 2012.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-28 16:04:10 -04:00
Andy Adamson
18aad3d552 NFSv4.1 Refactor nfs4_init_session and nfs4_init_channel_attrs
nfs4_init_session was originally written to be called prior to
nfs4_init_channel_attrs, setting the session target_max response and request
sizes that nfs4_init_channel_attrs would pay attention to.

In the current code flow, nfs4_init_session, just like nfs4_init_ds_session
for the data server case, is called after the session is all negotiated, and
is actually used in a RECLAIM COMPLETE call to the server.

Remove the un-needed fc_target_max response and request fields from
nfs4_session and just set the max_resp_sz and max_rqst_sz in
nfs4_init_channel_attrs.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-28 15:55:19 -04:00
Jeff Layton
9111c95b07 nfs: have NFSv3 try server-specified auth flavors in turn
The current scheme is to try and pick the auth flavor that the server
prefers. In some cases though, we may find that we're not actually
able to use that auth flavor later. For instance, the server may
prefer an AUTH_GSS flavor, but we may not be able to get GSSAPI creds.

The current code just gives up at that point. Change it instead to
try the ->create_server call using each of the different authflavors
in the server's list if one was not specified at mount time. Once
we have a successful ->create_server call, return the result. Only
give up and return error if all attempts fail.

Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-28 15:52:37 -04:00
Jeff Layton
fb9b02fda0 nfs: have nfs_mount fake up a auth_flavs list when the server didn't provide it
Instead of handling this as a special case in the auth-selection code,
we can simply fake up an auth_flavs list when the server doesn't
provide it.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-28 15:51:51 -04:00
Jeff Layton
294ae81d4f nfs: move server_authlist into nfs_try_mount_request
In a later patch we're going to want to cycle over this list and attempt
to call ->create_server for each different flavor until one succeeds.
Move the list allocation to the stack of nfs_try_mount_request() and
pass a pointer to it and its length to nfs_request_mount().

Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-28 15:49:49 -04:00
Jeff Layton
d17540c61b nfs: refactor "need_mount" code out of nfs_try_mount
This looks like pointless refactoring for now, but we'll flesh out
the need_mount case a little more in a later patch.

Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-28 15:49:27 -04:00
Andy Adamson
52fcac988a NFSv4.1 use pnfs_device maxcount for the objectlayout gdia_maxcount
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-28 15:34:45 -04:00
Andy Adamson
968fe25243 NFSv4.1 use pnfs_device maxcount for the blocklayout gdia_maxcount
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-28 15:34:44 -04:00
Andy Adamson
f1c097be2b NFSv4.1 Fix gdia_maxcount calculation to fit in ca_maxresponsesize
The GETDEVICEINFO gdia_maxcount represents all of the data being returned
within the GETDEVICEINFO4resok structure and includes the XDR overhead.

The CREATE_SESSION ca_maxresponsesize is the maximum reply and includes the RPC
headers (including security flavor credentials and verifiers).

Split out the struct pnfs_device field maxcount which is the gdia_maxcount
from the pglen field which is the reply (the total) buffer length.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-28 15:34:43 -04:00
Bryan Schumaker
ffa57b9e53 NFS: Improve legacy idmapping fallback
Fallback should happen only when the request_key() call fails, because
this indicates that there was a problem running the nfsidmap program.
We shouldn't call the legacy code if the error was elsewhere.

Signed-off-by: Bryan Schumaker <bjschuma@netappp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-28 15:22:07 -04:00
Rafael J. Wysocki
207bc1181b Merge branch 'freezer'
* freezer:
  af_unix: use freezable blocking calls in read
  sigtimedwait: use freezable blocking call
  nanosleep: use freezable blocking call
  futex: use freezable blocking call
  select: use freezable blocking call
  epoll: use freezable blocking call
  binder: use freezable blocking calls
  freezer: add new freezable helpers using freezer_do_not_count()
  freezer: convert freezable helpers to static inline where possible
  freezer: convert freezable helpers to freezer_do_not_count()
  freezer: skip waking up tasks with PF_FREEZER_SKIP set
  freezer: shorten freezer sleep time using exponential backoff
  lockdep: check that no locks held at freeze time
  lockdep: remove task argument from debug_check_no_locks_held
  freezer: add unsafe versions of freezable helpers for CIFS
  freezer: add unsafe versions of freezable helpers for NFS
2013-06-28 13:00:53 +02:00
Andy Adamson
62f288a02f NFSv4.1 end back channel session draining
We need to ensure that we clear NFS4_SLOT_TBL_DRAINING on the back
channel when we're done recovering the session.

Regression introduced by commit 774d5f14e (NFSv4.1 Fix a pNFS session
draining deadlock)

Signed-off-by: Andy Adamson <andros@netapp.com>
[Trond: Changed order to start back-channel first. Minor code cleanup]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>=3.10]
2013-06-20 10:19:21 -04:00
Bryan Schumaker
7017310ad7 NFS: Apply v4.1 capabilities to v4.2
This fixes POSIX locks and possibly a few other v4.2 features, like
readdir plus.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-19 13:55:43 -04:00
Trond Myklebust
7dc0ac70f8 NFSv4.1: Clean up layout segment comparison helper names
Give them names that are a bit more consistent with the general
pNFS naming scheme.

 - lo_seg_contained -> pnfs_lseg_range_contained
 - lo_seg_intersecting -> pnfs_lseg_range_intersecting
 - cmp_layout -> pnfs_lseg_range_cmp
 - is_matching_lseg -> pnfs_lseg_range_match

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-18 13:47:18 -04:00
Trond Myklebust
3cb2df17ae NFSv4.1: layout segment comparison helpers should take 'const' parameters
Also strip off the unnecessary 'inline' declarations.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-18 13:47:18 -04:00
Trond Myklebust
c8d74d9b68 NFSv4: Move the DNS resolver into the NFSv4 module
The other protocols don't use it, so make it local to NFSv4, and
remove the EXPORT.
Also ensure that we only compile in cache_lib.o if we're using
the legacy DNS resolver.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Bryan Schumaker <bjschuma@netapp.com>
2013-06-18 13:47:18 -04:00
Djalal Harouni
fe2d5395c4 NFSv4: SETCLIENTID add the format string for the NETID
Make sure that NFSv4 SETCLIENTID does not parse the NETID as a
format string.

Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-18 13:45:01 -04:00
Bryan Schumaker
6b140b85d9 NFS: Add in v4.2 callback operation
NFS v4.2 adds a CB_OFFLOAD operation used by COPY and WRITE_PLUS.  Since
neither of these operations have been implemented yet, simply return
NFS4ERR_NOTSUPP.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-08 16:20:18 -04:00
Bryan Schumaker
459de2edb9 NFS: Make callbacks minor version generic
I found a few places that hardcode the minor version number rather than
making it dependent on the protocol the callback came in over.  This
patch makes it easier to add new minor versions in the future.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-08 16:20:18 -04:00
Steve Dickson
f58eda9bc2 Kconfig: Add Kconfig entry for Labeled NFS V4 client
This patch adds the NFS_V4_SECURITY_LABEL entry which
enables security label support for the NFSv4 client

Signed-off-by: Steve Dickson <steved@redhat.com>
[trond: Make this non-interactive]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-08 16:20:17 -04:00
David Quigley
c9bccef6b9 NFS: Extend NFS xattr handlers to accept the security namespace
The existing NFSv4 xattr handlers do not accept xattr calls to the security
namespace. This patch extends these handlers to accept xattrs from the security
namespace in addition to the default NFSv4 ACL namespace.

Acked-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Matthew N. Dodd <Matthew.Dodd@sparta.com>
Signed-off-by: Miguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg>
Signed-off-by: Phua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg>
Signed-off-by: Khin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-08 16:20:17 -04:00
David Quigley
aa9c266962 NFS: Client implementation of Labeled-NFS
This patch implements the client transport and handling support for labeled
NFS. The patch adds two functions to encode and decode the security label
recommended attribute which makes use of the LSM hooks added earlier. It also
adds code to grab the label from the file attribute structures and encode the
label to be sent back to the server.

Acked-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Matthew N. Dodd <Matthew.Dodd@sparta.com>
Signed-off-by: Miguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg>
Signed-off-by: Phua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg>
Signed-off-by: Khin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg>
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-08 16:20:16 -04:00
David Quigley
14c43f7678 NFS: Add label lifecycle management
This patch adds the lifecycle management for the security label structure
introduced in an earlier patch. The label is not used yet but allocations and
freeing of the structure is handled.

Signed-off-by: Matthew N. Dodd <Matthew.Dodd@sparta.com>
Signed-off-by: Miguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg>
Signed-off-by: Phua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg>
Signed-off-by: Khin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg>
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-08 16:20:15 -04:00
David Quigley
1775fd3e80 NFS:Add labels to client function prototypes
After looking at all of the nfsv4 operations the label structure has been added
to the prototypes of the functions which can transmit label data.

Signed-off-by: Matthew N. Dodd <Matthew.Dodd@sparta.com>
Signed-off-by: Miguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg>
Signed-off-by: Phua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg>
Signed-off-by: Khin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg>
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-08 16:20:15 -04:00
David Quigley
a09df2ca23 NFSv4: Extend fattr bitmaps to support all 3 words
The fattr handling bitmap code only uses the first two fattr words sofar. This
patch adds the 3rd word to being sent but doesn't populate it yet.

Signed-off-by: Miguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg>
Signed-off-by: Phua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg>
Signed-off-by: Khin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-08 16:20:14 -04:00
Steve Dickson
e058f70b80 NFSv4: Introduce new label structure
In order to mimic the way that NFSv4 ACLs are implemented we have created a
structure to be used to pass label data up and down the call chain. This patch
adds the new structure and new members to the required NFSv4 call structures.

Signed-off-by: Matthew N. Dodd <Matthew.Dodd@sparta.com>
Signed-off-by: Miguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg>
Signed-off-by: Phua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg>
Signed-off-by: Khin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg>
Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-08 16:20:14 -04:00
Steve Dickson
42c2c4249c NFSv4.2: Added NFS v4.2 support to the NFS client
This enable NFSv4.2 support. To enable this code the
CONFIG_NFS_V4_2 Kconfig define needs to be set and
the -o v4.2 mount option need to be used.

Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-08 16:20:13 -04:00
David Quigley
649f6e7718 LSM: Add flags field to security_sb_set_mnt_opts for in kernel mount data.
There is no way to differentiate if a text mount option is passed from user
space or the kernel. A flags field is being added to the
security_sb_set_mnt_opts hook to allow for in kernel security flags to be sent
to the LSM for processing in addition to the text options received from mount.
This patch also updated existing code to fix compilation errors.

Acked-by: Eric Paris <eparis@redhat.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov>
Signed-off-by: Miguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg>
Signed-off-by: Phua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg>
Signed-off-by: Khin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-08 16:20:12 -04:00
Trond Myklebust
c45ffdd269 NFSv4: Close another NFSv4 recovery race
State recovery currently relies on being able to find a valid
nfs_open_context in the inode->open_files list.
We therefore need to put the nfs_open_context on the list while
we're still protected by the sp->so_reclaim_seqcount in order
to avoid reboot races.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-06 16:24:44 -04:00
Trond Myklebust
275bb30786 NFSv4: Move dentry instantiation into the NFSv4-specific atomic open code
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-06 16:24:43 -04:00
Trond Myklebust
3efb972247 NFSv4: Refactor _nfs4_open_and_get_state to set ctx->state
Instead of having the callers set ctx->state, do it inside
_nfs4_open_and_get_state.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-06 16:24:42 -04:00
Trond Myklebust
4197a055eb NFSv4: Cleanup: pass the nfs_open_context to nfs4_do_open
All the callers have an open_context at this point, and since we always
need one in order to do state recovery, it makes sense to use it as the
basis for the nfs4_do_open() call.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-06 16:24:42 -04:00
Trond Myklebust
1a1a29fa84 NFSv4: Remove redundant check for FMODE_EXEC in nfs_finish_open
We already check the EXEC access mode in the lower layers.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-06 16:24:41 -04:00
Trond Myklebust
5cc2216db8 NFSv4.1: Simplify setting the layout header credential
ctx->cred == ctx->state->owner->so_cred, so let's just use the former.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-06 16:24:38 -04:00
Trond Myklebust
4f0b429df1 NFSv4.1: Enable state protection
Use the EXCHGID4_FLAG_BIND_PRINC_STATEID exchange_id flag to enable
stateid protection. This means that if we create a stateid using a
particular principal, then we must use the same principal if we
want to change that state.
IOW: if we OPEN a file using a particular credential, then we have
to use the same credential in subsequent OPEN_DOWNGRADE, CLOSE,
or DELEGRETURN operations that use that stateid.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-06 16:24:37 -04:00
Trond Myklebust
cd5875fefe NFSv4.1: Use layout credentials for get_deviceinfo calls
This is not strictly needed, since get_deviceinfo is not allowed to
return NFS4ERR_ACCESS or NFS4ERR_WRONG_CRED, but lets do it anyway
for consistency with other pNFS operations.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-06 16:24:37 -04:00
Trond Myklebust
ab7cb0dfab NFSv4.1: Ensure that test_stateid and free_stateid use correct credentials
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-06 16:24:36 -04:00
Trond Myklebust
965e9c23de NFSv4.1: Ensure that reclaim_complete uses the right credential
We want to use the same credential for reclaim_complete as we used
for the exchange_id call.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-06 16:24:35 -04:00
Trond Myklebust
9556000d8c NFSv4.1: Ensure that layoutreturn uses the correct credential
We need to use the same credential as was used for the layoutget
and/or layoutcommit operations.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-06 16:24:35 -04:00
Trond Myklebust
6ab59344d9 NFSv4.1: Ensure that layoutget is called using the layout credential
Ensure that we use the same credential for layoutget, layoutcommit and
layoutreturn.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-06-06 16:24:34 -04:00
Chuck Lever
eb54d43707 NFS: Fix security flavor negotiation with legacy binary mounts
Darrick J. Wong <darrick.wong@oracle.com> reports:
> I have a kvm-based testing setup that netboots VMs over NFS, the
> client end of which seems to have broken somehow in 3.10-rc1.  The
> server's exports file looks like this:
>
> /storage/mtr/x64	192.168.122.0/24(ro,sync,no_root_squash,no_subtree_check)
>
> On the client end (inside the VM), the initrd runs the following
> command to try to mount the rootfs over NFS:
>
> # mount -o nolock -o ro -o retrans=10 192.168.122.1:/storage/mtr/x64/ /root
>
> (Note: This is the busybox mount command.)
>
> The mount fails with -EINVAL.

Commit 4580a92d44 "NFS: Use server-recommended security flavor by
default (NFSv3)" introduced a behavior regression for NFS mounts
done via a legacy binary mount(2) call.

Ensure that a default security flavor is specified for legacy binary
mount requests, since they do not invoke nfs_select_flavor() in the
kernel.

Busybox uses klibc's nfsmount command, which performs NFS mounts
using the legacy binary mount data format.  /sbin/mount.nfs is not
affected by this regression.

Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Darrick J. Wong <darrick.wong@oracle.com>
Acked-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-05-30 16:31:34 -04:00
Trond Myklebust
f448badd34 NFSv4: Fix a thinko in nfs4_try_open_cached
We need to pass the full open mode flags to nfs_may_open() when doing
a delegated open.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2013-05-29 16:03:23 -04:00
Chuck Lever
83c168bf80 NFS: Fix SETCLIENTID fallback if GSS is not available
Commit 79d852bf "NFS: Retry SETCLIENTID with AUTH_SYS instead of
AUTH_NONE" did not take into account commit 23631227 "NFSv4: Fix the
fallback to AUTH_NULL if krb5i is not available".

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-05-23 18:50:40 -04:00
Lukas Czerner
d47992f86b mm: change invalidatepage prototype to accept length
Currently there is no way to truncate partial page where the end
truncate point is not at the end of the page. This is because it was not
needed and the functionality was enough for file system truncate
operation to work properly. However more file systems now support punch
hole feature and it can benefit from mm supporting truncating page just
up to the certain point.

Specifically, with this functionality truncate_inode_pages_range() can
be changed so it supports truncating partial page at the end of the
range (currently it will BUG_ON() if 'end' is not at the end of the
page).

This commit changes the invalidatepage() address space operation
prototype to accept range to be invalidated and update all the instances
for it.

We also change the block_invalidatepage() in the same way and actually
make a use of the new length argument implementing range invalidation.

Actual file system implementations will follow except the file systems
where the changes are really simple and should not change the behaviour
in any way .Implementation for truncate_page_range() which will be able
to accept page unaligned ranges will follow as well.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>
2013-05-21 23:17:23 -04:00
Andy Adamson
774d5f14ee NFSv4.1 Fix a pNFS session draining deadlock
On a CB_RECALL the callback service thread flushes the inode using
filemap_flush prior to scheduling the state manager thread to return the
delegation. When pNFS is used and I/O has not yet gone to the data server
servicing the inode, a LAYOUTGET can preceed the I/O. Unlike the async
filemap_flush call, the LAYOUTGET must proceed to completion.

If the state manager starts to recover data while the inode flush is sending
the LAYOUTGET, a deadlock occurs as the callback service thread holds the
single callback session slot until the flushing is done which blocks the state
manager thread, and the state manager thread has set the session draining bit
which puts the inode flush LAYOUTGET RPC to sleep on the forechannel slot
table waitq.

Separate the draining of the back channel from the draining of the fore channel
by moving the NFS4_SESSION_DRAINING bit from session scope into the fore
and back slot tables.  Drain the back channel first allowing the LAYOUTGET
call to proceed (and fail) so the callback service thread frees the callback
slot. Then proceed with draining the forechannel.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-05-20 14:20:14 -04:00
Colin Cross
416ad3c9c0 freezer: add unsafe versions of freezable helpers for NFS
NFS calls the freezable helpers with locks held, which is unsafe
and will cause lockdep warnings when 6aa9707 "lockdep: check
that no locks held at freeze time" is reapplied (it was reverted
in dbf520a).  NFS shouldn't be doing this, but it has
long-running syscalls that must hold a lock but also shouldn't
block suspend.  Until NFS freeze handling is rewritten to use a
signal to exit out of the critical section, add new *_unsafe
versions of the helpers that will not run the lockdep test when
6aa9707 is reapplied, and call them from NFS.

In practice the likley result of holding the lock while freezing
is that a second task blocked on the lock will never freeze,
aborting suspend, but it is possible to manufacture a case using
the cgroup freezer, the lock, and the suspend freezer to create
a deadlock.  Silencing the lockdep warning here will allow
problems to be found in other drivers that may have a more
serious deadlock risk, and prevent new problems from being added.

Signed-off-by: Colin Cross <ccross@android.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-05-12 14:16:21 +02:00
Linus Torvalds
8cbc95ee74 More NFS client bugfixes for 3.10
- Ensure that we match the 'sec=' mount flavour against the server list
 - Fix the NFSv4 byte range locking in the presence of delegations
 - Ensure that we conform to the NFSv4.1 spec w.r.t. freeing lock stateids
 - Fix a pNFS data server connection race
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQIcBAABAgAGBQJRit1yAAoJEGcL54qWCgDyD9EQAKgb37dXhGt7OXBRBP4EY/T8
 xJZ2tmdDZ6etLFJVftqCv05hBvyfilPLK0E9zg/zW/kvkKxYQ/fykvpzBR/+Q7KF
 quOmjDHLhDTXBnXzPg1HEoeTaXI2/a8CdjpxxEkthD4+FaKlyCXM+EFtA9orT9ZI
 oM+aNaqEzTjoQyryTFMcHxAvsrqjnZBa0MT6Fh45HaLaijV7CdDWoj6gjy6Lc3Al
 4wHeT8QrZTp/NfIN16uykFZjeWwul4N9upu+CI2V8ZDMEit6JDYX4sl5tB41PzYW
 audDBcu0waSqoVQ2mJ5OHoYGZf0wopMUFaAst+tn0pQvwWUfTjD8XtO8uOgeMNoz
 2S+XxUC2qhSMszwNBVSmwe2LtSAyHiw32Md4hqkLYDH2c7tk8bJPKDXZJACBzJS7
 O1aMmOgWar8+nmzvmXFeU804SxBykV1V8UgtXWp5IwC36V0HAYnM5xtHwXBR7HWe
 lnuVHVdux7ySeAyrs2aMdKk7SAw5OC//WW8qoEF5USDEIljeoBzA+IYu9n91Hg2b
 ufnsyxumGJ6dZ0iU2nJVoLagRaZcm6kOhnxcegMpb9IH2+RLCQNef09lj2iklm2j
 mJA4o2lkVEHOswg/NwKn/I4ho8tbNNb8v//S5KiqrYhiiqZhOzu3RRtFeZi91iac
 P/g+hPzfuGnmwcoCEUSa
 =5zpc
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.10-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull more NFS client bugfixes from Trond Myklebust:

 - Ensure that we match the 'sec=' mount flavour against the server list

 - Fix the NFSv4 byte range locking in the presence of delegations

 - Ensure that we conform to the NFSv4.1 spec w.r.t.  freeing lock
   stateids

 - Fix a pNFS data server connection race

* tag 'nfs-for-3.10-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS4.1 Fix data server connection race
  NFSv3: match sec= flavor against server list
  NFSv4.1: Ensure that we free the lock stateid on the server
  NFSv4: Convert nfs41_free_stateid to use an asynchronous RPC call
  SUNRPC: Don't spam syslog with "Pseudoflavor not found" messages
  NFSv4.x: Fix handling of partially delegated locks
2013-05-09 10:24:54 -07:00
Andy Adamson
c23266d532 NFS4.1 Fix data server connection race
Unlike meta data server mounts which support multiple mount points to
the same server via struct nfs_server, data servers support a single connection.

Concurrent calls to setup the data server connection can race where the first
call allocates the nfs_client struct, and before the cache struct nfs_client
pointer can be set, a second call also tries to setup the connection, finds the
already allocated nfs_client, bumps the reference count, re-initializes the
session,etc. This results in a hanging data server session after umount.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-05-08 17:19:32 -04:00
Al Viro
4385bab128 make blkdev_put() return void
same story as with the previous patches - note that return
value of blkdev_close() is lost, since there's nowhere the
caller (__fput()) could return it to.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-07 02:16:31 -04:00
Weston Andros Adamson
d497ab9751 NFSv3: match sec= flavor against server list
Older linux clients match the 'sec=' mount option flavor against the server's
flavor list (if available) and return EPERM if the specified flavor or AUTH_NULL
(which "matches" any flavor) is not found.

Recent changes skip this step and allow the vfs mount even though no operations
will succeed, creating a 'dud' mount.

This patch reverts back to the old behavior of matching specified flavors
against the server list and also returns EPERM when no sec= is specified and
none of the flavors returned by the server are supported by the client.

Example of behavior change:

the server's /etc/exports:

/export/krb5      *(sec=krb5,rw,no_root_squash)

old client behavior:

$ uname -a
Linux one.apikia.fake 3.8.8-202.fc18.x86_64 #1 SMP Wed Apr 17 23:25:17 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
$ sudo mount -v -o sec=sys,vers=3 zero:/export/krb5 /mnt
mount.nfs: timeout set for Sun May  5 17:32:04 2013
mount.nfs: trying text-based options 'sec=sys,vers=3,addr=192.168.100.10'
mount.nfs: prog 100003, trying vers=3, prot=6
mount.nfs: trying 192.168.100.10 prog 100003 vers 3 prot TCP port 2049
mount.nfs: prog 100005, trying vers=3, prot=17
mount.nfs: trying 192.168.100.10 prog 100005 vers 3 prot UDP port 20048
mount.nfs: mount(2): Permission denied
mount.nfs: access denied by server while mounting zero:/export/krb5

recently changed behavior:

$ uname -a
Linux one.apikia.fake 3.9.0-testing+ #2 SMP Fri May 3 20:29:32 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux
$ sudo mount -v -o sec=sys,vers=3 zero:/export/krb5 /mnt
mount.nfs: timeout set for Sun May  5 17:37:17 2013
mount.nfs: trying text-based options 'sec=sys,vers=3,addr=192.168.100.10'
mount.nfs: prog 100003, trying vers=3, prot=6
mount.nfs: trying 192.168.100.10 prog 100003 vers 3 prot TCP port 2049
mount.nfs: prog 100005, trying vers=3, prot=17
mount.nfs: trying 192.168.100.10 prog 100005 vers 3 prot UDP port 20048
$ ls /mnt
ls: cannot open directory /mnt: Permission denied
$ sudo ls /mnt
ls: cannot open directory /mnt: Permission denied
$ sudo df /mnt
df: ‘/mnt’: Permission denied
df: no file systems processed
$ sudo umount /mnt
$

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-05-06 17:24:36 -04:00
Trond Myklebust
c8b2d0bfd3 NFSv4.1: Ensure that we free the lock stateid on the server
This ensures that the server doesn't need to keep huge numbers of
lock stateids waiting around for the final CLOSE.
See section 8.2.4 in RFC5661.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-05-06 17:24:27 -04:00
Trond Myklebust
7c1d5fae4a NFSv4: Convert nfs41_free_stateid to use an asynchronous RPC call
The main reason for doing this is will be to allow for an asynchronous
RPC mode that we can use for freeing lock stateids as per section
8.2.4 of RFC5661.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-05-06 17:24:22 -04:00
Trond Myklebust
c5a2a15f81 NFSv4.x: Fix handling of partially delegated locks
If a NFS client receives a delegation for a file after it has taken
a lock on that file, we can currently end up in a situation where
we mistakenly skip unlocking that file.

The following patch swaps an erroneous check in nfs4_proc_unlck for
whether or not the file has a delegation to one which checks whether
or not we hold a lock stateid for that file.

Reported-by: Chuck Lever <Chuck.Lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>=3.7]
Tested-by: Chuck Lever <Chuck.Lever@oracle.com>
2013-05-03 12:18:47 -04:00
Linus Torvalds
2e1deaad1e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem update from James Morris:
 "Just some minor updates across the subsystem"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  ima: eliminate passing d_name.name to process_measurement()
  TPM: Retry SaveState command in suspend path
  tpm/tpm_i2c_infineon: Add small comment about return value of __i2c_transfer
  tpm/tpm_i2c_infineon.c: Add OF attributes type and name to the of_device_id table entries
  tpm_i2c_stm_st33: Remove duplicate inclusion of header files
  tpm: Add support for new Infineon I2C TPM (SLB 9645 TT 1.2 I2C)
  char/tpm: Convert struct i2c_msg initialization to C99 format
  drivers/char/tpm/tpm_ppi: use strlcpy instead of strncpy
  tpm/tpm_i2c_stm_st33: formatting and white space changes
  Smack: include magic.h in smackfs.c
  selinux: make security_sb_clone_mnt_opts return an error on context mismatch
  seccomp: allow BPF_XOR based ALU instructions.
  Fix NULL pointer dereference in smack_inode_unlink() and smack_inode_rmdir()
  Smack: add support for modification of existing rules
  smack: SMACK_MAGIC to include/uapi/linux/magic.h
  Smack: add missing support for transmute bit in smack_str_from_perm()
  Smack: prevent revoke-subject from failing when unseen label is written to it
  tomoyo: use DEFINE_SRCU() to define tomoyo_ss
  tomoyo: use DEFINE_SRCU() to define tomoyo_ss
2013-04-30 16:27:51 -07:00
Linus Torvalds
8728f986fe NFS client bugfixes and cleanups for 3.10
- NLM: stable fix for NFSv2/v3 blocking locks
 - NFSv4.x: stable fixes for the delegation recall error handling code
 - NFSv4.x: Security flavour negotiation fixes and cleanups by Chuck Lever
 - SUNRPC: A number of RPCSEC_GSS fixes and cleanups also from Chuck
 - NFSv4.x assorted state management and reboot recovery bugfixes
 - NFSv4.1: In cases where we have already looked up a file, and hold a
   valid filehandle, use the new open-by-filehandle operation instead of
   opening by name.
 - Allow the NFSv4.1 callback thread to freeze
 - NFSv4.x: ensure that file unlock waits for readahead to complete
 - NFSv4.1: ensure that the RPC layer doesn't override the NFS session
   table size negotiation by limiting the number of slots.
 - NFSv4.x: Fix SETATTR spec compatibility issues
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQIcBAABAgAGBQJRfu+cAAoJEGcL54qWCgDylxkP/24cXOLHMKMYnIab0cQYIW2m
 SQgADGE+MqgTVlWjGVWublVMDY1R51iINsksAjxMtXYt50FdBJEqV2uxIGi4VnbR
 nR9eppqQ6vk5e6r5+aZyVmWKoLFnJ4MBF6OpPUZB5mf8iH/fiixmSYLvseopPdDj
 bjHwCxg+xEgew5EhQF/xqkEfkAp2NN84xUksTWb9uDIW2c3SJweY/ZVR2Zsqpugm
 oqYVtrSLvNKqINQG8OP10s+mRWULwoqapF+kEHlxNbRy26C0zlbXPaneSgYzqHsY
 OyRkAT7uJJqStYlqdW7k+DhyNMB+T33WAGJpWQlfJGYk5d/n0rtBJDVo0hfhCSQr
 VkOXiO9J08NMbelCu4+0CJii7h5GCaqpuJEEmNL6AlF/TJVkIQJuRaG2+WDmEtO2
 oYd4UfXlAbUuts1SW7u/yyN/yrjVTm1tZYRBqn2VJdqh1s8dMxEWPct2Yn314mpS
 ODAnbDkEhtWlc9cloSRnwKec5WcxMZb19IJeK9ZvHm7PfIu/QHtj6Ren8s1//bZI
 OMQxC/Vf/wcjMdNtr7QdMNxWG1aK8DL9mYP5XwCrkZ560LIrtxmhyqYeoGAfgO5u
 +K/gKmQwjsaPhEa8jbP2/wI0II9yKPWj/fVwqhbhqaBUx5GA2iAKcdpPP6JAMAti
 +PXkLTtkyrIgSNwzl63S
 =Hgot
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.10-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes and cleanups from Trond Myklebust:

 - NLM: stable fix for NFSv2/v3 blocking locks

 - NFSv4.x: stable fixes for the delegation recall error handling code

 - NFSv4.x: Security flavour negotiation fixes and cleanups by Chuck
   Lever

 - SUNRPC: A number of RPCSEC_GSS fixes and cleanups also from Chuck

 - NFSv4.x assorted state management and reboot recovery bugfixes

 - NFSv4.1: In cases where we have already looked up a file, and hold a
   valid filehandle, use the new open-by-filehandle operation instead of
   opening by name.

 - Allow the NFSv4.1 callback thread to freeze

 - NFSv4.x: ensure that file unlock waits for readahead to complete

 - NFSv4.1: ensure that the RPC layer doesn't override the NFS session
   table size negotiation by limiting the number of slots.

 - NFSv4.x: Fix SETATTR spec compatibility issues

* tag 'nfs-for-3.10-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (67 commits)
  NFSv4: Warn once about servers that incorrectly apply open mode to setattr
  NFSv4: Servers should only check SETATTR stateid open mode on size change
  NFSv4: Don't recheck permissions on open in case of recovery cached open
  NFSv4.1: Don't do a delegated open for NFS4_OPEN_CLAIM_DELEG_CUR_FH modes
  NFSv4.1: Use the more efficient open_noattr call for open-by-filehandle
  NFS: Retry SETCLIENTID with AUTH_SYS instead of AUTH_NONE
  NFSv4: Ensure that we clear the NFS_OPEN_STATE flag when appropriate
  LOCKD: Ensure that nlmclnt_block resets block->b_status after a server reboot
  NFSv4: Ensure the LOCK call cannot use the delegation stateid
  NFSv4: Use the open stateid if the delegation has the wrong mode
  nfs: Send atime and mtime as a 64bit value
  NFSv4: Record the OPEN create mode used in the nfs4_opendata structure
  NFSv4.1: Set the RPC_CLNT_CREATE_INFINITE_SLOTS flag for NFSv4.1 transports
  SUNRPC: Allow rpc_create() to request that TCP slots be unlimited
  SUNRPC: Fix a livelock problem in the xprt->backlog queue
  NFSv4: Fix handling of revoked delegations by setattr
  NFSv4 release the sequence id in the return on close case
  nfs: remove unnecessary check for NULL inode->i_flock from nfs_delegation_claim_locks
  NFS: Ensure that NFS file unlock waits for readahead to complete
  NFS: Add functionality to allow waiting on all outstanding reads to complete
  ...
2013-04-30 11:28:08 -07:00
Linus Torvalds
5d434fcb25 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
 "Usual stuff, mostly comment fixes, typo fixes, printk fixes and small
  code cleanups"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (45 commits)
  mm: Convert print_symbol to %pSR
  gfs2: Convert print_symbol to %pSR
  m32r: Convert print_symbol to %pSR
  iostats.txt: add easy-to-find description for field 6
  x86 cmpxchg.h: fix wrong comment
  treewide: Fix typo in printk and comments
  doc: devicetree: Fix various typos
  docbook: fix 8250 naming in device-drivers
  pata_pdc2027x: Fix compiler warning
  treewide: Fix typo in printks
  mei: Fix comments in drivers/misc/mei
  treewide: Fix typos in kernel messages
  pm44xx: Fix comment for "CONFIG_CPU_IDLE"
  doc: Fix typo "CONFIG_CGROUP_CGROUP_MEMCG_SWAP"
  mmzone: correct "pags" to "pages" in comment.
  kernel-parameters: remove outdated 'noresidual' parameter
  Remove spurious _H suffixes from ifdef comments
  sound: Remove stray pluses from Kconfig file
  radio-shark: Fix printk "CONFIG_LED_CLASS"
  doc: put proper reference to CONFIG_MODULE_SIG_ENFORCE
  ...
2013-04-30 09:36:50 -07:00
Trond Myklebust
721ccfb79b NFSv4: Warn once about servers that incorrectly apply open mode to setattr
Debugging aid to help identify servers that incorrectly apply open mode
checks to setattr requests that are not changing the file size.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-29 11:11:58 -04:00
Trond Myklebust
ee3ae84ef4 NFSv4: Servers should only check SETATTR stateid open mode on size change
The NFSv4 and NFSv4.1 specs are both clear that the server should only check
stateid open mode if a SETATTR specifies the size attribute. If the
open mode is not one that allows writing, then it returns NFS4ERR_OPENMODE.

In the case where the SETATTR is not changing the size, the client will
still pass it the delegation stateid to ensure that the server does not
recall that delegation. In that case, the server should _ignore_ the
delegation open mode, and simply apply standard permission checks.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-29 11:11:39 -04:00
Trond Myklebust
b0212b84fb Merge branch 'bugfixes' into linux-next
Fix up a conflict between the linux-next branch and mainline.
Conflicts:
	fs/nfs/nfs4proc.c
2013-04-23 15:52:14 -04:00
Trond Myklebust
bd1d421abc Merge branch 'rpcsec_gss-from_cel' into linux-next
* rpcsec_gss-from_cel: (21 commits)
  NFS: Retry SETCLIENTID with AUTH_SYS instead of AUTH_NONE
  NFSv4: Don't clear the machine cred when client establish returns EACCES
  NFSv4: Fix issues in nfs4_discover_server_trunking
  NFSv4: Fix the fallback to AUTH_NULL if krb5i is not available
  NFS: Use server-recommended security flavor by default (NFSv3)
  SUNRPC: Don't recognize RPC_AUTH_MAXFLAVOR
  NFS: Use "krb5i" to establish NFSv4 state whenever possible
  NFS: Try AUTH_UNIX when PUTROOTFH gets NFS4ERR_WRONGSEC
  NFS: Use static list of security flavors during root FH lookup recovery
  NFS: Avoid PUTROOTFH when managing leases
  NFS: Clean up nfs4_proc_get_rootfh
  NFS: Handle missing rpc.gssd when looking up root FH
  SUNRPC: Remove EXPORT_SYMBOL_GPL() from GSS mech switch
  SUNRPC: Make gss_mech_get() static
  SUNRPC: Refactor nfsd4_do_encode_secinfo()
  SUNRPC: Consider qop when looking up pseudoflavors
  SUNRPC: Load GSS kernel module by OID
  SUNRPC: Introduce rpcauth_get_pseudoflavor()
  SUNRPC: Define rpcsec_gss_info structure
  NFS: Remove unneeded forward declaration
  ...
2013-04-23 15:40:40 -04:00
Trond Myklebust
bdeca1b76c NFSv4: Don't recheck permissions on open in case of recovery cached open
If we already checked the user access permissions on the original open,
then don't bother checking again on recovery. Doing so can cause a
deadlock with NFSv4.1, since the may_open() operation is not privileged.
Furthermore, we can't report an access permission failure here anyway.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-23 14:52:44 -04:00
Trond Myklebust
cd4c9be2c6 NFSv4.1: Don't do a delegated open for NFS4_OPEN_CLAIM_DELEG_CUR_FH modes
If we're in a delegation recall situation, we can't do a delegated open.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-23 14:46:25 -04:00
Trond Myklebust
8188df1733 NFSv4.1: Use the more efficient open_noattr call for open-by-filehandle
When we're doing open-by-filehandle in NFSv4.1, we shouldn't need to
do the cache consistency revalidation on the directory. It is
therefore more efficient to just use open_noattr, which returns the
file attributes, but not the directory attributes.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-23 14:31:19 -04:00
Chuck Lever
79d852bf5e NFS: Retry SETCLIENTID with AUTH_SYS instead of AUTH_NONE
Recently I changed the SETCLIENTID code to use AUTH_GSS(krb5i), and
then retry with AUTH_NONE if that didn't work.  This was to enable
Kerberos NFS mounts to work without forcing Linux NFS clients to
have a keytab on hand.

Rick Macklem reports that the FreeBSD server accepts AUTH_NONE only
for NULL operations (thus certainly not for SETCLIENTID).  Falling
back to AUTH_NONE means our proposed 3.10 NFS client will not
interoperate with FreeBSD servers over NFSv4 unless Kerberos is
fully configured on both ends.

If the Linux client falls back to using AUTH_SYS instead for
SETCLIENTID, all should work fine as long as the NFS server is
configured to allow AUTH_SYS for SETCLIENTID.

This may still prevent access to Kerberos-only FreeBSD servers by
Linux clients with no keytab.  Rick is of the opinion that the
security settings the server applies to its pseudo-fs should also
apply to the SETCLIENTID operation.

Linux and Solaris NFS servers do not place that limitation on
SETCLIENTID.  The security settings for the server's pseudo-fs are
determined automatically as the union of security flavors allowed on
real exports, as recommended by RFC 3530bis; and the flavors allowed
for SETCLIENTID are all flavors supported by the respective server
implementation.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-22 16:09:53 -04:00
Trond Myklebust
fd068b200f NFSv4: Ensure that we clear the NFS_OPEN_STATE flag when appropriate
We should always clear it before initiating file recovery.
Also ensure that we clear it after a CLOSE and/or after TEST_STATEID fails.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-22 11:29:51 -04:00
Trond Myklebust
8e472f33b5 NFSv4: Ensure the LOCK call cannot use the delegation stateid
Defensive patch to ensure that we copy the state->open_stateid, which
can never be set to the delegation stateid.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-20 01:39:54 -04:00
Trond Myklebust
92b40e9384 NFSv4: Use the open stateid if the delegation has the wrong mode
Fix nfs4_select_rw_stateid() so that it chooses the open stateid
(or an all-zero stateid) if the delegation does not match the selected
read/write mode.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-20 01:39:42 -04:00
Bryan Schumaker
042ad0b398 nfs: Send atime and mtime as a 64bit value
RFC 3530 says that the seconds value of a nfstime4 structure is a 64bit
value, but we are instead sending a 32-bit 0 and then a 32bit conversion
of the 64bit Linux value.  This means that if we try to set atime to a
value before the epoch (touch -t 196001010101) the client will only send
part of the new value due to lost precision.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-19 17:21:07 -04:00
Trond Myklebust
549b19cc9f NFSv4: Record the OPEN create mode used in the nfs4_opendata structure
If we're doing NFSv4.1 against a server that has persistent sessions,
then we should not need to call SETATTR in order to reset the file
attributes immediately after doing an exclusive create.

Note that since the create mode depends on the type of session that
has been negotiated with the server, we should not choose the
mode until after we've got a session slot.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-16 18:58:26 -04:00
Trond Myklebust
98f98cf571 NFSv4.1: Set the RPC_CLNT_CREATE_INFINITE_SLOTS flag for NFSv4.1 transports
This ensures that the RPC layer doesn't override the NFS session
negotiation.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-14 12:59:28 -04:00
Trond Myklebust
b570a975ed NFSv4: Fix handling of revoked delegations by setattr
Currently, _nfs4_do_setattr() will use the delegation stateid if no
writeable open file stateid is available.
If the server revokes that delegation stateid, then the call to
nfs4_handle_exception() will fail to handle the error due to the
lack of a struct nfs4_state, and will just convert the error into
an EIO.

This patch just removes the requirement that we must have a
struct nfs4_state in order to invalidate the delegation and
retry.

Reported-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-12 15:21:15 -04:00
Masanari Iida
a895d57da0 treewide: Fix typo in printks
Correct spelling typos in printk and comments.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-04-12 15:21:36 +02:00
Andy Adamson
b9536ad521 NFSv4 release the sequence id in the return on close case
Otherwise we deadlock if state recovery is initiated while we
sleep.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-11 09:39:53 -04:00
Jeff Layton
314d7cc05d nfs: remove unnecessary check for NULL inode->i_flock from nfs_delegation_claim_locks
The second check was added in commit 65b62a29 but it will never be true.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-10 15:40:31 -04:00
Linus Torvalds
51de017007 NFS client bugfixes for Linux 3.9
- Fix a brain fart in nfs41_walk_client_list
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQIcBAABAgAGBQJRZZq5AAoJEGcL54qWCgDypU8P/0daWpe+a8TNpXDA0KdYZKYN
 KNXvZkNNk/TtSiQo5gPzRnD4CgZIZ4n+EX9U94gmdNr/UQz7xiL+bHZY4zFtQ574
 i+QMiLbf687anY7vLBL1eKOhKHeBMoIrk2G3iineEUhfzF97cqtgqIou1pSS/BCa
 2kk/w/LRWPOaMpr802y2p9R/mejRtDbTIwaPURTKA3Pw+odwiVib3FXMIoXDI5Iq
 QzH2fl+Q0me/Z2c5Y+KRs5X3gY1MWdhpZUbEpKy3iLAxlgl3gfp7Mxpb61dw5gBz
 Jl2F1lDOzYmU1Uqe88G7w38RnBD0Q7RWtlQzZFMeIQsk1TqPsx9ymFRxaZu1Q6HZ
 +hdpfVsFDhGNTvLZF4YSP4c7AS9s1yEj8erT8Ro90Ar/PuZi15N6HpDzHHAiIQWK
 HsqSLQBrW24cFk2Ybed7YVcFdNxHdR3DDYVVstodnhIw9VwDSvQfPBlhlPqF+Q/9
 onnAMsc6SqHnLhFV7yCF6tB0Of4ZPO0oIeW8C0Hrxo+sPly03BvasAvaSWr3uheh
 wqEtawNm9QQVMdWSA1hA0LV6P887yTRXruT83uC14doPlz5g0hxlvAZQfDC3Ld3J
 ae4HARv3LLFj7Dk9/9yyM6FELyTIe8YvqvH8u9QenPQEmW0VlaPVp73vPEhL5yPA
 TxWSJtquxq5ajpH5lBeI
 =G1ZG
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.9-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull another nfs fixlet from Trond Myklebust:
 "I suddenly noticed that a one-line issue that I _thought_ I had fixed
  with the nfs41_walk_client_list patch was apparently still there in
  the pull request I sent earlier today.  I'm very sorry for not
  catching that in time.

   - Fix a brain fart in nfs41_walk_client_list"

* tag 'nfs-for-3.9-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4: Doh! Typo in the fix to nfs41_walk_client_list
2013-04-10 10:26:49 -07:00
Trond Myklebust
eb04e0ac19 NFSv4: Doh! Typo in the fix to nfs41_walk_client_list
Make sure that we set the status to 0 on success. Missed in testing
because it never appears when doing multiple mounts to _different_
servers.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: <stable@vger.kernel.org> # 3.7.x: 7b1f1fd: NFSv4/4.1: Fix bugs in nfs4[01]_walk_client_list
2013-04-10 12:57:29 -04:00
Linus Torvalds
f94eeb423b NFS client bugfixes for Linux 3.9
- Stable fix for memory corruption issues in nfs4[01]_walk_client_list
 - Stable fix for an Oopsable bug in rpc_clone_client
 - Another state manager deadlock in the NFSv4 open code
 - Memory leaks in nfs4_discover_server_trunking and rpc_new_client
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQIcBAABAgAGBQJRZYu9AAoJEGcL54qWCgDySfwP/R2IdO2nfRzmDCPtvD6pPg8T
 l8Gf97Z/8A3g6WwfvmKNt48D1fKnhAcOaKTZQIZuZePAjI/Yy74DFMof6paiDmsO
 8hMcZgvunZotPwmBmhIwmLOxDYgbpdizDBlITsimnUQLrv78bMw2F/cNCcThYgTI
 Q4sNpZsl4kk1nmOYK/tGBCCkq6mIQhc95QeQPgnl2B/NozpZiIqgzrpWpSWMofn2
 cuSLiuEdmpCdJbgQaPEjSWf+doo/nBn720+Xj2RjmLhTTnWUtAsouElAdMs96Jjz
 cEhSll3nLIygr1xdFF7CD8qFjpbtg/YNhKw3HBCFAgHjrAjr+a3N+eHQOz9QQ6W4
 5OL3Mj0VEkvMrK1Sy76smynQJMJhrsn852Zo2wK2mCp+mHNZlBlML529Y4PJy2Ba
 Up4MteIaOTpKGSnBdzWmqPqro9glqlhrUk/o3XipCzIziWC8yDYjl2J9Ez8B7Ren
 uzvBeevYRX9AmQlmZUAPvx8+xVqA6cr0X2q8/6PqPnrNXP6Ff8+rm6gvH4VozyzJ
 qd/r7Bf1ozFXxoKQOztSiGjI5YiBp4DRXycR5td6eF3nZJipmbxY+WKllhaAakn6
 UY2NsGX2zfxkJMltqd2/xRmHtN+Eif1Uoo35pvzNxzBtPsRxBMIiPhGLglQu98Yj
 2NuwfT4//UNfS6JlBe6E
 =kBf2
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.9-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 - fix for memory corruption issues in nfs4[01]_walk_client_list (stable)
 - fix for an Oopsable bug in rpc_clone_client (stable)
 - another state manager deadlock in the NFSv4 open code
 - memory leaks in nfs4_discover_server_trunking and rpc_new_client

* tag 'nfs-for-3.9-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4: Fix another potential state manager deadlock
  SUNRPC: Fix a potential memory leak in rpc_new_client
  NFSv4/4.1: Fix bugs in nfs4[01]_walk_client_list
  NFSv4: Fix a memory leak in nfs4_discover_server_trunking
  SUNRPC: Remove extra xprt_put()
2013-04-10 09:00:51 -07:00
Trond Myklebust
fa332941c0 NFSv4: Fix another potential state manager deadlock
Don't hold the NFSv4 sequence id while we check for open permission.
The call to ACCESS may block due to reboot recovery.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-09 13:19:35 -04:00
Trond Myklebust
7a8203d8cb NFS: Ensure that NFS file unlock waits for readahead to complete
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-08 22:12:42 -04:00
Trond Myklebust
577b42327d NFS: Add functionality to allow waiting on all outstanding reads to complete
This will later allow NFS locking code to wait for readahead to complete
before releasing byte range locks.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-08 22:12:33 -04:00
Trond Myklebust
bc7a05ca51 NFSv4: Handle timeouts correctly when probing for lease validity
When we send a RENEW or SEQUENCE operation in order to probe if the
lease is still valid, we want it to be able to time out since the
lease we are probing is likely to time out too. Currently, because
we use soft mount semantics for these RPC calls, the return value
is EIO, which causes the state manager to exit with an "unhandled
error" message.
This patch changes the call semantics, so that the RPC layer returns
ETIMEDOUT instead of EIO. We then have the state manager default to
a simple retry instead of exiting.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-08 18:01:59 -04:00
Trond Myklebust
826e001308 NFSv4: Fix CB_RECALL_ANY to only return delegations that are not in use
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-05 17:03:57 -04:00
Trond Myklebust
b02ba0b660 NFSv4: Clean up nfs_expire_all_delegations
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-05 17:03:56 -04:00
Trond Myklebust
5c31e2368f NFSv4: Fix nfs_server_return_all_delegations
If the state manager thread is already running, we may end up
racing with it in nfs_client_return_marked_delegations. Better to
just allow the state manager thread to do the job.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-05 17:03:56 -04:00
Trond Myklebust
b757144fd7 NFSv4: Be less aggressive about returning delegations for open files
Currently, if the application that holds the file open isn't doing
I/O, we may end up returning the delegation. This means that we can
no longer cache the file as aggressively, and often also that we
multiply the state that both the server and the client needs to track.

This patch adds a check for open files to the routine that scans
for delegations that are unreferenced.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-05 17:03:55 -04:00
Trond Myklebust
db4f2e637f NFSv4: Clean up delegation recall error handling
Unify the error handling in nfs4_open_delegation_recall and
nfs4_lock_delegation_recall.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-05 17:03:55 -04:00
Trond Myklebust
be76b5b68d NFSv4: Clean up nfs4_open_delegation_recall
Make it symmetric with nfs4_lock_delegation_recall

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-05 17:03:54 -04:00
Trond Myklebust
4a706fa09f NFSv4: Clean up nfs4_lock_delegation_recall
All error cases are handled by the switch() statement, meaning that the
call to nfs4_handle_exception() is unreachable.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-05 17:03:54 -04:00
Trond Myklebust
8b6cc4d6f8 NFSv4: Handle NFS4ERR_DELAY and NFS4ERR_GRACE in nfs4_open_delegation_recall
A server shouldn't normally return NFS4ERR_GRACE if the client holds a
delegation, since no conflicting lock reclaims can be granted, however
the spec does not require the server to grant the open in this
instance

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2013-04-05 17:03:53 -04:00
Trond Myklebust
dbb21c25a3 NFSv4: Handle NFS4ERR_DELAY and NFS4ERR_GRACE in nfs4_lock_delegation_recall
A server shouldn't normally return NFS4ERR_GRACE if the client holds a
delegation, since no conflicting lock reclaims can be granted, however
the spec does not require the server to grant the lock in this
instance.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2013-04-05 17:03:53 -04:00
Jeff Layton
25d280aad8 nfs: allow the v4.1 callback thread to freeze
The v4.1 callback thread has set_freezable() at the top, but it doesn't
ever try to freeze within the loop. Have it call try_to_freeze() at the
top of the loop. If a freeze event occurs, recheck kthread_should_stop()
after thawing.

Reported-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-05 17:03:52 -04:00
Trond Myklebust
7b1f1fd184 NFSv4/4.1: Fix bugs in nfs4[01]_walk_client_list
It is unsafe to use list_for_each_entry_safe() here, because
when we drop the nn->nfs_client_lock, we pin the _current_ list
entry and ensure that it stays in the list, but we don't do the
same for the _next_ list entry. Use of list_for_each_entry() is
therefore the correct thing to do.

Also fix the refcounting in nfs41_walk_client_list().

Finally, ensure that the nfs_client has finished being initialised
and, in the case of NFSv4.1, that the session is set up.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Bryan Schumaker <bjschuma@netapp.com>
Cc: stable@vger.kernel.org [>= 3.7]
2013-04-05 16:59:19 -04:00
Trond Myklebust
b193d59a48 NFSv4: Fix a memory leak in nfs4_discover_server_trunking
When we assign a new rpc_client to clp->cl_rpcclient, we need to destroy
the old one.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org [>=3.7]
2013-04-05 16:59:15 -04:00
Trond Myklebust
845cbceb22 NFSv4: Don't clear the machine cred when client establish returns EACCES
The expected behaviour is that the client will decide at mount time
whether or not to use a krb5i machine cred, or AUTH_NULL.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Bryan Schumaker <bjschuma@netapp.com>
2013-04-05 15:37:04 -04:00
Trond Myklebust
ea33e6c3e7 NFSv4: Fix issues in nfs4_discover_server_trunking
- Ensure that we exit with ENOENT if the call to ops->get_clid_cred()
  fails.
- Handle the case where ops->detect_trunking() exits with an
  unexpected error, and return EIO.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-05 13:22:50 -04:00
Trond Myklebust
23631227a6 NFSv4: Fix the fallback to AUTH_NULL if krb5i is not available
If the rpcsec_gss_krb5 module cannot be loaded, the attempt to create
an rpc_client in nfs4_init_client will currently fail with an EINVAL.
Fix is to retry with AUTH_NULL.

Regression introduced by the commit "NFS: Use "krb5i" to establish NFSv4
state whenever possible"

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Bryan Schumaker <bjschuma@netapp.com>
2013-04-04 17:01:25 -04:00
Chuck Lever
4580a92d44 NFS: Use server-recommended security flavor by default (NFSv3)
Since commit ec88f28d in 2009, checking if the user-specified flavor
is in the server's flavor list has been the source of a few
noticeable regressions (now fixed), but there is one that is still
vexing.

An NFS server can list AUTH_NULL in its flavor list, which suggests
a client should try to mount the server with the flavor of the
client's choice, but the server will squash all accesses.  In some
cases, our client fails to mount a server because of this check,
when the mount could have proceeded successfully.

Skip this check if the user has specified "sec=" on the mount
command line.  But do consult the server-provided flavor list to
choose a security flavor if no sec= option is specified on the mount
command.

If a server lists Kerberos pseudoflavors before "sys" in its export
options, our client now chooses Kerberos over AUTH_UNIX for mount
points, when no security flavor is specified by the mount command.
This could be surprising to some administrators or users, who would
then need to have Kerberos credentials to access the export.

Or, a client administrator may not have enabled rpc.gssd.  In this
case, auth_rpcgss.ko might still be loadable, which is enough for
the new logic to choose Kerberos over AUTH_UNIX.  But the mount
would fail since no GSS context can be created without rpc.gssd
running.

To retain the use of AUTH_UNIX by default:

  o  The server administrator can ensure that "sys" is listed before
     Kerberos flavors in its export security options (see
     exports(5)),

  o  The client administrator can explicitly specify "sec=sys" on
     its mount command line (see nfs(5)),

  o  The client administrator can use "Sec=sys" in an appropriate
     section of /etc/nfsmount.conf (see nfsmount.conf(5)), or

  o  The client administrator can blacklist auth_rpcgss.ko.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-04 17:01:01 -04:00
Jeff Layton
094f7b69ea selinux: make security_sb_clone_mnt_opts return an error on context mismatch
I had the following problem reported a while back. If you mount the
same filesystem twice using NFSv4 with different contexts, then the
second context= option is ignored. For instance:

    # mount server:/export /mnt/test1
    # mount server:/export /mnt/test2 -o context=system_u:object_r:tmp_t:s0
    # ls -dZ /mnt/test1
    drwxrwxrwt. root root system_u:object_r:nfs_t:s0       /mnt/test1
    # ls -dZ /mnt/test2
    drwxrwxrwt. root root system_u:object_r:nfs_t:s0       /mnt/test2

When we call into SELinux to set the context of a "cloned" superblock,
it will currently just bail out when it notices that we're reusing an
existing superblock. Since the existing superblock is already set up and
presumably in use, we can't go overwriting its context with the one from
the "original" sb. Because of this, the second context= option in this
case cannot take effect.

This patch fixes this by turning security_sb_clone_mnt_opts into an int
return operation. When it finds that the "new" superblock that it has
been handed is already set up, it checks to see whether the contexts on
the old superblock match it. If it does, then it will just return
success, otherwise it'll return -EBUSY and emit a printk to tell the
admin why the second mount failed.

Note that this patch may cause casualties. The NFSv4 code relies on
being able to walk down to an export from the pseudoroot. If you mount
filesystems that are nested within one another with different contexts,
then this patch will make those mounts fail in new and "exciting" ways.

For instance, suppose that /export is a separate filesystem on the
server:

    # mount server:/ /mnt/test1
    # mount salusa:/export /mnt/test2 -o context=system_u:object_r:tmp_t:s0
    mount.nfs: an incorrect mount option was specified

...with the printk in the ring buffer. Because we *might* eventually
walk down to /mnt/test1/export, the mount is denied due to this patch.
The second mount needs the pseudoroot superblock, but that's already
present with the wrong context.

OTOH, if we mount these in the reverse order, then both mounts work,
because the pseudoroot superblock created when mounting /export is
discarded once that mount is done. If we then however try to walk into
that directory, the automount fails for the similar reasons:

    # cd /mnt/test1/scratch/
    -bash: cd: /mnt/test1/scratch: Device or resource busy

The story I've gotten from the SELinux folks that I've talked to is that
this is desirable behavior. In SELinux-land, mounting the same data
under different contexts is wrong -- there can be only one.

Cc: Steve Dickson <steved@redhat.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2013-04-02 11:30:13 +11:00
Chuck Lever
4edaa30888 NFS: Use "krb5i" to establish NFSv4 state whenever possible
Currently our client uses AUTH_UNIX for state management on Kerberos
NFS mounts in some cases.  For example, if the first mount of a
server specifies "sec=sys," the SETCLIENTID operation is performed
with AUTH_UNIX.  Subsequent mounts using stronger security flavors
can not change the flavor used for lease establishment.  This might
be less security than an administrator was expecting.

Dave Noveck's migration issues draft recommends the use of an
integrity-protecting security flavor for the SETCLIENTID operation.
Let's ignore the mount's sec= setting and use krb5i as the default
security flavor for SETCLIENTID.

If our client can't establish a GSS context (eg. because it doesn't
have a keytab or the server doesn't support Kerberos) we fall back
to using AUTH_NULL.  For an operation that requires a
machine credential (which never represents a particular user)
AUTH_NULL is as secure as AUTH_UNIX.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-03-29 15:45:22 -04:00
Chuck Lever
c4eafe1135 NFS: Try AUTH_UNIX when PUTROOTFH gets NFS4ERR_WRONGSEC
Most NFSv4 servers implement AUTH_UNIX, and administrators will
prefer this over AUTH_NULL.  It is harmless for our client to try
this flavor in addition to the flavors mandated by RFC 3530/5661.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-03-29 15:45:09 -04:00
Chuck Lever
9a744ba398 NFS: Use static list of security flavors during root FH lookup recovery
If the Linux NFS client receives an NFS4ERR_WRONGSEC error while
trying to look up an NFS server's root file handle, it retries the
lookup operation with various security flavors to see what flavor
the NFS server will accept for pseudo-fs access.

The list of flavors the client uses during retry consists only of
flavors that are currently registered in the kernel RPC client.
This list may not include any GSS pseudoflavors if auth_rpcgss.ko
has not yet been loaded.

Let's instead use a static list of security flavors that the NFS
standard requires the server to implement (RFC 3530bis, section
3.2.1).  The RPC client should now be able to load support for
these dynamically; if not, they are skipped.

Recovery behavior here is prescribed by RFC 3530bis, section
15.33.5:

> For LOOKUPP, PUTROOTFH and PUTPUBFH, the client will be unable to
> use the SECINFO operation since SECINFO requires a current
> filehandle and none exist for these two [sic] operations.  Therefore,
> the client must iterate through the security triples available at
> the client and reattempt the PUTROOTFH or PUTPUBFH operation.  In
> the unfortunate event none of the MANDATORY security triples are
> supported by the client and server, the client SHOULD try using
> others that support integrity.  Failing that, the client can try
> using AUTH_NONE, but because such forms lack integrity checks,
> this puts the client at risk.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-03-29 15:44:58 -04:00
Chuck Lever
83ca7f5ab3 NFS: Avoid PUTROOTFH when managing leases
Currently, the compound operation the Linux NFS client sends to the
server to confirm a client ID looks like this:

	{ SETCLIENTID_CONFIRM; PUTROOTFH; GETATTR(lease_time) }

Once the lease is confirmed, it makes sense to know how long before
the client will have to renew it.  And, performing these operations
in the same compound saves a round trip.

Unfortunately, this arrangement assumes that the security flavor
used for establishing a client ID can also be used to access the
server's pseudo-fs.

If the server requires a different security flavor to access its
pseudo-fs than it allowed for the client's SETCLIENTID operation,
the PUTROOTFH in this compound fails with NFS4ERR_WRONGSEC.  Even
though the SETCLIENTID_CONFIRM succeeded, our client's trunking
detection logic interprets the failure of the compound as a failure
by the server to confirm the client ID.

As part of server trunking detection, the client then begins another
SETCLIENTID pass with the same nfs4_client_id.  This fails with
NFS4ERR_CLID_INUSE because the first SETCLIENTID/SETCLIENTID_CONFIRM
already succeeded in confirming that client ID -- it was the
PUTROOTFH operation that caused the SETCLIENTID_CONFIRM compound to
fail.

To address this issue, separate the "establish client ID" step from
the "accessing the server's pseudo-fs root" step.  The first access
of the server's pseudo-fs may require retrying the PUTROOTFH
operation with different security flavors.  This access is done in
nfs4_proc_get_rootfh().

That leaves the matter of how to retrieve the server's lease time.
nfs4_proc_fsinfo() already retrieves the lease time value, though
none of its callers do anything with the retrieved value (nor do
they mark the lease as "renewed").

Note that NFSv4.1 state recovery invokes nfs4_proc_get_lease_time()
using the lease management security flavor.  This may cause some
heartburn if that security flavor isn't the same as the security
flavor the server requires for accessing the pseudo-fs.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-03-29 15:44:49 -04:00
Chuck Lever
2ed4b95b7e NFS: Clean up nfs4_proc_get_rootfh
The long lines with no vertical white space make this function
difficult for humans to read.  Add a proper documenting comment
while we're here.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-03-29 15:44:12 -04:00
Chuck Lever
75bc8821bd NFS: Handle missing rpc.gssd when looking up root FH
When rpc.gssd is not running, any NFS operation that needs to use a
GSS security flavor of course does not work.

If looking up a server's root file handle results in an
NFS4ERR_WRONGSEC, nfs4_find_root_sec() is called to try a bunch of
security flavors until one works or all reasonable flavors have
been tried.  When rpc.gssd isn't running, this loop seems to fail
immediately after rpcauth_create() craps out on the first GSS
flavor.

When the rpcauth_create() call in nfs4_lookup_root_sec() fails
because rpc.gssd is not available, nfs4_lookup_root_sec()
unconditionally returns -EIO.  This prevents nfs4_find_root_sec()
from retrying any other flavors; it drops out of its loop and fails
immediately.

Having nfs4_lookup_root_sec() return -EACCES instead allows
nfs4_find_root_sec() to try all flavors in its list.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-03-29 15:43:55 -04:00
Chuck Lever
9568c5e9a6 SUNRPC: Introduce rpcauth_get_pseudoflavor()
A SECINFO reply may contain flavors whose kernel module is not
yet loaded by the client's kernel.  A new RPC client API, called
rpcauth_get_pseudoflavor(), is introduced to do proper checking
for support of a security flavor.

When this API is invoked, the RPC client now tries to load the
module for each flavor first before performing the "is this
supported?" check.  This means if a module is available on the
client, but has not been loaded yet, it will be loaded and
registered automatically when the SECINFO reply is processed.

The new API can take a full GSS tuple (OID, QoP, and service).
Previously only the OID and service were considered.

nfs_find_best_sec() is updated to verify all flavors requested in a
SECINFO reply, including AUTH_NULL and AUTH_UNIX.  Previously these
two flavors were simply assumed to be supported without consulting
the RPC client.

Note that the replaced version of nfs_find_best_sec() can return
RPC_AUTH_MAXFLAVOR if the server returns a recognized OID but an
unsupported "service" value.  nfs_find_best_sec() now returns
RPC_AUTH_UNIX in this case.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-03-29 15:43:07 -04:00
Chuck Lever
fb15b26f8b SUNRPC: Define rpcsec_gss_info structure
The NFSv4 SECINFO procedure returns a list of security flavors.  Any
GSS flavor also has a GSS tuple containing an OID, a quality-of-
protection value, and a service value, which specifies a particular
GSS pseudoflavor.

For simplicity and efficiency, I'd like to return each GSS tuple
from the NFSv4 SECINFO XDR decoder and pass it straight into the RPC
client.

Define a data structure that is visible to both the NFS client and
the RPC client.  Take structure and field names from the relevant
standards to avoid confusion.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-03-29 15:42:56 -04:00
Trond Myklebust
809b426c7f NFSv4: Fix Oopses in the fs_locations code
If the server sends us a pathname with more components than the client
limit of NFS4_PATHNAME_MAXCOMPONENTS, more server entries than the client
limit of NFS4_FS_LOCATION_MAXSERVERS, or sends a total number of
fs_locations entries than the client limit of NFS4_FS_LOCATIONS_MAXENTRIES
then we will currently Oops because the limit checks are done _after_ we've
decoded the data into the arrays.

Reported-by: fanchaoting<fanchaoting@cn.fujitsu.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-03-28 16:22:17 -04:00
Trond Myklebust
91876b13b8 NFSv4: Fix another reboot recovery race
If the open_context for the file is not yet fully initialised,
then open recovery cannot succeed, and since nfs4_state_find_open_context
returns an ENOENT, we end up treating the file as being irrecoverable.

What we really want to do, is just defer the recovery until later.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-03-28 16:22:16 -04:00
Trond Myklebust
6e3cf24152 NFSv4: Add a mapping for NFS4ERR_FILE_OPEN in nfs4_map_errors
With unlink is an asynchronous operation in the sillyrename case, it
expects nfs4_async_handle_error() to map the error correctly.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-03-27 12:44:40 -04:00
Linus Torvalds
5d538483ea NFS client bugfixes for Linux 3.9
- Fix an NFSv4 idmapper regression
 - Fix an Oops in the pNFS blocks client
 - Fix up various issues with pNFS layoutcommit
 - Ensure correct read ordering of variables in rpc_wake_up_task_queue_locked
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQIcBAABAgAGBQJRUedyAAoJEGcL54qWCgDyar0P/2pTT/yxX8ejTu5DmY7e4PYJ
 jhPG2AEqY/yMLn9GvB375VIs1L8tuY50+3NFhWZFjyNbEU3GV+5Y+kPpBtAgYiSI
 VyIXiJ/xMtXdYJMYuE/nh5jbcqJsHwGjpcIaSd5BuWzQUaoUYvLulxWd4QN8mmaT
 5SuzmgV+7WIqV6RjlaYF82srcOKAjwemcrfRkCNzzJr6aT39gH2YdYFbDaTr7qhU
 fw0x3QlI7887vSNQcfaGbC1+jr6oe8wRCneOR0tceU/8bcj6zlUDk5HxqSOc28mA
 jUQieoVRggcM4s5DFpNcuwW6qCPZOmzv/OFD6oqnhyyonPOrue+7zaoujZmGNmjx
 dT2V/jQehanYD25WpDO8OyFXUeYE4x9bgHKsszhBTwr4x5D8ceEJ1sugcOPiTTxu
 tflbbuWbt+BguvXp4p8QayUj0V2cplM/nOovWyUG+BH46sz3Dtv46NOgJeO2a29g
 T6jayxmKCxvtPKtG0j34BzLngiKabZTSEhFms6Qarp9lwWvHWrR9KWGuDBNvy1Ts
 GMBN8P6Ib40yVi6Pwlj5Jpy6yLKVklHtJQpactr63AZmYrF4bBBSom+MWAh3X1iO
 QtF0x9Z1bBkXY2Q/u+3vWMxQtEPeW+pSiloj8aiceFAt33zKM+1bLofDhEw0s2fI
 wJEHYsGyGtDQINgP0v1e
 =OPbZ
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.9-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 - Fix an NFSv4 idmapper regression
 - Fix an Oops in the pNFS blocks client
 - Fix up various issues with pNFS layoutcommit
 - Ensure correct read ordering of variables in
   rpc_wake_up_task_queue_locked

* tag 'nfs-for-3.9-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  SUNRPC: Add barriers to ensure read ordering in rpc_wake_up_task_queue_locked
  NFSv4.1: Add a helper pnfs_commit_and_return_layout
  NFSv4.1: Always clear the NFS_INO_LAYOUTCOMMIT in layoutreturn
  NFSv4.1: Fix a race in pNFS layoutcommit
  pnfs-block: removing DM device maybe cause oops when call dev_remove
  NFSv4: Fix the string length returned by the idmapper
2013-03-26 14:23:45 -07:00
Trond Myklebust
ccb46e2063 NFSv4.1: Use CLAIM_DELEG_CUR_FH opens when available
Now that we do CLAIM_FH opens, we may run into situations where we
get a delegation but don't have perfect knowledge of the file path.
When returning the delegation, we might therefore not be able to
us CLAIM_DELEGATE_CUR opens to convert the delegation into OPEN
stateids and locks.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-03-25 12:04:11 -04:00
Trond Myklebust
49f9a0fafd NFSv4.1: Enable open-by-filehandle
Sometimes, we actually _want_ to do open-by-filehandle, for instance
when recovering opens after a network partition, or when called
from nfs4_file_open.
Enable that functionality using a new capability NFS_CAP_ATOMIC_OPEN_V1,
and which is only enabled for NFSv4.1 servers that support it.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-03-25 12:04:11 -04:00
Trond Myklebust
d9fc6619ca NFSv4.1: Add xdr support for CLAIM_FH and CLAIM_DELEG_CUR_FH opens
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-03-25 12:04:11 -04:00
Trond Myklebust
4a1c089345 NFSv4: Clean up nfs4_opendata_alloc in preparation for NFSv4.1 open modes
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-03-25 12:04:11 -04:00
Trond Myklebust
3b66486c4c NFSv4.1: Select the "most recent locking state" for read/write/setattr stateids
Follow the practice described in section 8.2.2 of RFC5661: When sending a
read/write or setattr stateid, set the seqid field to zero in order to
signal that the NFS server should apply the most recent locking state.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-03-25 12:04:11 -04:00
Trond Myklebust
39c6daae70 NFSv4: Prepare for minorversion-specific nfs_server capabilities
Clean up the setting of the nfs_server->caps, by shoving it all
into nfs4_server_common_setup().
Then add an 'initial capabilities' field into struct nfs4_minor_version_ops.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-03-25 12:04:11 -04:00
Trond Myklebust
5521abfdcf NFSv4: Resend the READ/WRITE RPC call if a stateid change causes an error
Adds logic to ensure that if the server returns a BAD_STATEID,
or other state related error, then we check if the stateid has
already changed. If it has, then rather than start state recovery,
we should just resend the failed RPC call with the new stateid.

Allow nfs4_select_rw_stateid to notify that the stateid is unstable by
having it return -EWOULDBLOCK if an RPC is underway that might change the
stateid.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-03-25 12:04:10 -04:00
Trond Myklebust
9b20614988 NFSv4: The stateid must remain the same for replayed RPC calls
If we replay a READ or WRITE call, we should not be changing the
stateid. Currently, we may end up doing so, because the stateid
is only selected at xdr encode time.

This patch ensures that we select the stateid after we get an NFSv4.1
session slot, and that we keep that same stateid across retries.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-03-25 12:04:10 -04:00
Trond Myklebust
8c86899f62 NFS: __nfs_find_lock_context needs to check ctx->lock_context for a match too
Currently, we're forcing an unnecessary duplication of the
initial nfs_lock_context in calls to nfs_get_lock_context, since
__nfs_find_lock_context ignores the ctx->lock_context.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-03-25 12:04:10 -04:00
Trond Myklebust
c58c844187 NFS: Don't accept more reads/writes if the open context recovery failed
If the state recovery failed, we want to ensure that the application
doesn't try to use the same file descriptor for more reads or writes.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-03-25 12:04:10 -04:00
Trond Myklebust
5d422301f9 NFSv4: Fail I/O if the state recovery fails irrevocably
If state recovery fails with an ESTALE or a ENOENT, then we shouldn't
keep retrying. Instead, mark the stateid as being invalid and
fail the I/O with an EIO error.
For other operations such as POSIX and BSD file locking, truncate
etc, fail with an EBADF to indicate that this file descriptor is no
longer valid.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-03-25 12:04:10 -04:00
Trond Myklebust
240286725d NFSv4.1: Add a helper pnfs_commit_and_return_layout
In order to be able to safely return the layout in nfs4_proc_setattr,
we need to block new uses of the layout, wait for all outstanding
users of the layout to complete, commit the layout and then return it.

This patch adds a helper in order to do all this safely.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Boaz Harrosh <bharrosh@panasas.com>
2013-03-21 10:31:21 -04:00
Trond Myklebust
2495680434 NFSv4.1: Always clear the NFS_INO_LAYOUTCOMMIT in layoutreturn
Note that clearing NFS_INO_LAYOUTCOMMIT is tricky, since it requires
you to also clear the NFS_LSEG_LAYOUTCOMMIT bits from the layout
segments.
The only two sites that need to do this are the ones that call
pnfs_return_layout() without first doing a layout commit.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Benny Halevy <bhalevy@tonian.com>
Cc: stable@vger.kernel.org
2013-03-21 10:31:21 -04:00
Trond Myklebust
a073dbff35 NFSv4.1: Fix a race in pNFS layoutcommit
We need to clear the NFS_LSEG_LAYOUTCOMMIT bits atomically with the
NFS_INO_LAYOUTCOMMIT bit, otherwise we may end up with situations
where the two are out of sync.
The first half of the problem is to ensure that pnfs_layoutcommit_inode
clears the NFS_LSEG_LAYOUTCOMMIT bit through pnfs_list_write_lseg.
We still need to keep the reference to those segments until the RPC call
is finished, so in order to make it clear _where_ those references come
from, we add a helper pnfs_list_write_lseg_done() that cleans up after
pnfs_list_write_lseg.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Benny Halevy <bhalevy@tonian.com>
Cc: stable@vger.kernel.org
2013-03-21 10:31:19 -04:00
fanchaoting
4376c94618 pnfs-block: removing DM device maybe cause oops when call dev_remove
when pnfs block using device mapper,if umounting later,it maybe
cause oops. we apply "1 + sizeof(bl_umount_request)" memory for
msg->data, the memory maybe overflow when we do "memcpy(&dataptr
[sizeof(bl_msg)], &bl_umount_request, sizeof(bl_umount_request))",
because the size of bl_msg is more than 1 byte.

Signed-off-by: fanchaoting<fanchaoting@cn.fujitsu.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-03-21 10:11:06 -04:00
Trond Myklebust
cf4ab538f1 NFSv4: Fix the string length returned by the idmapper
Functions like nfs_map_uid_to_name() and nfs_map_gid_to_group() are
expected to return a string without any terminating NUL character.
Regression introduced by commit 57e62324e4
(NFS: Store the legacy idmapper result in the keyring).

Reported-by: Dave Chiluk <dave.chiluk@canonical.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Bryan Schumaker <bjschuma@netapp.com>
Cc: stable@vger.kernel.org [>=3.4]
2013-03-20 16:45:16 -04:00
Eric W. Biederman
fa7614ddd6 fs: Readd the fs module aliases.
I had assumed that the only use of module aliases for filesystems
prior to "fs: Limit sys_mount to only request filesystem modules."
was in request_module.  It turns out I was wrong.  At least mkinitcpio
in Arch linux uses these aliases.

So readd the preexising aliases, to keep from breaking userspace.

Userspace eventually will have to follow and use the same aliases the
kernel does.  So at some point we may be delete these aliases without
problems.  However that day is not today.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-12 18:55:21 -07:00
Eric W. Biederman
7f78e03513 fs: Limit sys_mount to only request filesystem modules.
Modify the request_module to prefix the file system type with "fs-"
and add aliases to all of the filesystems that can be built as modules
to match.

A common practice is to build all of the kernel code and leave code
that is not commonly needed as modules, with the result that many
users are exposed to any bug anywhere in the kernel.

Looking for filesystems with a fs- prefix limits the pool of possible
modules that can be loaded by mount to just filesystems trivially
making things safer with no real cost.

Using aliases means user space can control the policy of which
filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
with blacklist and alias directives.  Allowing simple, safe,
well understood work-arounds to known problematic software.

This also addresses a rare but unfortunate problem where the filesystem
name is not the same as it's module name and module auto-loading
would not work.  While writing this patch I saw a handful of such
cases.  The most significant being autofs that lives in the module
autofs4.

This is relevant to user namespaces because we can reach the request
module in get_fs_type() without having any special permissions, and
people get uncomfortable when a user specified string (in this case
the filesystem type) goes all of the way to request_module.

After having looked at this issue I don't think there is any
particular reason to perform any filtering or permission checks beyond
making it clear in the module request that we want a filesystem
module.  The common pattern in the kernel is to call request_module()
without regards to the users permissions.  In general all a filesystem
module does once loaded is call register_filesystem() and go to sleep.
Which means there is not much attack surface exposed by loading a
filesytem module unless the filesystem is mounted.  In a user
namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
which most filesystems do not set today.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Reported-by: Kees Cook <keescook@google.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-03 19:36:31 -08:00
Linus Torvalds
8d05b3771d NFS client bugfixes for Linux 3.9
- Don't allow NFS silly-renamed files to be deleted
 - Don't start the retransmission timer when out of socket space
 - Fix a couple of pnfs-related Oopses.
 - Fix one more NFSv4 state recovery deadlock
 - Don't loop forever when LAYOUTGET returns NFS4ERR_LAYOUTTRYLATER
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQIcBAABAgAGBQJRMpMhAAoJEGcL54qWCgDy4BMP/0Zl7Ei7x9bJSb1C1lpPSo5p
 Lr9XoHLYqhPcAwKUXQfgM5IkC69bE62bD5esmdDqkgZYqnmGE0E4LG6MsbsMmvzk
 yug5WOxmjOFee7Bdpd8B86Z0qsa4l2TkQu2h9G3zE36P2rPKQaNzpteIjhis5UEQ
 EfNyLoBdFcuUSh4ztMVZOzbAyDcbNfsyl03XVmlv+Qn/o0l42Zjth0qwOP60bjuM
 zJF1CkHi5NLbXEhmOev9mA6UYz6zWRbiA/Yu92pomtXVDtOtzWpUniBIcf/S1ZH/
 V8Gj6bWdHHyFCa2PjhY1/QdLBOPRPdxpAAJk+q48AKmzyiOU6g3lIHBp5ai1WZNI
 1C+SYxABE/EJgq9SoQYGqq6SUiolrFulqnFHXF0jHF+ifdjoHjSRmpGQAoyoZ0k1
 aSl+Ojqx7QHibJd8GZBavWc3upRDzhHDRRB3tkQCENi+hryBZxEyeS2Z54NmBRUN
 tsOuyac6rtknZdD8Do4DMt9uc9u1DWicaiZbLfkP1VL1Angh6NKSA7qbmH6giLBS
 9Y+DPcIk5e34uKQ21WTxFydGD+SMg0EMnOmfr6EYXWEHBhKNYVR+cHyH0mAF6RzX
 enU2g0H2m+3vUQqajPUP0DV/eLGtdsvWvMjiskc3KX90CWfHmV2C8GFSxjV2OkT1
 vG1KFrICO6DR2943Udit
 =FMtb
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 "We've just concluded another Connectathon interoperability testing
  week, and so here are the fixes for the bugs that were discovered:

   - Don't allow NFS silly-renamed files to be deleted
   - Don't start the retransmission timer when out of socket space
   - Fix a couple of pnfs-related Oopses.
   - Fix one more NFSv4 state recovery deadlock
   - Don't loop forever when LAYOUTGET returns NFS4ERR_LAYOUTTRYLATER"

* tag 'nfs-for-3.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  SUNRPC: One line comment fix
  NFSv4.1: LAYOUTGET EDELAY loops timeout to the MDS
  SUNRPC: add call to get configured timeout
  PNFS: set the default DS timeout to 60 seconds
  NFSv4: Fix another open/open_recovery deadlock
  nfs: don't allow nfs_find_actor to match inodes of the wrong type
  NFSv4.1: Hold reference to layout hdr in layoutget
  pnfs: fix resend_to_mds for directio
  SUNRPC: Don't start the retransmission timer when out of socket space
  NFS: Don't allow NFS silly-renamed files to be deleted, no signal
2013-03-02 16:46:07 -08:00
Linus Torvalds
b6669737d3 Merge branch 'for-3.9' of git://linux-nfs.org/~bfields/linux
Pull nfsd changes from J Bruce Fields:
 "Miscellaneous bugfixes, plus:

   - An overhaul of the DRC cache by Jeff Layton.  The main effect is
     just to make it larger.  This decreases the chances of intermittent
     errors especially in the UDP case.  But we'll need to watch for any
     reports of performance regressions.

   - Containerized nfsd: with some limitations, we now support
     per-container nfs-service, thanks to extensive work from Stanislav
     Kinsbursky over the last year."

Some notes about conflicts, since there were *two* non-data semantic
conflicts here:

 - idr_remove_all() had been added by a memory leak fix, but has since
   become deprecated since idr_destroy() does it for us now.

 - xs_local_connect() had been added by this branch to make AF_LOCAL
   connections be synchronous, but in the meantime Trond had changed the
   calling convention in order to avoid a RCU dereference.

There were a couple of more obvious actual source-level conflicts due to
the hlist traversal changes and one just due to code changes next to
each other, but those were trivial.

* 'for-3.9' of git://linux-nfs.org/~bfields/linux: (49 commits)
  SUNRPC: make AF_LOCAL connect synchronous
  nfsd: fix compiler warning about ambiguous types in nfsd_cache_csum
  svcrpc: fix rpc server shutdown races
  svcrpc: make svc_age_temp_xprts enqueue under sv_lock
  lockd: nlmclnt_reclaim(): avoid stack overflow
  nfsd: enable NFSv4 state in containers
  nfsd: disable usermode helper client tracker in container
  nfsd: use proper net while reading "exports" file
  nfsd: containerize NFSd filesystem
  nfsd: fix comments on nfsd_cache_lookup
  SUNRPC: move cache_detail->cache_request callback call to cache_read()
  SUNRPC: remove "cache_request" argument in sunrpc_cache_pipe_upcall() function
  SUNRPC: rework cache upcall logic
  SUNRPC: introduce cache_detail->cache_request callback
  NFS: simplify and clean cache library
  NFS: use SUNRPC cache creation and destruction helper for DNS cache
  nfsd4: free_stid can be static
  nfsd: keep a checksum of the first 256 bytes of request
  sunrpc: trim off trailing checksum before returning decrypted or integrity authenticated buffer
  sunrpc: fix comment in struct xdr_buf definition
  ...
2013-02-28 18:02:55 -08:00
Weston Andros Adamson
3000512137 NFSv4.1: LAYOUTGET EDELAY loops timeout to the MDS
The client will currently try LAYOUTGETs forever if a server is returning
NFS4ERR_LAYOUTTRYLATER or NFS4ERR_RECALLCONFLICT - even if the client no
longer needs the layout (ie process killed, unmounted).

This patch uses the DS timeout value (module parameter 'dataserver_timeo'
via rpc layer) to set an upper limit of how long the client tries LATOUTGETs
in this situation.  Once the timeout is reached, IO is redirected to the MDS.

This also changes how the client checks if a layout is on the clp list
to avoid a double list_add.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-28 17:41:35 -08:00
Weston Andros Adamson
eb97cbb459 PNFS: set the default DS timeout to 60 seconds
The client should have 60 second default timeouts for DS operations, not 6
seconds.

NFS4_DEF_DS_TIMEO is used as "timeout in tenths of a second" in
nfs_init_timeout_values (and is not used anywhere else).
This matches up with the description of the module param dataserver_timeo.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-28 17:35:00 -08:00
Trond Myklebust
7aa262b522 NFSv4: Fix another open/open_recovery deadlock
If we don't release the open seqid before we wait for state recovery,
then we may end up deadlocking the state recovery thread.
This patch addresses a new deadlock that was introduced by
commit c21443c2c7 (NFSv4: Fix a reboot
recovery race when opening a file)

Reported-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-28 16:19:59 -08:00
Sasha Levin
b67bfe0d42 hlist: drop the node parameter from iterators
I'm not sure why, but the hlist for each entry iterators were conceived

        list_for_each_entry(pos, head, member)

The hlist ones were greedy and wanted an extra parameter:

        hlist_for_each_entry(tpos, pos, head, member)

Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.

Besides the semantic patch, there was some manual work required:

 - Fix up the actual hlist iterators in linux/list.h
 - Fix up the declaration of other iterators based on the hlist ones.
 - A very small amount of places were using the 'node' parameter, this
 was modified to use 'obj->member' instead.
 - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
 properly, so those had to be fixed up manually.

The semantic patch which is mostly the work of Peter Senna Tschudin is here:

@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

type T;
expression a,c,d,e;
identifier b;
statement S;
@@

-T b;
    <+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
    ...+>

[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 19:10:24 -08:00
Tejun Heo
d687031265 nfs4client: convert to idr_alloc()
Convert to the much saner new idr interface.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 19:10:20 -08:00
Tejun Heo
d97bec801d nfs: idr_destroy() no longer needs idr_remove_all()
idr_destroy() can destroy idr by itself and idr_remove_all() is being
deprecated.  Drop reference to idr_remove_all().  Note that the code
wasn't completely correct before because idr_remove() on all entries
doesn't necessarily release all idr_layers which could lead to memory
leak.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 19:10:14 -08:00
Jeff Layton
f6488c9ba5 nfs: don't allow nfs_find_actor to match inodes of the wrong type
Benny Halevy reported the following oops when testing RHEL6:

<7>nfs_update_inode: inode 892950 mode changed, 0040755 to 0100644
<1>BUG: unable to handle kernel NULL pointer dereference at (null)
<1>IP: [<ffffffffa02a52c5>] nfs_closedir+0x15/0x30 [nfs]
<4>PGD 81448a067 PUD 831632067 PMD 0
<4>Oops: 0000 [#1] SMP
<4>last sysfs file: /sys/kernel/mm/redhat_transparent_hugepage/enabled
<4>CPU 6
<4>Modules linked in: fuse bonding 8021q garp ebtable_nat ebtables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi softdog bridge stp llc xt_physdev ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_round_robin dm_multipath objlayoutdriver2(U) nfs(U) lockd fscache auth_rpcgss nfs_acl sunrpc vhost_net macvtap macvlan tun kvm_intel kvm be2net igb dca ptp pps_core microcode serio_raw sg iTCO_wdt iTCO_vendor_support i7core_edac edac_core shpchp ext4 mbcache jbd2 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
<4>
<4>Pid: 6332, comm: dd Not tainted 2.6.32-358.el6.x86_64 #1 HP ProLiant DL170e G6  /ProLiant DL170e G6
<4>RIP: 0010:[<ffffffffa02a52c5>]  [<ffffffffa02a52c5>] nfs_closedir+0x15/0x30 [nfs]
<4>RSP: 0018:ffff88081458bb98  EFLAGS: 00010292
<4>RAX: ffffffffa02a52b0 RBX: 0000000000000000 RCX: 0000000000000003
<4>RDX: ffffffffa02e45a0 RSI: ffff88081440b300 RDI: ffff88082d5f5760
<4>RBP: ffff88081458bba8 R08: 0000000000000000 R09: 0000000000000000
<4>R10: 0000000000000772 R11: 0000000000400004 R12: 0000000040000008
<4>R13: ffff88082d5f5760 R14: ffff88082d6e8800 R15: ffff88082f12d780
<4>FS:  00007f728f37e700(0000) GS:ffff8800456c0000(0000) knlGS:0000000000000000
<4>CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
<4>CR2: 0000000000000000 CR3: 0000000831279000 CR4: 00000000000007e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process dd (pid: 6332, threadinfo ffff88081458a000, task ffff88082fa0e040)
<4>Stack:
<4> 0000000040000008 ffff88081440b300 ffff88081458bbf8 ffffffff81182745
<4><d> ffff88082d5f5760 ffff88082d6e8800 ffff88081458bbf8 ffffffffffffffea
<4><d> ffff88082f12d780 ffff88082d6e8800 ffffffffa02a50a0 ffff88082d5f5760
<4>Call Trace:
<4> [<ffffffff81182745>] __fput+0xf5/0x210
<4> [<ffffffffa02a50a0>] ? do_open+0x0/0x20 [nfs]
<4> [<ffffffff81182885>] fput+0x25/0x30
<4> [<ffffffff8117e23e>] __dentry_open+0x27e/0x360
<4> [<ffffffff811c397a>] ? inotify_d_instantiate+0x2a/0x60
<4> [<ffffffff8117e4b9>] lookup_instantiate_filp+0x69/0x90
<4> [<ffffffffa02a6679>] nfs_intent_set_file+0x59/0x90 [nfs]
<4> [<ffffffffa02a686b>] nfs_atomic_lookup+0x1bb/0x310 [nfs]
<4> [<ffffffff8118e0c2>] __lookup_hash+0x102/0x160
<4> [<ffffffff81225052>] ? selinux_inode_permission+0x72/0xb0
<4> [<ffffffff8118e76a>] lookup_hash+0x3a/0x50
<4> [<ffffffff81192a4b>] do_filp_open+0x2eb/0xdd0
<4> [<ffffffff8104757c>] ? __do_page_fault+0x1ec/0x480
<4> [<ffffffff8119f562>] ? alloc_fd+0x92/0x160
<4> [<ffffffff8117de79>] do_sys_open+0x69/0x140
<4> [<ffffffff811811f6>] ? sys_lseek+0x66/0x80
<4> [<ffffffff8117df90>] sys_open+0x20/0x30
<4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
<4>Code: 65 48 8b 04 25 c8 cb 00 00 83 a8 44 e0 ff ff 01 5b 41 5c c9 c3 90 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 48 8b 9e a0 00 00 00 <48> 8b 3b e8 13 0c f7 ff 48 89 df e8 ab 3d ec e0 48 83 c4 08 31
<1>RIP  [<ffffffffa02a52c5>] nfs_closedir+0x15/0x30 [nfs]
<4> RSP <ffff88081458bb98>
<4>CR2: 0000000000000000

I think this is ultimately due to a bug on the server. The client had
previously found a directory dentry. It then later tried to do an atomic
open on a new (regular file) dentry. The attributes it got back had the
same filehandle as the previously found directory inode. It then tried
to put the filp because it failed the aops tests for O_DIRECT opens, and
oopsed here because the ctx was still NULL.

Obviously the root cause here is a server issue, but we can take steps
to mitigate this on the client. When nfs_fhget is called, we always know
what type of inode it is. In the event that there's a broken or
malicious server on the other end of the wire, the client can end up
crashing because the wrong ops are set on it.

Have nfs_find_actor check that the inode type is correct after checking
the fileid. The fileid check should rarely ever match, so it should only
rarely ever get to this check. In the case where we have a broken
server, we may see two different inodes with the same i_ino, but the
client should be able to cope with them without crashing.

This should fix the oops reported here:

    https://bugzilla.redhat.com/show_bug.cgi?id=913660

Reported-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-27 17:28:20 -08:00
Linus Torvalds
d895cb1af1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile (part one) from Al Viro:
 "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
  locking violations, etc.

  The most visible changes here are death of FS_REVAL_DOT (replaced with
  "has ->d_weak_revalidate()") and a new helper getting from struct file
  to inode.  Some bits of preparation to xattr method interface changes.

  Misc patches by various people sent this cycle *and* ocfs2 fixes from
  several cycles ago that should've been upstream right then.

  PS: the next vfs pile will be xattr stuff."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
  saner proc_get_inode() calling conventions
  proc: avoid extra pde_put() in proc_fill_super()
  fs: change return values from -EACCES to -EPERM
  fs/exec.c: make bprm_mm_init() static
  ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
  ocfs2: fix possible use-after-free with AIO
  ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
  get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
  target: writev() on single-element vector is pointless
  export kernel_write(), convert open-coded instances
  fs: encode_fh: return FILEID_INVALID if invalid fid_type
  kill f_vfsmnt
  vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
  nfsd: handle vfs_getattr errors in acl protocol
  switch vfs_getattr() to struct path
  default SET_PERSONALITY() in linux/elf.h
  ceph: prepopulate inodes only when request is aborted
  d_hash_and_lookup(): export, switch open-coded instances
  9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
  9p: split dropping the acls from v9fs_set_create_acl()
  ...
2013-02-26 20:16:07 -08:00
Jeff Layton
ecf3d1f1aa vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
The following set of operations on a NFS client and server will cause

    server# mkdir a
    client# cd a
    server# mv a a.bak
    client# sleep 30  # (or whatever the dir attrcache timeout is)
    client# stat .
    stat: cannot stat `.': Stale NFS file handle

Obviously, we should not be getting an ESTALE error back there since the
inode still exists on the server. The problem is that the lookup code
will call d_revalidate on the dentry that "." refers to, because NFS has
FS_REVAL_DOT set.

nfs_lookup_revalidate will see that the parent directory has changed and
will try to reverify the dentry by redoing a LOOKUP. That of course
fails, so the lookup code returns ESTALE.

The problem here is that d_revalidate is really a bad fit for this case.
What we really want to know at this point is whether the inode is still
good or not, but we don't really care what name it goes by or whether
the dcache is still valid.

Add a new d_op->d_weak_revalidate operation and have complete_walk call
that instead of d_revalidate. The intent there is to allow for a
"weaker" d_revalidate that just checks to see whether the inode is still
good. This is also gives us an opportunity to kill off the FS_REVAL_DOT
special casing.

[AV: changed method name, added note in porting, fixed confusion re
having it possibly called from RCU mode (it won't be)]

Cc: NeilBrown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26 02:46:09 -05:00
Weston Andros Adamson
a47970ff78 NFSv4.1: Hold reference to layout hdr in layoutget
This fixes an oops where a LAYOUTGET is in still in the rpciod queue,
but the requesting processes has been killed.  Without this, killing
the process does the final pnfs_put_layout_hdr() and sets NFS_I(inode)->layout
to NULL while the LAYOUTGET rpc task still references it.

Example oops:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
IP: [<ffffffffa01bd586>] pnfs_choose_layoutget_stateid+0x37/0xef [nfsv4]
PGD 7365b067 PUD 7365d067 PMD 0
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
Modules linked in: nfs_layout_nfsv41_files nfsv4 auth_rpcgss nfs lockd sunrpc ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ip6table_filter ip6_tables ppdev e1000 i2c_piix4 i2c_core shpchp parport_pc parport crc32c_intel aesni_intel xts aes_x86_64 lrw gf128mul ablk_helper cryptd mptspi scsi_transport_spi mptscsih mptbase floppy autofs4
CPU 0
Pid: 27, comm: kworker/0:1 Not tainted 3.8.0-dros_cthon2013+ #4 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
RIP: 0010:[<ffffffffa01bd586>]  [<ffffffffa01bd586>] pnfs_choose_layoutget_stateid+0x37/0xef [nfsv4]
RSP: 0018:ffff88007b0c1c88  EFLAGS: 00010246
RAX: ffff88006ed36678 RBX: 0000000000000000 RCX: 0000000ea877e3bc
RDX: ffff88007a729da8 RSI: 0000000000000000 RDI: ffff88007a72b958
RBP: ffff88007b0c1ca8 R08: 0000000000000002 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88007a72b958
R13: ffff88007a729da8 R14: 0000000000000000 R15: ffffffffa011077e
FS:  0000000000000000(0000) GS:ffff88007f600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000080 CR3: 00000000735f8000 CR4: 00000000001407f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/0:1 (pid: 27, threadinfo ffff88007b0c0000, task ffff88007c2fa0c0)
Stack:
 ffff88006fc05388 ffff88007a72b908 ffff88007b240900 ffff88006fc05388
 ffff88007b0c1cd8 ffffffffa01a2170 ffff88007b240900 ffff88007b240900
 ffff88007b240970 ffffffffa011077e ffff88007b0c1ce8 ffffffffa0110791
Call Trace:
 [<ffffffffa01a2170>] nfs4_layoutget_prepare+0x7b/0x92 [nfsv4]
 [<ffffffffa011077e>] ? __rpc_atrun+0x15/0x15 [sunrpc]
 [<ffffffffa0110791>] rpc_prepare_task+0x13/0x15 [sunrpc]

Reported-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Cc: stable@kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-25 18:32:59 -08:00
Linus Torvalds
94f2f14234 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace and namespace infrastructure changes from Eric W Biederman:
 "This set of changes starts with a few small enhnacements to the user
  namespace.  reboot support, allowing more arbitrary mappings, and
  support for mounting devpts, ramfs, tmpfs, and mqueuefs as just the
  user namespace root.

  I do my best to document that if you care about limiting your
  unprivileged users that when you have the user namespace support
  enabled you will need to enable memory control groups.

  There is a minor bug fix to prevent overflowing the stack if someone
  creates way too many user namespaces.

  The bulk of the changes are a continuation of the kuid/kgid push down
  work through the filesystems.  These changes make using uids and gids
  typesafe which ensures that these filesystems are safe to use when
  multiple user namespaces are in use.  The filesystems converted for
  3.9 are ceph, 9p, afs, ocfs2, gfs2, ncpfs, nfs, nfsd, and cifs.  The
  changes for these filesystems were a little more involved so I split
  the changes into smaller hopefully obviously correct changes.

  XFS is the only filesystem that remains.  I was hoping I could get
  that in this release so that user namespace support would be enabled
  with an allyesconfig or an allmodconfig but it looks like the xfs
  changes need another couple of days before it they are ready."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (93 commits)
  cifs: Enable building with user namespaces enabled.
  cifs: Convert struct cifs_ses to use a kuid_t and a kgid_t
  cifs: Convert struct cifs_sb_info to use kuids and kgids
  cifs: Modify struct smb_vol to use kuids and kgids
  cifs: Convert struct cifsFileInfo to use a kuid
  cifs: Convert struct cifs_fattr to use kuid and kgids
  cifs: Convert struct tcon_link to use a kuid.
  cifs: Modify struct cifs_unix_set_info_args to hold a kuid_t and a kgid_t
  cifs: Convert from a kuid before printing current_fsuid
  cifs: Use kuids and kgids SID to uid/gid mapping
  cifs: Pass GLOBAL_ROOT_UID and GLOBAL_ROOT_GID to keyring_alloc
  cifs: Use BUILD_BUG_ON to validate uids and gids are the same size
  cifs: Override unmappable incoming uids and gids
  nfsd: Enable building with user namespaces enabled.
  nfsd: Properly compare and initialize kuids and kgids
  nfsd: Store ex_anon_uid and ex_anon_gid as kuids and kgids
  nfsd: Modify nfsd4_cb_sec to use kuids and kgids
  nfsd: Handle kuids and kgids in the nfs4acl to posix_acl conversion
  nfsd: Convert nfsxdr to use kuids and kgids
  nfsd: Convert nfs3xdr to use kuids and kgids
  ...
2013-02-25 16:00:49 -08:00
Benny Halevy
78f33277f9 pnfs: fix resend_to_mds for directio
Pass the directio request on pageio_init to clean up the API.

Percolate pg_dreq from original nfs_pageio_descriptor to the
pnfs_{read,write}_done_resend_to_mds and use it on respective
call to nfs_pageio_init_{read,write} on the newly created
nfs_pageio_descriptor.

Reproduced by command:
 mount -o vers=4.1 server:/ /mnt
 dd bs=128k count=8 if=/dev/zero of=/mnt/dd.out oflag=direct

BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
IP: [<ffffffffa021a3a8>] atomic_inc+0x4/0x9 [nfs]
PGD 34786067 PUD 34794067 PMD 0
Oops: 0002 [#1] SMP
Modules linked in: nfs_layout_nfsv41_files nfsv4 nfs nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc btrfs zlib_deflate libcrc32c ipv6 autofs4
CPU 1
Pid: 259, comm: kworker/1:2 Not tainted 3.8.0-rc6 #2 Bochs Bochs
RIP: 0010:[<ffffffffa021a3a8>]  [<ffffffffa021a3a8>] atomic_inc+0x4/0x9 [nfs]
RSP: 0018:ffff880038f8fa68  EFLAGS: 00010206
RAX: ffffffffa021a6a9 RBX: ffff880038f8fb48 RCX: 00000000000a0000
RDX: ffffffffa021e616 RSI: ffff8800385e9a40 RDI: 0000000000000028
RBP: ffff880038f8fa68 R08: ffffffff81ad6720 R09: ffff8800385e9510
R10: ffffffffa0228450 R11: ffff880038e87418 R12: ffff8800385e9a40
R13: ffff8800385e9a70 R14: ffff880038f8fb38 R15: ffffffffa0148878
FS:  0000000000000000(0000) GS:ffff88003e400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000028 CR3: 0000000034789000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/1:2 (pid: 259, threadinfo ffff880038f8e000, task ffff880038302480)
Stack:
 ffff880038f8fa78 ffffffffa021a6bf ffff880038f8fa88 ffffffffa021bb82
 ffff880038f8fae8 ffffffffa021f454 ffff880038f8fae8 ffffffff8109689d
 ffff880038f8fab8 ffffffff00000006 0000000000000000 ffff880038f8fb48
Call Trace:
 [<ffffffffa021a6bf>] nfs_direct_pgio_init+0x16/0x18 [nfs]
 [<ffffffffa021bb82>] nfs_pgheader_init+0x6a/0x6c [nfs]
 [<ffffffffa021f454>] nfs_generic_pg_writepages+0x51/0xf8 [nfs]
 [<ffffffff8109689d>] ? mark_held_locks+0x71/0x99
 [<ffffffffa0148878>] ? rpc_release_resources_task+0x37/0x37 [sunrpc]
 [<ffffffffa021bc25>] nfs_pageio_doio+0x1a/0x43 [nfs]
 [<ffffffffa021be7c>] nfs_pageio_complete+0x16/0x2c [nfs]
 [<ffffffffa02608be>] pnfs_write_done_resend_to_mds+0x95/0xc5 [nfsv4]
 [<ffffffffa0148878>] ? rpc_release_resources_task+0x37/0x37 [sunrpc]
 [<ffffffffa028e27f>] filelayout_reset_write+0x8c/0x99 [nfs_layout_nfsv41_files]
 [<ffffffffa028e5f9>] filelayout_write_done_cb+0x4d/0xc1 [nfs_layout_nfsv41_files]
 [<ffffffffa024587a>] nfs4_write_done+0x36/0x49 [nfsv4]
 [<ffffffffa021f996>] nfs_writeback_done+0x53/0x1cc [nfs]
 [<ffffffffa021fb1d>] nfs_writeback_done_common+0xe/0x10 [nfs]
 [<ffffffffa028e03d>] filelayout_write_call_done+0x28/0x2a [nfs_layout_nfsv41_files]
 [<ffffffffa01488a1>] rpc_exit_task+0x29/0x87 [sunrpc]
 [<ffffffffa014a0c9>] __rpc_execute+0x11d/0x3cc [sunrpc]
 [<ffffffff810969dc>] ? trace_hardirqs_on_caller+0x117/0x173
 [<ffffffffa014a39f>] rpc_async_schedule+0x27/0x32 [sunrpc]
 [<ffffffffa014a378>] ? __rpc_execute+0x3cc/0x3cc [sunrpc]
 [<ffffffff8105f8c1>] process_one_work+0x226/0x422
 [<ffffffff8105f7f4>] ? process_one_work+0x159/0x422
 [<ffffffff81094757>] ? lock_acquired+0x210/0x249
 [<ffffffffa014a378>] ? __rpc_execute+0x3cc/0x3cc [sunrpc]
 [<ffffffff810600d8>] worker_thread+0x126/0x1c4
 [<ffffffff8105ffb2>] ? manage_workers+0x240/0x240
 [<ffffffff81064ef8>] kthread+0xb1/0xb9
 [<ffffffff81064e47>] ? __kthread_parkme+0x65/0x65
 [<ffffffff815206ec>] ret_from_fork+0x7c/0xb0
 [<ffffffff81064e47>] ? __kthread_parkme+0x65/0x65
Code: 00 83 38 02 74 12 48 81 4b 50 00 00 01 00 c7 83 60 07 00 00 01 00 00 00 48 89 df e8 55 fe ff ff 5b 41 5c 5d c3 66 90 55 48 89 e5 <f0> ff 07 5d c3 55 48 89 e5 f0 ff 0f 0f 94 c0 84 c0 0f 95 c0 0f
RIP  [<ffffffffa021a3a8>] atomic_inc+0x4/0x9 [nfs]
 RSP <ffff880038f8fa68>
CR2: 0000000000000028

Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Cc: stable@kernel.org [>= 3.6]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-24 10:07:36 -05:00
Al Viro
496ad9aa8e new helper: file_inode(file)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22 23:31:31 -05:00
Trond Myklebust
5a7a613a47 NFS: Don't allow NFS silly-renamed files to be deleted, no signal
Commit 73ca100 broke the code that prevents the client from deleting
a silly renamed dentry.  This affected "delete on last close"
semantics as after that commit, nothing prevented removal of
silly-renamed files.  As a result, a process holding a file open
could easily get an ESTALE on the file in a directory where some
other process issued 'rm -rf some_dir_containing_the_file' twice.
Before the commit, any attempt at unlinking silly renamed files would
fail inside may_delete() with -EBUSY because of the
DCACHE_NFSFS_RENAMED flag.  The following testcase demonstrates
the problem:
  tail -f /nfsmnt/dir/file &
  rm -rf /nfsmnt/dir
  rm -rf /nfsmnt/dir
  # second removal does not fail, 'tail' process receives ESTALE

The problem with the above commit is that it unhashes the old and
new dentries from the lookup path, even in the normal case when
a signal is not encountered and it would have been safe to call
d_move.  Unfortunately the old dentry has the special
DCACHE_NFSFS_RENAMED flag set on it.  Unhashing has the
side-effect that future lookups call d_alloc(), allocating a new
dentry without the special flag for any silly-renamed files.  As a
result, subsequent calls to unlink silly renamed files do not fail
but allow the removal to go through.  This will result in ESTALE
errors for any other process doing operations on the file.

To fix this, go back to using d_move on success.
For the signal case, it's unclear what we may safely do beyond d_drop.

Reported-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@vger.kernel.org
2013-02-22 14:55:34 -05:00
fanchaoting
5a12cca697 umount oops when remove blocklayoutdriver first
now pnfs client uses block layout, maybe we can remove
blocklayoutdriver first. if we umount later,
it can cause oops in unset_pnfs_layoutdriver.
because nfss->pnfs_curr_ld->clear_layoutdriver is invalid.

reproduce it:
 modprobe  blocklayoutdriver
 mount -t nfs4 -o minorversion=1 pnfsip:/ /mnt/
 rmmod blocklayoutdriver
 umount /mnt

then you can see following

CPU 0
Pid: 17023, comm: umount.nfs4 Tainted: GF          O 3.7.0-rc6-pnfs #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
RIP: 0010:[<ffffffffa04cfe6d>]  [<ffffffffa04cfe6d>] unset_pnfs_layoutdriver+0x1d/0x70 [nfsv4]
RSP: 0018:ffff8800022d9e48  EFLAGS: 00010286
RAX: ffffffffa04a1b00 RBX: ffff88000b013800 RCX: 0000000000000001
RDX: ffffffff81ae8ee0 RSI: ffff880001ee94b8 RDI: ffff88000b013800
RBP: ffff8800022d9e58 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff880001ee9400
R13: ffff8800105978c0 R14: 00007fff25846c08 R15: 0000000001bba550
FS:  00007f45ae7f0700(0000) GS:ffff880012c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffa04a1b38 CR3: 0000000002c0c000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process umount.nfs4 (pid: 17023, threadinfo ffff8800022d8000, task ffff880006e48aa0)
Stack:
ffff8800105978c0 ffff88000b013800 ffff8800022d9e78 ffffffffa04cd0ce
ffff8800022d9e78 ffff88000b013800 ffff8800022d9ea8 ffffffffa04755a7
ffff8800022d9ea8 ffff880002f96400 ffff88000b013800 ffff880002f96400
Call Trace:
[<ffffffffa04cd0ce>] nfs4_destroy_server+0x1e/0x30 [nfsv4]
[<ffffffffa04755a7>] nfs_free_server+0xb7/0x150 [nfs]
[<ffffffffa047d4d5>] nfs_kill_super+0x35/0x40 [nfs]
[<ffffffff81178d35>] deactivate_locked_super+0x45/0x70
[<ffffffff8117986a>] deactivate_super+0x4a/0x70
[<ffffffff81193ee2>] mntput_no_expire+0xd2/0x130
[<ffffffff81194d62>] sys_umount+0x72/0xe0
[<ffffffff8154af59>] system_call_fastpath+0x16/0x1b
Code: 06 e1 b8 ea ff ff ff eb 9e 0f 1f 44 00 00 55 48 89 e5 53 48 83 ec 08 66 66 66 66 90 48 8b 87 80 03 00 00 48 89 fb 48 85 c0 74 29 <48> 8b 40 38 48 85 c0 74 02 ff d0 48 8b 03 3e ff 48 04 0f 94 c2
RIP  [<ffffffffa04cfe6d>] unset_pnfs_layoutdriver+0x1d/0x70 [nfsv4]
RSP <ffff8800022d9e48>
CR2: ffffffffa04a1b38
---[ end trace 29f75aaedda058bf ]---

Signed-off-by: fanchaoting<fanchaoting@cn.fujitsu.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2013-02-17 15:40:15 -05:00
Tim Gardner
96aa1549af nfs: remove kfree() redundant null checks
smatch analysis:

fs/nfs/getroot.c:130 nfs_get_root() info: redundant null
 check on name calling kfree()

fs/nfs/unlink.c:272 nfs_async_unlink() info: redundant null
 check on devname_garbage calling kfree()

Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: linux-nfs@vger.kernel.org
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-17 15:27:21 -05:00
Weston Andros Adamson
085b7a45c6 NFSv4.1: Don't decode skipped layoutgets
layoutget's prepare hook can call rpc_exit with status = NFS4_OK (0).
Because of this, nfs4_proc_layoutget can't depend on a 0 status to mean
that the RPC was successfully sent, received and parsed.

To fix this, use the result's len member to see if parsing took place.

This fixes the following OOPS -- calling xdr_init_decode() with a buffer length
0 doesn't set the stream's 'p' member and ends up using uninitialized memory
in filelayout_decode_layout.

BUG: unable to handle kernel paging request at 0000000000008050
IP: [<ffffffff81282e78>] memcpy+0x18/0x120
PGD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:11.0/0000:02:01.0/irq
CPU 1
Modules linked in: nfs_layout_nfsv41_files nfs lockd fscache auth_rpcgss nfs_acl autofs4 sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_mirror dm_region_hash dm_log dm_mod ppdev parport_pc parport snd_ens1371 snd_rawmidi snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc e1000 microcode vmware_balloon i2c_piix4 i2c_core sg shpchp ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix mptspi mptscsih mptbase scsi_transport_spi [last unloaded: speedstep_lib]

Pid: 1665, comm: flush-0:22 Not tainted 2.6.32-356-test-2 #2 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
RIP: 0010:[<ffffffff81282e78>]  [<ffffffff81282e78>] memcpy+0x18/0x120
RSP: 0018:ffff88003dfab588  EFLAGS: 00010206
RAX: ffff88003dc42000 RBX: ffff88003dfab610 RCX: 0000000000000009
RDX: 000000003f807ff0 RSI: 0000000000008050 RDI: ffff88003dc42000
RBP: ffff88003dfab5b0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000080 R12: 0000000000000024
R13: ffff88003dc42000 R14: ffff88003f808030 R15: ffff88003dfab6a0
FS:  0000000000000000(0000) GS:ffff880003420000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000008050 CR3: 000000003bc92000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process flush-0:22 (pid: 1665, threadinfo ffff88003dfaa000, task ffff880037f77540)
Stack:
ffffffffa0398ac1 ffff8800397c5940 ffff88003dfab610 ffff88003dfab6a0
<d> ffff88003dfab5d0 ffff88003dfab680 ffffffffa01c150b ffffea0000d82e70
<d> 000000508116713b 0000000000000000 0000000000000000 0000000000000000
Call Trace:
[<ffffffffa0398ac1>] ? xdr_inline_decode+0xb1/0x120 [sunrpc]
[<ffffffffa01c150b>] filelayout_decode_layout+0xeb/0x350 [nfs_layout_nfsv41_files]
[<ffffffffa01c17fc>] filelayout_alloc_lseg+0x8c/0x3c0 [nfs_layout_nfsv41_files]
[<ffffffff8150e6ce>] ? __wait_on_bit+0x7e/0x90

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2013-02-17 15:24:16 -05:00
Stanislav Kinsbursky
21cd1254d3 SUNRPC: remove "cache_request" argument in sunrpc_cache_pipe_upcall() function
Passing this pointer is redundant since it's stored on cache_detail structure,
which is also passed to sunrpc_cache_pipe_upcall () function.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-15 10:43:47 -05:00
Stanislav Kinsbursky
73fb847a44 SUNRPC: introduce cache_detail->cache_request callback
This callback will allow to simplify upcalls in further patches in this
series.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-15 10:43:45 -05:00
Stanislav Kinsbursky
462b8f6bf1 NFS: simplify and clean cache library
This is a cleanup patch.
Such helpers like nfs_cache_init() and nfs_cache_destroy() are redundant,
because they are just a wrappers around sunrpc_init_cache_detail() and
sunrpc_destroy_cache_detail() respectively.
So let's remove them completely and move corresponding logic to
nfs_cache_register_net() and nfs_cache_unregister_net() respectively (since
they are called together anyway).

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-15 10:43:36 -05:00
Stanislav Kinsbursky
483479c26a NFS: use SUNRPC cache creation and destruction helper for DNS cache
This cache was the first containerized and doesn't use net-aware cache
creation and destruction helpers.
This is a cleanup patch which just makes code looks clearer and reduce amount
of lines of code.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-15 10:43:12 -05:00
Trond Myklebust
fd9a8d7160 NFSv4.1: Fix bulk recall and destroy of layouts
The current code in pnfs_destroy_all_layouts() assumes that removing
the layout from the server->layouts list is sufficient to make it
invisible to other processes. This ignores the fact that most
users access the layout through the nfs_inode->layout...
There is further breakage due to lack of reference counting of the
layouts, meaning that the whole thing Oopses at the drop of a hat.

The code in initiate_bulk_draining() is almost correct, and can be
used as a model for pnfs_destroy_all_layouts(), so move that
code to pnfs.c, and refactor the code to allow us to choose between
a single filesystem bulk recall, and a recall of all layouts.
Also note that initiate_bulk_draining() currently calls iput() while
holding locks. Fix that too.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2013-02-14 13:22:50 -05:00
Eric W. Biederman
9ff593c473 nfs: kuid and kgid conversions for nfs/inode.c
- Use uid_eq and gid_eq when comparing kuids and kgids.
- Use make_kuid(&init_user_ns, -2) and make_kgid(&init_user_ns, -2) as
  the initial uid and gid on nfs inodes, instead of using the typeunsafe
  value of -2.

Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13 06:15:33 -08:00
Eric W. Biederman
e5782076e7 nfs: Convert nfs4xdr to use kuids and kgids
When reading uids and gids off the wire convert them to
kuids and kgids.

When putting kuids and kgids onto the wire first convert
them to uids and gids the other side will understand.

When printing kuids and kgids convert them to values in
the initial user namespace then use normal printf formats.

Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13 06:15:32 -08:00
Eric W. Biederman
57a38dae2a nfs: Convert nfs3xdr to use kuids and kgids
When reading uids and gids off the wire convert them to
kuids and kgids.

When putting kuids and kgids onto the wire first convert
them to uids and gids the other side will understand.

Add an additional failure mode incoming for uids or gids
that are invalid.

Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13 06:15:31 -08:00
Eric W. Biederman
cfa0898d4f nfs: Convert nfs2xdr to use kuids and kgids
When reading uids and gids off the wire convert them to
kuids and kgids.

When putting kuids and kgids onto the wire first convert
them to uids and gids the other side will understand.

Add an additional failure mode for incoming uid or
gids that are invalid.

Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13 06:15:30 -08:00
Eric W. Biederman
9f309c86cf nfs: Convert idmap to use kuids and kgids
Convert nfs_map_name_to_uid to return a kuid_t value.
Convert nfs_map_name_to_gid to return a kgid_t value.
Convert nfs_map_uid_to_name to take a kuid_t paramater.
Convert nfs_map_gid_to_name to take a kgid_t paramater.

Tweak nfs_fattr_map_owner_to_name to use a kuid_t intermediate value.
Tweak nfs_fattr_map_group_to_name to use a kgid_t intermediate value.

Which makes these functions properly handle kuids and kgids, including
erroring of the generated kuid or kgid is invalid.

Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13 06:15:29 -08:00
Eric W. Biederman
4e963d4f3e nfs: Pass GLOBAL_ROOT_UID and GLOBAL_ROOT_GID to keyring alloc
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13 06:15:27 -08:00
Trond Myklebust
c8da19b986 NFSv4.1: Fix an ABBA locking issue with session and state serialisation
Ensure that if nfs_wait_on_sequence() causes our rpc task to wait for
an NFSv4 state serialisation lock, then we also drop the session slot.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2013-02-11 19:04:25 -05:00
Trond Myklebust
c21443c2c7 NFSv4: Fix a reboot recovery race when opening a file
If the server reboots after it has replied to our OPEN, but before we
call nfs4_opendata_to_nfs4_state(), then the reboot recovery thread
will not see a stateid for this open, and so will fail to recover it.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-11 15:33:14 -05:00
Trond Myklebust
65b62a29f7 NFSv4: Ensure delegation recall and byte range lock removal don't conflict
Add a mutex to the struct nfs4_state_owner to ensure that delegation
recall doesn't conflict with byte range lock removal.

Note that we nest the new mutex _outside_ the state manager reclaim
protection (nfsi->rwsem) in order to avoid deadlocks.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-11 15:33:13 -05:00
Trond Myklebust
37380e4264 NFSv4: Fix up the return values of nfs4_open_delegation_recall
Adjust the return values so that they return EAGAIN to the caller in
cases where we might want to retry the delegation recall after
the state recovery has run.
Note that we can't wait and retry in this routine, because the caller
may be the state manager thread.

If delegation recall fails due to a session or reboot related issue,
also ensure that we mark the stateid as delegated so that
nfs_delegation_claim_opens can find it again later.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-11 15:33:13 -05:00
Trond Myklebust
d25be546a8 NFSv4.1: Don't lose locks when a server reboots during delegation return
If the server reboots while we are converting a delegation into
OPEN/LOCK stateids as part of a delegation return, the current code
will simply exit with an error. This causes us to lose both
delegation state and locking state (i.e. locking atomicity).

Deal with this by exposing the delegation stateid during delegation
return, so that we can recover the delegation, and then resume
open/lock recovery.

Note that not having to hold the nfs_inode->rwsem across the
calls to nfs_delegation_claim_opens() also fixes a deadlock against
the NFSv4.1 reboot recovery code.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-11 15:33:12 -05:00
Trond Myklebust
9a99af494b NFSv4.1: Prevent deadlocks between state recovery and file locking
We currently have a deadlock in which the state recovery thread
ends up blocking due to one of the locks which it is trying to
recover holding the nfs_inode->rwsem.
The situation is as follows: the state recovery thread is
scheduled in order to recover from a reboot. It immediately
drains the session, forcing all ordinary NFSv4.1 calls to
nfs41_setup_sequence() to be put to sleep.  This includes the
file locking process that holds the nfs_inode->rwsem.
When the thread gets to nfs4_reclaim_locks(), it tries to
grab a write lock on nfs_inode->rwsem, and boom...

Fix is to have the lock drop the nfs_inode->rwsem while it is
doing RPC calls. We use a sequence lock in order to signal to
the locking process whether or not a state recovery thread has
run on that inode, in which case it should retry the lock.

Reported-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-11 15:33:12 -05:00
Trond Myklebust
c137afabe3 NFSv4: Allow the state manager to mark an open_owner as being recovered
This patch adds a seqcount_t lock for use by the state manager to
signal that an open owner has been recovered. This mechanism will be
used by the delegation, open and byte range lock code in order to
figure out if they need to replay requests due to collisions with
lock recovery.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-11 15:33:11 -05:00
Jeff Layton
5976687a2b sunrpc: move address copy/cmp/convert routines and prototypes from clnt.h to addr.h
These routines are used by server and client code, so having them in a
separate header would be best.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-05 09:41:14 -05:00
Trond Myklebust
322b2b9032 Revert "NFS: add nfs_sb_deactive_async to avoid deadlock"
This reverts commit 324d003b0c.

The deadlock turned out to be caused by a workqueue limitation that has
now been worked around in the RPC code (see comment in rpc_free_task).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-01 10:13:48 -05:00
Trond Myklebust
c489ee290b NFSv4.1: Handle NFS4ERR_DELAY when resetting the NFSv4.1 session
NFS4ERR_DELAY is a legal reply when we call DESTROY_SESSION. It
usually means that the server is busy handling an unfinished RPC
request. Just sleep for a second and then retry.
We also need to be able to handle the NFS4ERR_BACK_CHAN_BUSY return
value. If the NFS server has outstanding callbacks, we just want to
similarly sleep & retry.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2013-01-30 17:45:15 -05:00
Trond Myklebust
ab22541782 NFS: Don't silently fail setattr() requests on mountpoints
Ensure that any setattr and getattr requests for junctions and/or
mountpoints are sent to the server. Ever since commit
0ec26fd069 (vfs: automount should ignore LOOKUP_FOLLOW), we have
silently dropped any setattr requests to a server-side mountpoint.
For referrals, we have silently dropped both getattr and setattr
requests.

This patch restores the original behaviour for setattr on mountpoints,
and tries to do the same for referrals, provided that we have a
filehandle...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2013-01-30 17:41:04 -05:00
Trond Myklebust
65436ec0c8 NFSv4.1: Ensure that nfs41_walk_client_list() does start lease recovery
We do need to start the lease recovery thread prior to waiting for the
client initialisation to complete in NFSv4.1.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Ben Greear <greearb@candelatech.com>
Cc: stable@vger.kernel.org [>=3.7]
2013-01-27 15:51:41 -05:00
Trond Myklebust
202c312dba NFSv4: Fix NFSv4 trunking discovery
If walking the list in nfs4[01]_walk_client_list fails, then the most
likely explanation is that the server dropped the clientid before we
actually managed to confirm it. As long as our nfs_client is the very
last one in the list to be tested, the caller can be assured that this
is the case when the final return value is NFS4ERR_STALE_CLIENTID.

Reported-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org [>=3.7]
Tested-by: Ben Greear <greearb@candelatech.com>
2013-01-27 15:51:28 -05:00
Trond Myklebust
4ae19c2dd7 NFSv4: Fix NFSv4 reference counting for trunked sessions
The reference counting in nfs4_init_client assumes wongly that it
is safe for nfs4_discover_server_trunking() to return a pointer to a
nfs_client prior to bumping the reference count.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Ben Greear <greearb@candelatech.com>
Cc: stable@vger.kernel.org [>=3.7]
2013-01-27 15:51:15 -05:00
Trond Myklebust
dee972b967 NFS: Fix error reporting in nfs_xdev_mount
Currently, nfs_xdev_mount converts all errors from clone_server() to
ENOMEM, which can then leak to userspace (for instance to 'mount'). Fix that.
Also ensure that if nfs_fs_mount_common() returns an error, we
don't dprintk(0)...

The regression originated in commit 3d176e3fe4
(NFS: Use nfs_fs_mount_common() for xdev mounts)

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>= 3.5]
2013-01-27 15:51:15 -05:00
Nickolai Zeldovich
ecf0eb9edb nfs: avoid dereferencing null pointer in initiate_bulk_draining
Fix an inverted null pointer check in initiate_bulk_draining().

Signed-off-by: Nickolai Zeldovich <nickolai@csail.mit.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>= 3.7]
2013-01-05 14:26:51 -05:00
Trond Myklebust
6db6dd7d3f NFS: Ensure that we free the rpc_task after read and write cleanups are done
This patch ensures that we free the rpc_task after the cleanup callbacks
are done in order to avoid a deadlock problem that can be triggered if
the callback needs to wait for another workqueue item to complete.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Weston Andros Adamson <dros@netapp.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Bruce Fields <bfields@fieldses.org>
Cc: stable@vger.kernel.org [>= 3.5]
2013-01-04 12:59:10 -05:00
Xi Wang
e25fbe380c nfs: fix null checking in nfs_get_option_str()
The following null pointer check is broken.

	*option = match_strdup(args);
	return !option;

The pointer `option' must be non-null, and thus `!option' is always false.
Use `!*option' instead.

The bug was introduced in commit c5cb09b6f8 ("Cleanup: Factor out some
cut-and-paste code.").

Signed-off-by: Xi Wang <xi.wang@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-01-04 10:54:43 -05:00
Yanchuan Nian
39e88fcfb1 pnfs: Increase the refcount when LAYOUTGET fails the first time
The layout will be set unusable if LAYOUTGET fails. Is it reasonable to
increase the refcount iff LAYOUTGET fails the first time?

Signed-off-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>= 3.7]
2013-01-04 10:50:42 -05:00
Weston Andros Adamson
f8d9a897d4 NFS: Fix access to suid/sgid executables
nfs_open_permission_mask() should only check MAY_EXEC for files that
are opened with __FMODE_EXEC.

Also fix NFSv4 access-in-open path in a similar way -- openflags must be
used because fmode will not always have FMODE_EXEC set.

This patch fixes https://bugzilla.kernel.org/show_bug.cgi?id=49101

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2013-01-03 17:06:27 -05:00
Trond Myklebust
c4271c6e37 NFS: Kill fscache warnings when mounting without -ofsc
The fscache code will currently bleat a "non-unique superblock keys"
warning even if the user is mounting without the 'fsc' option.

There should be no reason to even initialise the superblock cache cookie
unless we're planning on using fscache for something, so ensure that we
check for the NFS_OPTION_FSCACHE flag before calling into the fscache
code.

Reported-by: Paweł Sikora <pawel.sikora@agmk.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: David Howells <dhowells@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-21 08:32:09 -08:00
David Howells
c129c29347 NFS: Provide stub nfs_fscache_wait_on_invalidate() for when CONFIG_NFS_FSCACHE=n
Provide a stub nfs_fscache_wait_on_invalidate() function for when
CONFIG_NFS_FSCACHE=n lest the following error appear:

  fs/nfs/inode.c: In function 'nfs_invalidate_mapping':
  fs/nfs/inode.c:887:2: error: implicit declaration of function 'nfs_fscache_wait_on_invalidate' [-Werror=implicit-function-declaration]
  cc1: some warnings being treated as errors

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Reported-by: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-21 08:06:48 -08:00
David Howells
a4ff146881 NFS4: Open files for fscaching
nfs4_file_open() should open files for fscaching.

Signed-off-by: David Howells <dhowells@redhat.com>
2012-12-20 22:19:42 +00:00
David Howells
8c209ce721 NFS: nfs_migrate_page() does not wait for FS-Cache to finish with a page
nfs_migrate_page() does not wait for FS-Cache to finish with a page, probably
leading to the following bad-page-state:

 BUG: Bad page state in process python-bin  pfn:17d39b
 page:ffffea00053649e8 flags:004000000000100c count:0 mapcount:0 mapping:(null)
index:38686 (Tainted: G    B      ---------------- )
 Pid: 31053, comm: python-bin Tainted: G    B      ----------------
2.6.32-71.24.1.el6.x86_64 #1
 Call Trace:
 [<ffffffff8111bfe7>] bad_page+0x107/0x160
 [<ffffffff8111ee69>] free_hot_cold_page+0x1c9/0x220
 [<ffffffff8111ef19>] __pagevec_free+0x59/0xb0
 [<ffffffff8104b988>] ? flush_tlb_others_ipi+0x128/0x130
 [<ffffffff8112230c>] release_pages+0x21c/0x250
 [<ffffffff8115b92a>] ? remove_migration_pte+0x28a/0x2b0
 [<ffffffff8115f3f8>] ? mem_cgroup_get_reclaim_stat_from_page+0x18/0x70
 [<ffffffff81122687>] ____pagevec_lru_add+0x167/0x180
 [<ffffffff811226f8>] __lru_cache_add+0x58/0x70
 [<ffffffff81122731>] lru_cache_add_lru+0x21/0x40
 [<ffffffff81123f49>] putback_lru_page+0x69/0x100
 [<ffffffff8115c0bd>] migrate_pages+0x13d/0x5d0
 [<ffffffff81122687>] ? ____pagevec_lru_add+0x167/0x180
 [<ffffffff81152ab0>] ? compaction_alloc+0x0/0x370
 [<ffffffff8115255c>] compact_zone+0x4cc/0x600
 [<ffffffff8111cfac>] ? get_page_from_freelist+0x15c/0x820
 [<ffffffff810672f4>] ? check_preempt_wakeup+0x1c4/0x3c0
 [<ffffffff8115290e>] compact_zone_order+0x7e/0xb0
 [<ffffffff81152a49>] try_to_compact_pages+0x109/0x170
 [<ffffffff8111e94d>] __alloc_pages_nodemask+0x5ed/0x850
 [<ffffffff814c9136>] ? thread_return+0x4e/0x778
 [<ffffffff81150d43>] alloc_pages_vma+0x93/0x150
 [<ffffffff81167ea5>] do_huge_pmd_anonymous_page+0x135/0x340
 [<ffffffff814cb6f6>] ? rwsem_down_read_failed+0x26/0x30
 [<ffffffff81136755>] handle_mm_fault+0x245/0x2b0
 [<ffffffff814ce383>] do_page_fault+0x123/0x3a0
 [<ffffffff814cbdf5>] page_fault+0x25/0x30

nfs_migrate_page() calls nfs_fscache_release_page() which doesn't actually wait
- even if __GFP_WAIT is set.  The reason that doesn't wait is that
fscache_maybe_release_page() might deadlock the allocator as the work threads
writing to the cache may all end up sleeping on memory allocation.

However, I wonder if that is actually a problem.  There are a number of things
I can do to deal with this:

 (1) Make nfs_migrate_page() wait.

 (2) Make fscache_maybe_release_page() honour the __GFP_WAIT flag.

 (3) Set a timeout around the wait.

 (4) Make nfs_migrate_page() return an error if the page is still busy.

For the moment, I'll select (2) and (4).

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
2012-12-20 22:12:03 +00:00
David Howells
de242c0b8b NFS: Use FS-Cache invalidation
Use the new FS-Cache invalidation facility from NFS to deal with foreign
changes being detected on the server rather than attempting to retire the old
cookie and get a new one.

The problem with the old method was that NFS did not wait for all outstanding
storage and retrieval ops on the cache to complete.  There was no automatic
wait between the calls to ->readpages() and calls to invalidate_inode_pages2()
as the latter can only wait on locked pages that have been added to the
pagecache (which they haven't yet on entry to ->readpages()).

This was leading to oopses like the one below when an outstanding read got cut
off from its cookie by a premature release.

BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
IP: [<ffffffffa0075118>] __fscache_read_or_alloc_pages+0x1dd/0x315 [fscache]
PGD 15889067 PUD 15890067 PMD 0
Oops: 0000 [#1] SMP
CPU 0
Modules linked in: cachefiles nfs fscache auth_rpcgss nfs_acl lockd sunrpc

Pid: 4544, comm: tar Not tainted 3.1.0-rc4-fsdevel+ #1064                  /DG965RY
RIP: 0010:[<ffffffffa0075118>]  [<ffffffffa0075118>] __fscache_read_or_alloc_pages+0x1dd/0x315 [fscache]
RSP: 0018:ffff8800158799e8  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8800070d41e0 RCX: ffff8800083dc1b0
RDX: 0000000000000000 RSI: ffff880015879960 RDI: ffff88003e627b90
RBP: ffff880015879a28 R08: 0000000000000002 R09: 0000000000000002
R10: 0000000000000001 R11: ffff880015879950 R12: ffff880015879aa4
R13: 0000000000000000 R14: ffff8800083dc158 R15: ffff880015879be8
FS:  00007f671e9d87c0(0000) GS:ffff88003bc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000000000a8 CR3: 000000001587f000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process tar (pid: 4544, threadinfo ffff880015878000, task ffff880015875040)
Stack:
 ffffffffa00b1759 ffff8800070dc158 ffff8800000213da ffff88002a286508
 ffff880015879aa4 ffff880015879be8 0000000000000001 ffff88002a2866e8
 ffff880015879a88 ffffffffa00b20be 00000000000200da ffff880015875040
Call Trace:
 [<ffffffffa00b1759>] ? nfs_fscache_wait_bit+0xd/0xd [nfs]
 [<ffffffffa00b20be>] __nfs_readpages_from_fscache+0x7e/0x13f [nfs]
 [<ffffffff81095fe7>] ? __alloc_pages_nodemask+0x156/0x662
 [<ffffffffa0098763>] nfs_readpages+0xee/0x187 [nfs]
 [<ffffffff81098a5e>] __do_page_cache_readahead+0x1be/0x267
 [<ffffffff81098942>] ? __do_page_cache_readahead+0xa2/0x267
 [<ffffffff81098d7b>] ra_submit+0x1c/0x20
 [<ffffffff8109900a>] ondemand_readahead+0x28b/0x29a
 [<ffffffff810990ce>] page_cache_sync_readahead+0x38/0x3a
 [<ffffffff81091d8a>] generic_file_aio_read+0x2ab/0x67e
 [<ffffffffa008cfbe>] nfs_file_read+0xa4/0xc9 [nfs]
 [<ffffffff810c22c4>] do_sync_read+0xba/0xfa
 [<ffffffff810a62c9>] ? might_fault+0x4e/0x9e
 [<ffffffff81177a47>] ? security_file_permission+0x7b/0x84
 [<ffffffff810c25dd>] ? rw_verify_area+0xab/0xc8
 [<ffffffff810c29a4>] vfs_read+0xaa/0x13a
 [<ffffffff810c2a79>] sys_read+0x45/0x6c
 [<ffffffff813ac37b>] system_call_fastpath+0x16/0x1b

Reported-by: Mark Moseley <moseleymark@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2012-12-20 22:06:33 +00:00
Linus Torvalds
2d4dce0070 NFS client updates for Linux 3.8
Features include:
 
 - Full audit of BUG_ON asserts in the NFS, SUNRPC and lockd client code
   Remove altogether where possible, and replace with WARN_ON_ONCE and
   appropriate error returns where not.
 - NFSv4.1 client adds session dynamic slot table management. There is
   matching server side code that has been submitted to Bruce for
   consideration. Together, this code allows the server to dynamically
   manage the amount of memory it allocates to the duplicate request
   cache for each client. It will constantly resize those caches to
   reserve more memory for clients that are hot while shrinking caches
   for those that are quiescent.
 
 In addition, there are assorted bugfixes for the generic NFS write code,
 fixes to deal with the drop_nlink() warnings, and yet another fix for
 NFSv4 getacl.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQz8VNAAoJEGcL54qWCgDy7iYQAKbr7AAZOcZPoJigzakZ7nMi
 UKYulGbFais2Llwzw1e+U5RzmorTSbvl7/m8eS7pDf3auYw/t4xtXjKSGZUNxaE1
 q2hNKgVwodMbScYdkZXvKKNckS93oPDttrmEyzjKanqey+1E3HSklvOvikN0ihte
 B/G1OtA7Qpcr92bPrLK+PjDqarCBUI4g42dYbZOBrZnXKTRtzUqsuKPu7WjpPiof
 SHE5b1Emt7oUxgcijWGcvYCQ8voZdeSCnSksH3DgvORlutwdhUD3Yg8KyEfFZdyc
 6C59ozXRLiHkV3c+jMhJzDkQXR9bYHrnK3tlq4G8v1NdJxRktQliZeqecRvip/Wz
 rAxfE6fnPDEvKsCpZb3+5yTAt+aZwzEhRg1fFC9qfGOp+oRa+CWw5kJCyIFHwJu6
 4LOlubQAf6rnIsja1L8D0FdeqHUa1+wy61On5kgVYS5JGtoBsQHpa1zTwdOxPmsR
 2XTMYGNCEabvpKpO9+5xQbUzkFExPTesw47ygXiUuDT/snaarpV3/f05SSCaWZkX
 R8QsGEOXTIh8/S+UxARGpc7H6xi1PdBM5nBziHVzjEdHgZRF4wGFaJe2CirMjSJO
 Df5GEd5Z/8VCGWs+1w7HD5EaQ2n0wbt5daCE80Y2jRBr7NMYnY+ciF8/GktLpHsn
 Zq1bXGOdr3UZ92LXuzL9
 =G3N9
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.8-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Features include:

   - Full audit of BUG_ON asserts in the NFS, SUNRPC and lockd client
     code.  Remove altogether where possible, and replace with
     WARN_ON_ONCE and appropriate error returns where not.
   - NFSv4.1 client adds session dynamic slot table management.  There
     is matching server side code that has been submitted to Bruce for
     consideration.

     Together, this code allows the server to dynamically manage the
     amount of memory it allocates to the duplicate request cache for
     each client.  It will constantly resize those caches to reserve
     more memory for clients that are hot while shrinking caches for
     those that are quiescent.

  In addition, there are assorted bugfixes for the generic NFS write
  code, fixes to deal with the drop_nlink() warnings, and yet another
  fix for NFSv4 getacl."

* tag 'nfs-for-3.8-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (106 commits)
  SUNRPC: continue run over clients list on PipeFS event instead of break
  NFS: Don't use SetPageError in the NFS writeback code
  SUNRPC: variable 'svsk' is unused in function bc_send_request
  SUNRPC: Handle ECONNREFUSED in xs_local_setup_socket
  NFSv4.1: Deal effectively with interrupted RPC calls.
  NFSv4.1: Move the RPC timestamp out of the slot.
  NFSv4.1: Try to deal with NFS4ERR_SEQ_MISORDERED.
  NFS: nfs_lookup_revalidate should not trust an inode with i_nlink == 0
  NFS: Fix calls to drop_nlink()
  NFS: Ensure that we always drop inodes that have been marked as stale
  nfs: Remove unused list nfs4_clientid_list
  nfs: Remove duplicate function declaration in internal.h
  NFS: avoid NULL dereference in nfs_destroy_server
  SUNRPC handle EKEYEXPIRED in call_refreshresult
  SUNRPC set gss gc_expiry to full lifetime
  nfs: fix page dirtying in NFS DIO read codepath
  nfs: don't zero out the rest of the page if we hit the EOF on a DIO READ
  NFSv4.1: Be conservative about the client highest slotid
  NFSv4.1: Handle NFS4ERR_BADSLOT errors correctly
  nfs: don't extend writes to cover entire page if pagecache is invalid
  ...
2012-12-18 09:36:34 -08:00
Andrew Morton
965c8e59cf lseek: the "whence" argument is called "whence"
But the kernel decided to call it "origin" instead.  Fix most of the
sites.

Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-17 17:15:12 -08:00
Linus Torvalds
2a74dbb9a8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
 "A quiet cycle for the security subsystem with just a few maintenance
  updates."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  Smack: create a sysfs mount point for smackfs
  Smack: use select not depends in Kconfig
  Yama: remove locking from delete path
  Yama: add RCU to drop read locking
  drivers/char/tpm: remove tasklet and cleanup
  KEYS: Use keyring_alloc() to create special keyrings
  KEYS: Reduce initial permissions on keys
  KEYS: Make the session and process keyrings per-thread
  seccomp: Make syscall skipping and nr changes more consistent
  key: Fix resource leak
  keys: Fix unreachable code
  KEYS: Add payload preparsing opportunity prior to key instantiate or update
2012-12-16 15:40:50 -08:00
Trond Myklebust
ada8e20d04 NFS: Don't use SetPageError in the NFS writeback code
The writeback code is already capable of passing errors back to user space
by means of the open_context->error. In the case of ENOSPC, Neil Brown
is reporting seeing 2 errors being returned.

Neil writes:

"e.g. if /mnt2/ if an nfs mounted filesystem that has no space then

strace dd if=/dev/zero conv=fsync >> /mnt2/afile count=1

reported Input/output error and the relevant parts of the strace output are:

write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
fsync(1)                                = -1 EIO (Input/output error)
close(1)                                = -1 ENOSPC (No space left on device)"

Neil then shows that the duplication of error messages appears to be due to
the use of the PageError() mechanism, which causes filemap_fdatawait_range
to return the extra EIO. The regression was introduced by
commit 7b281ee026 (NFS: fsync() must exit
with an error if page writeback failed).

Fix this by removing the call to SetPageError(), and just relying on
open_context->error reporting the ENOSPC back to fsync().

Reported-by: Neil Brown <neilb@suse.de>
Tested-by: Neil Brown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [3.6+]
2012-12-15 17:12:14 -05:00
Trond Myklebust
ac20d163fc NFSv4.1: Deal effectively with interrupted RPC calls.
If an RPC call is interrupted, assume that the server hasn't processed
the RPC call so that the next time we use the slot, we know that if we
get a NFS4ERR_SEQ_MISORDERED or NFS4ERR_SEQ_FALSE_RETRY, we just have
to bump the sequence number.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-15 15:39:59 -05:00
Trond Myklebust
8e63b6a8ad NFSv4.1: Move the RPC timestamp out of the slot.
Shave a few bytes off the slot table size by moving the RPC timestamp
into the sequence results.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-15 15:21:52 -05:00
Trond Myklebust
e879444084 NFSv4.1: Try to deal with NFS4ERR_SEQ_MISORDERED.
If the server returns NFS4ERR_SEQ_MISORDERED, it could be a sign
that the slot was retired at some point. Retry the attempt after
reinitialising the slot sequence number to 1.

Also add a handler for NFS4ERR_SEQ_FALSE_RETRY. Just bump the slot
sequence number and retry...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-15 14:49:09 -05:00
Trond Myklebust
65a0c14954 NFS: nfs_lookup_revalidate should not trust an inode with i_nlink == 0
If the inode has no links, then we should force a new lookup.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-14 17:51:40 -05:00
Trond Myklebust
1f018458b3 NFS: Fix calls to drop_nlink()
It is almost always wrong for NFS to call drop_nlink() after removing a
file. What we really want is to mark the inode's attributes for
revalidation, and we want to ensure that the VFS drops it if we're
reasonably sure that this is the final unlink().
Do the former using the usual cache validity flags, and the latter
by testing if inode->i_nlink == 1, and clearing it in that case.

This also fixes the following warning reported by Neil Brown and
Jeff Layton (among others).

[634155.004438] WARNING:
at /home/abuild/rpmbuild/BUILD/kernel-desktop-3.5.0/lin [634155.004442]
Hardware name: Latitude E6510 [634155.004577]  crc_itu_t crc32c_intel
snd_hwdep snd_pcm snd_timer snd soundcor [634155.004609] Pid: 13402, comm:
bash Tainted: G        W    3.5.0-36-desktop # [634155.004611] Call Trace:
[634155.004630]  [<ffffffff8100444a>] dump_trace+0xaa/0x2b0
[634155.004641]  [<ffffffff815a23dc>] dump_stack+0x69/0x6f
[634155.004653]  [<ffffffff81041a0b>] warn_slowpath_common+0x7b/0xc0
[634155.004662]  [<ffffffff811832e4>] drop_nlink+0x34/0x40
[634155.004687]  [<ffffffffa05bb6c3>] nfs_dentry_iput+0x33/0x70 [nfs]
[634155.004714]  [<ffffffff8118049e>] dput+0x12e/0x230
[634155.004726]  [<ffffffff8116b230>] __fput+0x170/0x230
[634155.004735]  [<ffffffff81167c0f>] filp_close+0x5f/0x90
[634155.004743]  [<ffffffff81167cd7>] sys_close+0x97/0x100
[634155.004754]  [<ffffffff815c3b39>] system_call_fastpath+0x16/0x1b
[634155.004767]  [<00007f2a73a0d110>] 0x7f2a73a0d10f

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [3.3+]
2012-12-14 17:45:11 -05:00
Trond Myklebust
eed9935745 NFS: Ensure that we always drop inodes that have been marked as stale
There is no need to cache stale inodes.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-14 14:36:36 -05:00
Yanchuan Nian
48d7a57693 nfs: Remove unused list nfs4_clientid_list
This list was designed to store struct nfs4_client in the client side.
But nfs4_client was obsolete and has been removed from the source code.
So remove the unused list.

Signed-off-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-13 10:40:09 -05:00
Yanchuan Nian
aaea7d2f78 nfs: Remove duplicate function declaration in internal.h
Remove duplicate function declaration in internal.h

Signed-off-by: Yanchuan Nian <ycnian@gmail.com>
[Trond: Added nfs_pageio_init_read, which suffered from the same problem]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-13 10:38:54 -05:00
NeilBrown
f259613a1e NFS: avoid NULL dereference in nfs_destroy_server
In rare circumstances, nfs_clone_server() of a v2 or v3 server can get
an error between setting server->destory (to nfs_destroy_server), and
calling nfs_start_lockd (which will set server->nlm_host).

If this happens, nfs_clone_server will call nfs_free_server which
will call nfs_destroy_server and thence nlmclnt_done(NULL).  This
causes the NULL to be dereferenced.

So add a guard to only call nlmclnt_done() if ->nlm_host is not NULL.

The other guards there are irrelevant as nlm_host can only be non-NULL
if one of these flags are set - so remove those tests.  (Thanks to Trond
for this suggestion).

This is suitable for any stable kernel since 2.6.25.

Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-12 23:55:56 -05:00
Andy Adamson
eb96d5c97b SUNRPC handle EKEYEXPIRED in call_refreshresult
Currently, when an RPCSEC_GSS context has expired or is non-existent
and the users (Kerberos) credentials have also expired or are non-existent,
the client receives the -EKEYEXPIRED error and tries to refresh the context
forever.  If an application is performing I/O, or other work against the share,
the application hangs, and the user is not prompted to refresh/establish their
credentials. This can result in a denial of service for other users.

Users are expected to manage their Kerberos credential lifetimes to mitigate
this issue.

Move the -EKEYEXPIRED handling into the RPC layer. Try tk_cred_retry number
of times to refresh the gss_context, and then return -EACCES to the application.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-12 15:36:02 -05:00
Jeff Layton
be7e985804 nfs: fix page dirtying in NFS DIO read codepath
The NFS DIO code will dirty pages that catch read responses in order to
handle the case where someone is doing DIO reads into an mmapped buffer.
The existing code doesn't really do the right thing though since it
doesn't take into account the case where we might be attempting to read
past the EOF.

Fix the logic in that code to only dirty pages that ended up receiving
data from the read. Note too that it really doesn't matter if
NFS_IOHDR_ERROR is set or not. All that matters is if the page was
altered by the read.

Cc: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-12 12:56:19 -05:00
Jeff Layton
67fad106a2 nfs: don't zero out the rest of the page if we hit the EOF on a DIO READ
Eryu provided a test program that would segfault when attempting to read
past the EOF on file that was opened O_DIRECT. The buffer given to the
read() call was on the stack, and when he attempted to read past it it
would scribble over the rest of the stack page.

If we hit the end of the file on a DIO READ request, then we don't want
to zero out the rest of the buffer. These aren't pagecache pages after
all, and there's no guarantee that the buffers that were passed in
represent entire pages.

Cc: <stable@vger.kernel.org> # v3.5+
Cc: Fred Isaman <iisaman@netapp.com>
Reported-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-12 12:56:09 -05:00
Trond Myklebust
b0ef9647a0 NFSv4.1: Be conservative about the client highest slotid
If the server sends us a target that looks like an outlier, but
is lower than the existing target, then respect it anyway.
However defer actually updating the generation counter until
we get a target that doesn't look like an outlier.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-11 12:29:10 -05:00
Trond Myklebust
8556307374 NFSv4.1: Handle NFS4ERR_BADSLOT errors correctly
Most (all) NFS4ERR_BADSLOT errors are due to the client failing to
respect the server's sr_highest_slotid limit. This mainly happens
due to reordered RPC requests.
The way to handle it is simply to drop the slot that we're using,
and retry using the new highest_slotid limits.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-11 10:31:12 -05:00
Trond Myklebust
7ce0171d4f Merge branch 'bugfixes' into nfs-for-next 2012-12-11 09:16:26 -05:00
Jeff Layton
81d9bce530 nfs: don't extend writes to cover entire page if pagecache is invalid
Jian reported that the following sequence would leave "testfile" with
corrupt data:

    # mount localhost:/export /mnt/nfs/ -o vers=3
    # echo abc > /mnt/nfs/testfile; echo def >> /export/testfile; echo ghi >> /mnt/nfs/testfile
    # cat -v /export/testfile
    abc
    ^@^@^@^@ghi

While there's no locking involved here, the operations are serialized,
so CTO should prevent corruption.

The first write to the file is fine and writes 4 bytes. The file is then
extended on the server. When it's reopened a GETATTR is issued and the
size change is noticed. This causes NFS_INO_INVALID_DATA to be set on
the file. Because the file is opened for write only,
nfs_want_read_modify_write() returns 0 to nfs_write_begin().
nfs_updatepage then calls nfs_write_pageuptodate() to see if it should
extend the nfs_page to cover the whole page. NFS_INO_INVALID_DATA is
still set on the file at that point, but that flag is ignored and
nfs_pageuptodate erroneously extends the write to cover the whole page,
with the write done on the server side filled in with zeroes.

This patch just has that function check for NFS_INO_INVALID_DATA in
addition to NFS_INO_REVAL_PAGECACHE. This fixes the bug, but looking
over the code, I wonder if we might have a similar bug in
nfs_revalidate_size(). The difference between those two flags is very
subtle, so it seems like we ought to be checking for
NFS_INO_INVALID_DATA in most of the places that we look for
NFS_INO_REVAL_PAGECACHE.

I believe this is regression introduced by commit 8d197a568. The code
did check for NFS_INO_INVALID_DATA prior to that patch.

Original bug report is here:

    https://bugzilla.redhat.com/show_bug.cgi?id=885743

Cc: <stable@vger.kernel.org> # 3.5+
Reported-by: Jian Li <jiali@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-11 09:14:51 -05:00
Sven Wegener
7d3e91a89b NFSv4: Check for buffer length in __nfs4_get_acl_uncached
Commit 1f1ea6c "NFSv4: Fix buffer overflow checking in
__nfs4_get_acl_uncached" accidently dropped the checking for too small
result buffer length.

If someone uses getxattr on "system.nfs4_acl" on an NFSv4 mount
supporting ACLs, the ACL has not been cached and the buffer suplied is
too short, we still copy the complete ACL, resulting in kernel and user
space memory corruption.

Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Cc: stable@kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-11 09:14:50 -05:00
Trond Myklebust
1fa8064429 NFSv4.1: Try to eliminate outliers when updating target_highest_slotid
Look for sudden changes in the first and second derivatives in order
to eliminate outlier changes to target_highest_slotid (which are
due to out-of-order RPC replies).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06 00:30:53 +01:00
Trond Myklebust
b75ad4cda5 NFSv4.1: Ensure smooth handover of slots from one task to the next waiting
Currently, we see a lot of bouncing for the value of highest_used_slotid
due to the fact that slots are getting freed, instead of getting instantly
transmitted to the next waiting task.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06 00:30:52 +01:00
Trond Myklebust
1e1093c7fd NFSv4.1: Don't mess with task priorities in nfs41_setup_sequence
We want to preserve the rpc_task priority for things like writebacks,
that may have differing levels of urgency.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06 00:30:51 +01:00
Bryan Schumaker
104287cd4e NFS: Remove _nfs_call_sync_session
All it does is pass its arguments through to another function.  Let's
cut out the middleman...

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06 00:30:51 +01:00
Trond Myklebust
8fe72bac8d NFSv4: Clean up handling of privileged operations
Privileged rpc calls are those that are run by the state recovery thread,
in cases where we're trying to recover the system after a server reboot
or a network partition. In those cases, we want to fence off all other
rpc calls (see nfs4_begin_drain_session()) so that they don't end up
using stateids or clientids that are in the process of being recovered.

Prior to this patch, we had to set up special callback functions in
order to declare an rpc call as being privileged.
By adding a new field to the sequence arguments, this patch simplifies
things considerably, and allows us to declare the rpc call as privileged
before it is run.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06 00:30:50 +01:00
Trond Myklebust
275e7e20aa NFSv4.1: Remove the 'FIFO' behaviour for nfs41_setup_sequence
It is more important to preserve the task priority behaviour, which ensures
that things like reclaim writes take precedence over background and kupdate
writes.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06 00:30:50 +01:00
Trond Myklebust
7b939a3f44 NFSv4.1: Clean up nfs41_setup_sequence
Move all the sleep-and-exit cases into a single section of code.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06 00:30:49 +01:00
Trond Myklebust
fd0c09537a NFSv4: Simplify the NFSv4/v4.1 synchronous call switch
We shouldn't need to pass the 'cache_reply' parameter if we
initialise the sequence_args/sequence_res in the caller.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06 00:30:49 +01:00
Trond Myklebust
d9afbd1b08 NFSv4.1: Simplify the sequence setup
Nobody calls nfs4_setup_sequence or nfs41_setup_sequence without
also calling rpc_call_start() on success. This commit therefore
folds the rpc_call_start call into nfs41_setup_sequence().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06 00:30:48 +01:00
Trond Myklebust
6ba7db3420 NFSv4.1: Use nfs41_setup_sequence where appropriate
There is no point in using nfs4_setup_sequence or nfs4_sequence_done
in pure NFSv4.1 functions. We already know that those have sessions...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06 00:30:48 +01:00
Trond Myklebust
c10e449827 NFSv4.1: Ping server when our session table limits are too high
If the server requests a lower target_highest_slotid, then ensure
that we ping it with at least one RPC call containing an
appropriate SEQUENCE op. This ensures that the server won't need to
send a recall callback in order to shrink the slot table.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06 00:30:47 +01:00
Trond Myklebust
0ca3f4825a NFSv4.1: Set the maximum slot table size to 1024 slots
This means that we end up statically allocating 128 bytes for the
bitmap on each slot table.
For a server that supports 1MB write and read I/O sizes this means
that we can completely fill the maximum 1GB TCP send/receive
windows.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06 00:30:47 +01:00
Trond Myklebust
76e697ba7e NFSv4.1: Move slot table and session struct definitions to nfs4session.h
Clean up. Gather NFSv4.1 slot definitions in fs/nfs/nfs4session.h.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06 00:30:46 +01:00
Trond Myklebust
73e39aaa83 NFSv4.1: Cleanup move session slot management to fs/nfs/nfs4session.c
NFSv4.1 session management is getting complex enough to deserve
a separate file.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06 00:30:45 +01:00
Trond Myklebust
3302127967 NFSv4: Move nfs4_wait_clnt_recover and nfs4_client_recover_expired_lease
nfs4_wait_clnt_recover and nfs4_client_recover_expired_lease are both
generic state related functions. As such, they belong in nfs4state.c,
and not nfs4proc.c

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06 00:30:45 +01:00
Trond Myklebust
5d63360dd8 NFSv4.1: Clean up session draining
Coalesce nfs4_check_drain_bc_complete and nfs4_check_drain_fc_complete
into a single function that can be called when the slot table is known
to be empty, then change nfs4_callback_free_slot() and nfs4_free_slot()
to use it.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06 00:30:44 +01:00
Trond Myklebust
69d206b5b3 NFSv4.1: If slot allocation fails due to OOM, retry more quickly
If the NFSv4.1 session slot allocation fails due to an ENOMEM condition,
then set the task->tk_timeout to 1/4 second to ensure that we do retry
the slot allocation more quickly.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06 00:30:44 +01:00
Trond Myklebust
ac0748359a NFSv4.1: CB_RECALL_SLOT must schedule a sequence op after updating targets
RFC5661 requires us to make sure that the server knows we've updated
our slot table size by sending at least one SEQUENCE op containing the
new 'highest_slotid' value.
We can do so using the 'CHECK_LEASE' functionality of the state
manager.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06 00:30:43 +01:00
Trond Myklebust
afa296103e NFSv4.1: Remove the state manager code to resize the slot table
The state manager no longer needs any special machinery to stop the
session flow and resize the slot table. It is all done on the fly by
the SEQUENCE op code now.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06 00:30:43 +01:00
Trond Myklebust
87dda67e73 NFSv4.1: Allow SEQUENCE to resize the slot table on the fly
Instead of an array of slots, use a singly linked list of slots that
can be dynamically appended to or shrunk.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06 00:30:42 +01:00
Trond Myklebust
97e548a93d NFSv4.1: Support dynamic resizing of the session slot table
Allow the server to control the size of the session slot table
by adjusting the value of sr_target_max_slots in the reply to the
SEQUENCE operation.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06 00:30:42 +01:00
Trond Myklebust
1b285ff16a NFSv4.1: Allow the server to recall all but one slot
If the server wants to leave us with only one slot, or it wants
to "shrink" our slot table to something larger than we have now,
then so be it.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06 00:30:42 +01:00
Trond Myklebust
d5fb4ce33e NFSv4.1: Don't confuse target_highest_slotid and max_slots in cb_recall_slot
Don't confuse the table size and the target_highest_slotid...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06 00:30:41 +01:00
Trond Myklebust
ce008c4bb9 NFSv4.1: Fix nfs4_callback_recallslot to work with dynamic slot allocation
Ensure that the NFSv4.1 CB_RECALL_SLOT callback updates the slot table
target max slotid safely.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06 00:30:37 +01:00
Trond Myklebust
da0507b7c9 NFSv4.1: Reset the sequence number for slots that have been deallocated
When the server tells us that it is dynamically resizing the session
replay cache, we should reset the sequence number for those slots
that have been deallocated.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06 00:30:17 +01:00
Trond Myklebust
464ee9f966 NFSv4.1: Ensure that the client tracks the server target_highest_slotid
Dynamic slot allocation in NFSv4.1 depends on the client being able to
track the server's target value for the highest slotid in the
slot table.  See the reference in Section 2.10.6.1 of RFC5661.

To avoid ordering problems in the case where 2 SEQUENCE replies contain
conflicting updates to this target value, we also introduce a generation
counter, to track whether or not an RPC containing a SEQUENCE operation
was launched before or after the last update.

Also rename the nfs4_slot_table target_max_slots field to
'target_highest_slotid' to avoid confusion with a slot
table size or number of slots.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-06 00:29:47 +01:00
Al Viro
c44600c9d1 nfs_lookup_revalidate(): fix a leak
We are leaking fattr and fhandle if we decide that dentry is not to
be invalidated, after all (e.g. happens to be a mountpoint).  Just
free both before that...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-29 22:04:36 -05:00
Al Viro
696199f8cc don't do blind d_drop() in nfs_prime_dcache()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-29 22:00:51 -05:00
Trond Myklebust
f4af6e2abc NFSv4.1: Clean up nfs4_free_slot
Change the argument to take the pointer to the slot, instead of
just the slotid.

We know that the new value of highest_used_slot must be less than
the current value. No need to scan the whole table.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-26 17:49:53 -05:00
Trond Myklebust
2dc03b7f00 NFSv4.1: Simplify slot allocation
Clean up the NFSv4.1 slot allocation by replacing nfs_find_slot() with
a function nfs_alloc_slot() that returns a pointer to the nfs4_slot
instead of an offset into the slot table.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-26 17:49:52 -05:00
Trond Myklebust
2b2fa71723 NFSv4.1: Simplify struct nfs4_sequence_args too
Replace the session pointer + slotid with a pointer to the
allocated slot.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-26 17:49:52 -05:00
Trond Myklebust
df2fabffba NFSv4.1: Label each entry in the session slot tables with its slot number
Instead of doing slot table pointer gymnastics every time we want to
know which slot we're using.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-26 17:49:51 -05:00
Trond Myklebust
e3725ec015 NFSv4.1: Shrink struct nfs4_sequence_res by moving the session pointer
Move the session pointer into the slot table, then have struct nfs4_slot
point to that slot table.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-26 17:49:04 -05:00
Yanchuan Nian
4c10021008 nfs: Fix wrong slab cache in nfs_commit_mempool
The slab cache in nfs_commit_mempool is wrong, and I think it is just a slip.
I tested it on a x86-32 machine, the size of nfs_write_header is 544, and
the size of nfs_commit_data is 408, so it works fine. It is also true that
sizeof(struct nfs_write_header) > sizeof(struct nfs_commit_data) on other
platforms in my opinoin. Just fix it.

Signed-off-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-25 11:59:33 -05:00
Jim Rees
d751f748b3 NFS: Reduce stack use in encode_exchange_id()
encode_exchange_id() uses more stack space than necessary, giving a compile
time warning. Reduce the size of the static buffer for implementation name.

Signed-off-by: Jim Rees <rees@umich.edu>
Reviewed-by: "Adamson, Dros" <Weston.Adamson@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-21 22:59:29 -05:00
Trond Myklebust
fe20d7d5ee NFSv4: Fix a compile time warning when #undef CONFIG_NFS_V4_1
The function nfs4_get_machine_cred_locked is used by NFSv4.0 routines
too.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-21 22:59:28 -05:00
Trond Myklebust
933602e368 NFSv4.1: Shrink struct nfs4_sequence_res by moving sr_renewal_time
Store the renewal time inside the session slot instead.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-21 09:29:53 -05:00
Trond Myklebust
9216106a84 NFSv4.1: clean up nfs4_recall_slot to use nfs4_alloc_slots
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-21 09:29:53 -05:00
Trond Myklebust
2d473d378e NFSv4.1: nfs4_alloc_slots doesn't need zeroing
All that memory is going to be initialised to non-zero by
nfs4_add_and_init_slots anyway.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-21 09:29:52 -05:00
Trond Myklebust
43095d3972 NFSv4.1: We must bump the clientid sequence number after CREATE_SESSION
We must always bump the clientid sequence number after a successful
call to CREATE_SESSION on the server. The result of
nfs4_verify_channel_attrs() is irrelevant to that requirement.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-21 09:29:52 -05:00
Trond Myklebust
688a9024e2 NFSv4.1: Adjust CREATE_SESSION arguments when mounting a new filesystem
If we're mounting a new filesystem, ensure that the session has negotiated
large enough request and reply sizes to match the wsize and rsize mount
arguments.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-21 09:29:51 -05:00
Trond Myklebust
ae72ae6760 NFSv4.1: Don't confuse CREATE_SESSION arguments and results
Don't store the target request and response sizes in the same
variables used to store the server's replies to those targets.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-21 09:29:51 -05:00
Trond Myklebust
5df904aeb0 NFSv4.1: Handle session reset and bind_conn_to_session before lease check
We can't send a SEQUENCE op unless the session is OK, so it is pointless
to handle the CHECK_LEASE state before we've dealt with SESSION_RESET
and BIND_CONN_TO_SESSION.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-21 09:28:42 -05:00
Bryan Schumaker
6bdb5f213c NFS: Add sequence_priviliged_ops for nfs4_proc_sequence()
If I mount an NFS v4.1 server to a single client multiple times and then
run xfstests over each mountpoint I usually get the client into a state
where recovery deadlocks.  The server informs the client of a
cb_path_down sequence error, the client then does a
bind_connection_to_session and checks the status of the lease.

I found that bind_connection_to_session sets the NFS4_SESSION_DRAINING
flag on the client, but this flag is never unset before
nfs4_check_lease() reaches nfs4_proc_sequence().  This causes the client
to deadlock, halting all NFS activity to the server.  nfs4_proc_sequence()
is only called by the state manager, so I can change it to run in privileged
mode to bypass the NFS4_SESSION_DRAINING check and avoid the deadlock.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2012-11-20 23:34:54 -05:00
Trond Myklebust
28d79ea33f NFS: Remove the BUG_ON() in the mount code
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:39 -05:00
Trond Myklebust
f48407ddd4 NFS: Remove BUG_ON()s in the fs/nfs/inode.c
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:39 -05:00
Trond Myklebust
4ea8fed593 NFSv4: Get rid of unnecessary BUG_ON()s
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:39 -05:00
Trond Myklebust
deed85e760 NFS: Remove BUG_ON() calls from the generic writeback code
...and ensure that we set the return value for nfs_page_async_flush()
to zero! (Reported-by: Dros Adamson)

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:39 -05:00
Trond Myklebust
bc5a89b337 NFSv4.1: Remove assertion BUG_ON()s from the files and generic layout code
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:39 -05:00
Trond Myklebust
eba24e1fe5 NFSv4.1: Remove unused function last_byte_offset
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:38 -05:00
Trond Myklebust
d3edcf9614 NFSv4: Remove the BUG_ON() from nfs4_get_lease_time_prepare()...
An EAGAIN return value would be unexpected, but there is no reason to
BUG...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:38 -05:00
Trond Myklebust
7fc388460e NFS: Remove asserts from the NFS XDR code
Convert the ones that are not trivial to check into WARN_ON_ONCE().
Remove checks for things such as NFS2_MAXPATHLEN, which are trivially
done by the caller.

Add a comment to the case of nfs3_xdr_enc_setacl3args. What is being
done there is just wrong...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:38 -05:00
Trond Myklebust
1fea73a865 NFS: Get rid of unnecessary asserts
If the nfs_client fails to initialise correctly, then it will
return an error condition.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:38 -05:00
Weston Andros Adamson
998f40b550 NFS4: nfs4_opendata_access should return errno
Return errno - not an NFS4ERR_. This worked because NFS4ERR_ACCESS == EACCES.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-02 18:51:54 -04:00
Trond Myklebust
f9b1ef5f06 NFSv4: Initialise the NFSv4.1 slot table highest_used_slotid correctly
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-01 12:02:03 -04:00
Weston Andros Adamson
324d003b0c NFS: add nfs_sb_deactive_async to avoid deadlock
Use nfs_sb_deactive_async instead of nfs_sb_deactive when in a workqueue
context.  This avoids a deadlock where rpc_shutdown_client loops forever
in a workqueue kworker context, trying to kill all RPC tasks associated with
the client, while one or more of these tasks have already been assigned to the
same kworker (and will never run rpc_exit_task).

This approach is needed because RPC tasks that have already been assigned
to a kworker by queue_work cannot be canceled, as explained in the comment
for workqueue.c:insert_wq_barrier.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
[Trond: add module_get/put.]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-31 16:26:26 -04:00
Ben Hutchings
97a5486826 nfs: Show original device name verbatim in /proc/*/mount{s,info}
Since commit c7f404b ('vfs: new superblock methods to override
/proc/*/mount{s,info}'), nfs_path() is used to generate the mounted
device name reported back to userland.

nfs_path() always generates a trailing slash when the given dentry is
the root of an NFS mount, but userland may expect the original device
name to be returned verbatim (as it used to be).  Make this
canonicalisation optional and change the callers accordingly.

[jrnieder@gmail.com: use flag instead of bool argument]
Reported-and-tested-by: Chris Hiestand <chiestand@salk.edu>
Reference: http://bugs.debian.org/669314
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: <stable@vger.kernel.org> # v2.6.39+
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-31 16:26:26 -04:00
Scott Mayhew
acce94e68a nfsv3: Make v3 mounts fail with ETIMEDOUTs instead EIO on mountd timeouts
In very busy v3 environment, rpc.mountd can respond to the NULL
procedure but not the MNT procedure in a timely manner causing
the MNT procedure to time out. The problem is the mount system
call returns EIO which causes the mount to fail, instead of
ETIMEDOUT, which would cause the mount to be retried.

This patch sets the RPC_TASK_SOFT|RPC_TASK_TIMEOUT flags to
the rpc_call_sync() call in nfs_mount() which causes
ETIMEDOUT to be returned on timed out connections.

Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2012-10-31 16:26:25 -04:00
Yanchuan Nian
7175fe9015 nfs: Check whether a layout pointer is NULL before free it
The new layout pointer in pnfs_find_alloc_layout() may be NULL because of
out of memory. we must do some check work, otherwise pnfs_free_layout_hdr()
will go wrong because it can not deal with a NULL pointer.

Signed-off-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-31 16:26:25 -04:00
NeilBrown
8d96b10639 NFS: fix bug in legacy DNS resolver.
The DNS resolver's use of the sunrpc cache involves a 'ttl' number
(relative) rather that a timeout (absolute).  This confused me when
I wrote
  commit c5b29f885a
     "sunrpc: use seconds since boot in expiry cache"

and I managed to break it.  The effect is that any TTL is interpreted
as 0, and nothing useful gets into the cache.

This patch removes the use of get_expiry() - which really expects an
expiry time - and uses get_uint() instead, treating the int correctly
as a ttl.

This fixes a regression that has been present since 2.6.37, causing
certain NFS accesses in certain environments to incorrectly fail.

Reported-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-31 16:25:59 -04:00
Trond Myklebust
2b1bc308f4 NFSv4: nfs4_locku_done must release the sequence id
If the state recovery machinery is triggered by the call to
nfs4_async_handle_error() then we can deadlock.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2012-10-31 15:10:04 -04:00
Trond Myklebust
2240a9e2d0 NFSv4.1: We must release the sequence id when we fail to get a session slot
If we do not release the sequence id in cases where we fail to get a
session slot, then we can deadlock if we hit a recovery scenario.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2012-10-31 15:08:18 -04:00
Bryan Schumaker
399f11c3d8 NFS: Wait for session recovery to finish before returning
Currently, we will schedule session recovery and then return to the
caller of nfs4_handle_exception.  This works for most cases, but causes
a hang on the following test case:

	Client				Server
	------				------
	Open file over NFS v4.1
	Write to file
					Expire client
	Try to lock file

The server will return NFS4ERR_BADSESSION, prompting the client to
schedule recovery.  However, the client will continue placing lock
attempts and the open recovery never seems to be scheduled.  The
simplest solution is to wait for session recovery to run before retrying
the lock.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2012-10-31 13:13:28 -04:00
Trond Myklebust
e9b7e91745 NFSv4: Fix the return value for nfs_callback_start_svc
returning PTR_ERR(cb_info->task) just after we have set it to
NULL looks like a typo...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
2012-10-16 13:14:42 -04:00
Trond Myklebust
2e928e4878 NFSv4.1: Declare osd_pri_2_pnfs_err(), objio_init_read/write to be static
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-16 12:38:00 -04:00
Trond Myklebust
d6aa6a81d4 NFSv4: fs/nfs/nfs4getroot.c needs to include "internal.h"
Fix a warning about "no previous prototype for ‘nfs4_get_rootfh’"

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-16 12:37:59 -04:00
Trond Myklebust
1813badd98 NFSv4.1: Use kcalloc() to allocate zeroed arrays instead of kzalloc()
Don't circumvent the array size checks.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-15 10:49:43 -04:00
Trond Myklebust
d527e5c15d NFSv4.1: Do not call pnfs_return_layout() from an rpciod context
Move the call to pnfs_return_layout() to the read and write rpc_release()
callbacks, so that it gets called from nfsiod, which is a more appropriate
context.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-15 10:49:43 -04:00
Trond Myklebust
8fcdc31b3d NFSv4.1: Kill nfs4_ds_disconnect()
There is nothing to prevent another thread from dereferencing ds->ds_clp
during or after the call to nfs4_ds_disconnect(), and Oopsing due to the
resulting NULL pointer.

Instead, we should just rely on filelayout_mark_devid_invalid() to keep
us out of trouble by avoiding that deviceid.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-15 10:49:42 -04:00
Linus Torvalds
bd81ccea85 Merge branch 'for-3.7' of git://linux-nfs.org/~bfields/linux
Pull nfsd update from J Bruce Fields:
 "Another relatively quiet cycle.  There was some progress on my
  remaining 4.1 todo's, but a couple of them were just of the form
  "check that we do X correctly", so didn't have much affect on the
  code.

  Other than that, a bunch of cleanup and some bugfixes (including an
  annoying NFSv4.0 state leak and a busy-loop in the server that could
  cause it to peg the CPU without making progress)."

* 'for-3.7' of git://linux-nfs.org/~bfields/linux: (46 commits)
  UAPI: (Scripted) Disintegrate include/linux/sunrpc
  UAPI: (Scripted) Disintegrate include/linux/nfsd
  nfsd4: don't allow reclaims of expired clients
  nfsd4: remove redundant callback probe
  nfsd4: expire old client earlier
  nfsd4: separate session allocation and initialization
  nfsd4: clean up session allocation
  nfsd4: minor free_session cleanup
  nfsd4: new_conn_from_crses should only allocate
  nfsd4: separate connection allocation and initialization
  nfsd4: reject bad forechannel attrs earlier
  nfsd4: enforce per-client sessions/no-sessions distinction
  nfsd4: set cl_minorversion at create time
  nfsd4: don't pin clientids to pseudoflavors
  nfsd4: fix bind_conn_to_session xdr comment
  nfsd4: cast readlink() bug argument
  NFSD: pass null terminated buf to kstrtouint()
  nfsd: remove duplicate init in nfsd4_cb_recall
  nfsd4: eliminate redundant nfs4_free_stateid
  fs/nfsd/nfs4idmap.c: adjust inconsistent IS_ERR and PTR_ERR
  ...
2012-10-13 10:53:54 +09:00
J. Bruce Fields
a9ca4043d0 Merge Trond's bugfixes
Merge branch 'bugfixes' of git://linux-nfs.org/~trondmy/nfs-2.6 into
for-3.7-incoming.  Mainly needed for Bryan's "SUNRPC: Set alloc_slot for
backchannel tcp ops", without which the 4.1 server oopses.
2012-10-11 12:41:05 -04:00
Linus Torvalds
df632d3ce7 NFS client updates for Linux 3.7
Features include:
 
 - Remove CONFIG_EXPERIMENTAL dependency from NFSv4.1
   Aside from the issues discussed at the LKS, distros are shipping
   NFSv4.1 with all the trimmings.
 - Fix fdatasync()/fsync() for the corner case of a server reboot.
 - NFSv4 OPEN access fix: finally distinguish correctly between
   open-for-read and open-for-execute permissions in all situations.
 - Ensure that the TCP socket is closed when we're in CLOSE_WAIT
 - More idmapper bugfixes
 - Lots of pNFS bugfixes and cleanups to remove unnecessary state and
   make the code easier to read.
 - In cases where a pNFS read or write fails, allow the client to
   resume trying layoutgets after two minutes of read/write-through-mds.
 - More net namespace fixes to the NFSv4 callback code.
 - More net namespace fixes to the NFSv3 locking code.
 - More NFSv4 migration preparatory patches.
   Including patches to detect network trunking in both NFSv4 and NFSv4.1
 - pNFS block updates to optimise LAYOUTGET calls.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQdMvBAAoJEGcL54qWCgDyV84P/0XvcEXj6kdMv9EiWfRczo7r
 iAwAIhiEmG1agtZa6v+Gso2MYRQbkGyJi0LKIwzGqNUi0BLQGQCoV93kB0ITVpiN
 g7poDTnPyoItW1oJCtC48/Mx0G5C1yrHSwFAJrXmtzDF1mwd/BIQReafYp6x+/TU
 Mvwm7au3Y2ySRBEDmY4zyBERHXGt//JmsZ9Ays6jewQg5ZOyjDQKoeHVYaaeJoF0
 A0tQGcBSNdySagI5dt4SlkuO7AClhzVHlilep2dsBu/TLS0F2pEdHXvM2W0koZmM
 uazaIpzd2F7TfokTYExgsyKsqpkzpDf1kebN4Y1+Ioi7Yy30dQrX6lNaUNcOmOJQ
 xx694HDHV90KdRBVSFhOIHMTBRcls68hBcWib3MXWHTKX6HVgnFMwhwxGH0MRezf
 3rmXoqn+CO1j5WeQmA3BqdVbHSZHi913TKEwE/qoW4pmOFhv5I2flXWQS/Rwvdng
 2xDCe6TlvhMS92IpyvNEIicXLRSm+DUAmoAfSqqlifZIAEM5R29e/wCAsmVprO3B
 LPHyUoIMO6SZ1PL6Rk20+6qQfvCK7U/ChULsUL/zb7R88Pc3sFE2BeAvZVATsvH3
 +FJWTz43fwUBoMhPsn8xSBLn/fq6az5C19syz6Fpu3DZ4X0EwyVWifiFk6HgcxZD
 J8ajEl+dNZeFE8rkwykX
 =uBk7
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Features include:

   - Remove CONFIG_EXPERIMENTAL dependency from NFSv4.1
     Aside from the issues discussed at the LKS, distros are shipping
     NFSv4.1 with all the trimmings.
   - Fix fdatasync()/fsync() for the corner case of a server reboot.
   - NFSv4 OPEN access fix: finally distinguish correctly between
     open-for-read and open-for-execute permissions in all situations.
   - Ensure that the TCP socket is closed when we're in CLOSE_WAIT
   - More idmapper bugfixes
   - Lots of pNFS bugfixes and cleanups to remove unnecessary state and
     make the code easier to read.
   - In cases where a pNFS read or write fails, allow the client to
     resume trying layoutgets after two minutes of read/write-
     through-mds.
   - More net namespace fixes to the NFSv4 callback code.
   - More net namespace fixes to the NFSv3 locking code.
   - More NFSv4 migration preparatory patches.
     Including patches to detect network trunking in both NFSv4 and
     NFSv4.1
   - pNFS block updates to optimise LAYOUTGET calls."

* tag 'nfs-for-3.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (113 commits)
  pnfsblock: cleanup nfs4_blkdev_get
  NFS41: send real read size in layoutget
  NFS41: send real write size in layoutget
  NFS: track direct IO left bytes
  NFSv4.1: Cleanup ugliness in pnfs_layoutgets_blocked()
  NFSv4.1: Ensure that the layout sequence id stays 'close' to the current
  NFSv4.1: Deal with seqid wraparound in the pNFS return-on-close code
  NFSv4 set open access operation call flag in nfs4_init_opendata_res
  NFSv4.1: Remove the dependency on CONFIG_EXPERIMENTAL
  NFSv4 reduce attribute requests for open reclaim
  NFSv4: nfs4_open_done first must check that GETATTR decoded a file type
  NFSv4.1: Deal with wraparound when updating the layout "barrier" seqid
  NFSv4.1: Deal with wraparound issues when updating the layout stateid
  NFSv4.1: Always set the layout stateid if this is the first layoutget
  NFSv4.1: Fix another refcount issue in pnfs_find_alloc_layout
  NFSv4: don't put ACCESS in OPEN compound if O_EXCL
  NFSv4: don't check MAY_WRITE access bit in OPEN
  NFS: Set key construction data for the legacy upcall
  NFSv4.1: don't do two EXCHANGE_IDs on mount
  NFS: nfs41_walk_client_list(): re-lock before iterating
  ...
2012-10-10 23:52:35 +09:00
J. Bruce Fields
f474af7051 UAPI Disintegration 2012-10-09
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIVAwUAUHPmWxOxKuMESys7AQKN4w//XDwALfbf0MXIw+gwyRiUtJe9mGexvI6X
 1R4FWU9a3ImzEZP4cWnmPGT2wmC/x007DcIvx8cyvbdlSuqtR2i/DC+HbWabiLRn
 nJS7Eer1BJvLv5dn6NmXMEz7yB4Z46+frcmBs3WQeR0sqBMDm+rjQzCqECznO8Jc
 VtCbox+VR2DuWcM++YECTblYEH3Z+doDXUN2eBaD8L9x3klPbPXD7OcRyOnry8w+
 ynmUTKKyH4+hpxDakYrObPIg+vFCxb4QRck1mlgA4wbvb3eqjhM0oOCYJ8GvmILA
 vdFYztWCjkiuOl5djtXBlsClX8SAMOBYlRed+R1GvjNCSR+WCWrFJJ2F8qoQ1w87
 9ts2/8qrozS8luTB475SkT2uLdJkIUKX89Oh+dWeE8YkbPnRPj5lNAdtNY5QSyDq
 VaRpIo+YfmZygyvHJQlAXBuZ0mvzcPzArfcPgSVTD3B7xTEGVu/45V7SnQX5os/V
 v39ySPXMdGOIdvK51gw7OtZl64uqrEKu39PyYDX/GUADflp/CHD0J7PJrQePbsH9
 AQolVZDIxTfKqYQnUdL8+C8Zc24RowEzz3c2+aO89MSzwGqev3q8sXRVbW/Iqryg
 p+V3nHe+ipKcga5tOBlPr9KDtDd7j3xN2yaIwf5/QyO1OHBpjAZP1gjSVDcUcwpi
 svYy4kPn3PA=
 =etoL
 -----END PGP SIGNATURE-----

nfs: disintegrate UAPI for nfs

This is to complete part of the Userspace API (UAPI) disintegration for which
the preparatory patches were pulled recently.  After these patches, userspace
headers will be segregated into:

        include/uapi/linux/.../foo.h

for the userspace interface stuff, and:

        include/linux/.../foo.h

for the strictly kernel internal stuff.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-10-09 18:35:22 -04:00
Konstantin Khlebnikov
0b173bc4da mm: kill vma flag VM_CAN_NONLINEAR
Move actual pte filling for non-linear file mappings into the new special
vma operation: ->remap_pages().

Filesystems must implement this method to get non-linear mapping support,
if it uses filemap_fault() then generic_file_remap_pages() can be used.

Now device drivers can implement this method and obtain nonlinear vma support.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>	#arch/tile
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:17 +09:00
Peng Tao
af283885b7 pnfsblock: cleanup nfs4_blkdev_get
It is not needed at all and it is messing with return values...

Reported-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-08 19:32:40 -04:00
Peng Tao
1fd937bd75 NFS41: send real read size in layoutget
For buffer read, use offst-to-isize.

For direct read, use dreq->bytes_left.

Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-08 19:32:34 -04:00
Peng Tao
6296556f0b NFS41: send real write size in layoutget
For buffer write, block layout client scan inode mapping to find
next hole and use offset-to-hole as layoutget length. Object
layout client uses offset-to-isize as layoutget length.

For direct write, both block layout and object layout use dreq->bytes_left.

Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-08 19:32:22 -04:00
Peng Tao
35754bc00e NFS: track direct IO left bytes
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-08 19:31:55 -04:00
Trond Myklebust
19c54abab7 NFSv4.1: Cleanup ugliness in pnfs_layoutgets_blocked()
Split it into two functions, one which checks if layoutgets are blocked,
and one which checks if the layout stateid has expired.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-05 16:56:58 -07:00
Trond Myklebust
22aaf71495 NFSv4.1: Ensure that the layout sequence id stays 'close' to the current
Clamp the layout barrier sequence id to the current sequence id
minus the maximum number of outstanding layoutget requests.

Also ensure that we correctly initialise lo->plh_barrier if there are
no layout segments associated to this layout header.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-04 16:57:48 -07:00
Trond Myklebust
0f35ad6f68 NFSv4.1: Deal with seqid wraparound in the pNFS return-on-close code
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-04 16:28:17 -07:00
Andy Adamson
5f65753033 NFSv4 set open access operation call flag in nfs4_init_opendata_res
nfs4_open_recover_helper zeros the nfs4_opendata result structures, removing
the result access_request information which leads to an XDR decode error.

Move the setting of the result access_request field to nfs4_init_opendata_res
which sets all the other required nfs4_opendata result fields and is shared
between the open and recover open paths.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-03 17:10:28 -07:00
Trond Myklebust
8544a9dc18 NFSv4.1: Remove the dependency on CONFIG_EXPERIMENTAL
CONFIG_EXPERIMENTAL is deprecated and, regardless of that, this code
is being enabled in most newer distributions.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-03 10:54:50 -07:00
Linus Torvalds
aab174f0df Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs update from Al Viro:

 - big one - consolidation of descriptor-related logics; almost all of
   that is moved to fs/file.c

   (BTW, I'm seriously tempted to rename the result to fd.c.  As it is,
   we have a situation when file_table.c is about handling of struct
   file and file.c is about handling of descriptor tables; the reasons
   are historical - file_table.c used to be about a static array of
   struct file we used to have way back).

   A lot of stray ends got cleaned up and converted to saner primitives,
   disgusting mess in android/binder.c is still disgusting, but at least
   doesn't poke so much in descriptor table guts anymore.  A bunch of
   relatively minor races got fixed in process, plus an ext4 struct file
   leak.

 - related thing - fget_light() partially unuglified; see fdget() in
   there (and yes, it generates the code as good as we used to have).

 - also related - bits of Cyrill's procfs stuff that got entangled into
   that work; _not_ all of it, just the initial move to fs/proc/fd.c and
   switch of fdinfo to seq_file.

 - Alex's fs/coredump.c spiltoff - the same story, had been easier to
   take that commit than mess with conflicts.  The rest is a separate
   pile, this was just a mechanical code movement.

 - a few misc patches all over the place.  Not all for this cycle,
   there'll be more (and quite a few currently sit in akpm's tree)."

Fix up trivial conflicts in the android binder driver, and some fairly
simple conflicts due to two different changes to the sock_alloc_file()
interface ("take descriptor handling from sock_alloc_file() to callers"
vs "net: Providing protocol type via system.sockprotoname xattr of
/proc/PID/fd entries" adding a dentry name to the socket)

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits)
  MAX_LFS_FILESIZE should be a loff_t
  compat: fs: Generic compat_sys_sendfile implementation
  fs: push rcu_barrier() from deactivate_locked_super() to filesystems
  btrfs: reada_extent doesn't need kref for refcount
  coredump: move core dump functionality into its own file
  coredump: prevent double-free on an error path in core dumper
  usb/gadget: fix misannotations
  fcntl: fix misannotations
  ceph: don't abuse d_delete() on failure exits
  hypfs: ->d_parent is never NULL or negative
  vfs: delete surplus inode NULL check
  switch simple cases of fget_light to fdget
  new helpers: fdget()/fdput()
  switch o2hb_region_dev_write() to fget_light()
  proc_map_files_readdir(): don't bother with grabbing files
  make get_file() return its argument
  vhost_set_vring(): turn pollstart/pollstop into bool
  switch prctl_set_mm_exe_file() to fget_light()
  switch xfs_find_handle() to fget_light()
  switch xfs_swapext() to fget_light()
  ...
2012-10-02 20:25:04 -07:00
Kirill A. Shutemov
8c0a853770 fs: push rcu_barrier() from deactivate_locked_super() to filesystems
There's no reason to call rcu_barrier() on every
deactivate_locked_super().  We only need to make sure that all delayed rcu
free inodes are flushed before we destroy related cache.

Removing rcu_barrier() from deactivate_locked_super() affects some fast
paths.  E.g.  on my machine exit_group() of a last process in IPC
namespace takes 0.07538s.  rcu_barrier() takes 0.05188s of that time.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-02 21:35:55 -04:00
Andy Adamson
e23008ec81 NFSv4 reduce attribute requests for open reclaim
We currently make no distinction in attribute requests between normal OPENs
and OPEN with CLAIM_PREVIOUS.  This offers more possibility of failures in
the GETATTR response which foils OPEN reclaim attempts.

Reduce the requested attributes to the bare minimum needed to update the
reclaim open stateid and split nfs4_opendata_to_nfs4_state processing
accordingly.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-02 18:12:25 -07:00
Trond Myklebust
807d66d802 NFSv4: nfs4_open_done first must check that GETATTR decoded a file type
...before it can check the validity of that file type.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-02 17:09:00 -07:00
Trond Myklebust
25a1a6211d NFSv4.1: Deal with wraparound when updating the layout "barrier" seqid
...and fix a bug in pnfs_set_layout_stateid.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-02 17:04:33 -07:00
Trond Myklebust
5a65503f3d NFSv4.1: Deal with wraparound issues when updating the layout stateid
...and add a helper function.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-02 16:47:14 -07:00
Trond Myklebust
038d649376 NFSv4.1: Always set the layout stateid if this is the first layoutget
If the list of layout segments is empty, we must unconditionally set
the layout stateid.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-02 16:38:41 -07:00
Trond Myklebust
251ec410c4 NFSv4.1: Fix another refcount issue in pnfs_find_alloc_layout
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-02 15:41:05 -07:00
Weston Andros Adamson
ae2bb03236 NFSv4: don't put ACCESS in OPEN compound if O_EXCL
Don't put an ACCESS op in OPEN compound if O_EXCL, because ACCESS
will return permission denied for all bits until close.

Fixes a regression due to commit 6168f62c (NFSv4: Add ACCESS operation to
OPEN compound)

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-02 14:56:19 -07:00
Weston Andros Adamson
bbd3a8eee8 NFSv4: don't check MAY_WRITE access bit in OPEN
Don't check MAY_WRITE as a newly created file may not have write mode bits,
but POSIX allows the creating process to write regardless.
This is ok because NFSv4 OPEN ops handle write permissions correctly -
the ACCESS in the OPEN compound is to differentiate READ v EXEC permissions.

Fixes a regression due to commit 6168f62c (NFSv4: Add ACCESS operation to
OPEN compound)

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-02 14:55:41 -07:00
Bryan Schumaker
ddfc4e1712 NFS: Set key construction data for the legacy upcall
This prevents a null pointer dereference when
nfs_idmap_complete_pipe_upcall_locked() calls complete_request_key().

Fixes a regression caused by commit 0cac12023 (NFSv4: Ensure that
idmap_pipe_downcall sanity-checks the downcall data).

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-02 13:04:09 -07:00
Weston Andros Adamson
fd4835708f NFSv4.1: don't do two EXCHANGE_IDs on mount
Since the addition of NFSv4 server trunking detection the mount context
calls nfs4_proc_exchange_id then schedules the state manager, which also
calls nfs4_proc_exchange_id. Setting the NFS4CLNT_LEASE_CONFIRM bit
makes the state manager skip the unneeded EXCHANGE_ID and continue on
with session creation.

Reported-by: Jorge Mora <mora@netapp.com>
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-02 11:35:47 -07:00
David Howells
f8aa23a55f KEYS: Use keyring_alloc() to create special keyrings
Use keyring_alloc() to create special keyrings now that it has a permissions
parameter rather than using key_alloc() + key_instantiate_and_link().

Also document and export keyring_alloc() so that modules can use it too.

Signed-off-by: David Howells <dhowells@redhat.com>
2012-10-02 19:24:56 +01:00
Linus Torvalds
437589a74b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace changes from Eric Biederman:
 "This is a mostly modest set of changes to enable basic user namespace
  support.  This allows the code to code to compile with user namespaces
  enabled and removes the assumption there is only the initial user
  namespace.  Everything is converted except for the most complex of the
  filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs,
  nfs, ocfs2 and xfs as those patches need a bit more review.

  The strategy is to push kuid_t and kgid_t values are far down into
  subsystems and filesystems as reasonable.  Leaving the make_kuid and
  from_kuid operations to happen at the edge of userspace, as the values
  come off the disk, and as the values come in from the network.
  Letting compile type incompatible compile errors (present when user
  namespaces are enabled) guide me to find the issues.

  The most tricky areas have been the places where we had an implicit
  union of uid and gid values and were storing them in an unsigned int.
  Those places were converted into explicit unions.  I made certain to
  handle those places with simple trivial patches.

  Out of that work I discovered we have generic interfaces for storing
  quota by projid.  I had never heard of the project identifiers before.
  Adding full user namespace support for project identifiers accounts
  for most of the code size growth in my git tree.

  Ultimately there will be work to relax privlige checks from
  "capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing
  root in a user names to do those things that today we only forbid to
  non-root users because it will confuse suid root applications.

  While I was pushing kuid_t and kgid_t changes deep into the audit code
  I made a few other cleanups.  I capitalized on the fact we process
  netlink messages in the context of the message sender.  I removed
  usage of NETLINK_CRED, and started directly using current->tty.

  Some of these patches have also made it into maintainer trees, with no
  problems from identical code from different trees showing up in
  linux-next.

  After reading through all of this code I feel like I might be able to
  win a game of kernel trivial pursuit."

Fix up some fairly trivial conflicts in netfilter uid/git logging code.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits)
  userns: Convert the ufs filesystem to use kuid/kgid where appropriate
  userns: Convert the udf filesystem to use kuid/kgid where appropriate
  userns: Convert ubifs to use kuid/kgid
  userns: Convert squashfs to use kuid/kgid where appropriate
  userns: Convert reiserfs to use kuid and kgid where appropriate
  userns: Convert jfs to use kuid/kgid where appropriate
  userns: Convert jffs2 to use kuid and kgid where appropriate
  userns: Convert hpfs to use kuid and kgid where appropriate
  userns: Convert btrfs to use kuid/kgid where appropriate
  userns: Convert bfs to use kuid/kgid where appropriate
  userns: Convert affs to use kuid/kgid wherwe appropriate
  userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids
  userns: On ia64 deal with current_uid and current_gid being kuid and kgid
  userns: On ppc convert current_uid from a kuid before printing.
  userns: Convert s390 getting uid and gid system calls to use kuid and kgid
  userns: Convert s390 hypfs to use kuid and kgid where appropriate
  userns: Convert binder ipc to use kuids
  userns: Teach security_path_chown to take kuids and kgids
  userns: Add user namespace support to IMA
  userns: Convert EVM to deal with kuids and kgids in it's hmac computation
  ...
2012-10-02 11:11:09 -07:00
Linus Torvalds
033d9959ed Merge branch 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue changes from Tejun Heo:
 "This is workqueue updates for v3.7-rc1.  A lot of activities this
  round including considerable API and behavior cleanups.

   * delayed_work combines a timer and a work item.  The handling of the
     timer part has always been a bit clunky leading to confusing
     cancelation API with weird corner-case behaviors.  delayed_work is
     updated to use new IRQ safe timer and cancelation now works as
     expected.

   * Another deficiency of delayed_work was lack of the counterpart of
     mod_timer() which led to cancel+queue combinations or open-coded
     timer+work usages.  mod_delayed_work[_on]() are added.

     These two delayed_work changes make delayed_work provide interface
     and behave like timer which is executed with process context.

   * A work item could be executed concurrently on multiple CPUs, which
     is rather unintuitive and made flush_work() behavior confusing and
     half-broken under certain circumstances.  This problem doesn't
     exist for non-reentrant workqueues.  While non-reentrancy check
     isn't free, the overhead is incurred only when a work item bounces
     across different CPUs and even in simulated pathological scenario
     the overhead isn't too high.

     All workqueues are made non-reentrant.  This removes the
     distinction between flush_[delayed_]work() and
     flush_[delayed_]_work_sync().  The former is now as strong as the
     latter and the specified work item is guaranteed to have finished
     execution of any previous queueing on return.

   * In addition to the various bug fixes, Lai redid and simplified CPU
     hotplug handling significantly.

   * Joonsoo introduced system_highpri_wq and used it during CPU
     hotplug.

  There are two merge commits - one to pull in IRQ safe timer from
  tip/timers/core and the other to pull in CPU hotplug fixes from
  wq/for-3.6-fixes as Lai's hotplug restructuring depended on them."

Fixed a number of trivial conflicts, but the more interesting conflicts
were silent ones where the deprecated interfaces had been used by new
code in the merge window, and thus didn't cause any real data conflicts.

Tejun pointed out a few of them, I fixed a couple more.

* 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (46 commits)
  workqueue: remove spurious WARN_ON_ONCE(in_irq()) from try_to_grab_pending()
  workqueue: use cwq_set_max_active() helper for workqueue_set_max_active()
  workqueue: introduce cwq_set_max_active() helper for thaw_workqueues()
  workqueue: remove @delayed from cwq_dec_nr_in_flight()
  workqueue: fix possible stall on try_to_grab_pending() of a delayed work item
  workqueue: use hotcpu_notifier() for workqueue_cpu_down_callback()
  workqueue: use __cpuinit instead of __devinit for cpu callbacks
  workqueue: rename manager_mutex to assoc_mutex
  workqueue: WORKER_REBIND is no longer necessary for idle rebinding
  workqueue: WORKER_REBIND is no longer necessary for busy rebinding
  workqueue: reimplement idle worker rebinding
  workqueue: deprecate __cancel_delayed_work()
  workqueue: reimplement cancel_delayed_work() using try_to_grab_pending()
  workqueue: use mod_delayed_work() instead of __cancel + queue
  workqueue: use irqsafe timer for delayed_work
  workqueue: clean up delayed_work initializers and add missing one
  workqueue: make deferrable delayed_work initializer names consistent
  workqueue: cosmetic whitespace updates for macro definitions
  workqueue: deprecate system_nrt[_freezable]_wq
  workqueue: deprecate flush[_delayed]_work_sync()
  ...
2012-10-02 09:54:49 -07:00
Chuck Lever
c2ccc084eb NFS: nfs41_walk_client_list(): re-lock before iterating
Sparse identified an execution path in nfs41_walk_client_list()
where the nfs_client_lock is not re-acquired before taking the next
loop iteration.

fs/nfs/nfs4client.c:437:9: sparse: context imbalance in
 'nfs41_walk_client_list' - different lock contexts for basic block

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-02 09:25:02 -07:00
Trond Myklebust
ee314c2a35 NFSv4.1: Handle BAD_STATEID and EXPIRED errors in layoutget
If the layoutget call returns a stateid error, we want to invalidate the
layout stateid, and/or recover the open stateid.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-02 08:34:29 -07:00
Trond Myklebust
6f018efac1 NFSv4.1: bl_pg_init_write should be static
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-02 08:34:29 -07:00
Stanislav Kinsbursky
4e437e95ae nfs: include internal.h in getroot.h
Sparse warning:
fs/nfs/nfs4getroot.c:11:5: warning: symbol 'nfs4_get_rootfh' was not declared.
Should it be static?

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-02 08:17:04 -07:00
Stanislav Kinsbursky
22e2430961 nfs: include nfs4_fh.h in nfs4sysctl.c
Sparse warnings:
fs/nfs/nfs4sysctl.c:56:5: warning: symbol 'nfs4_register_sysctl' was not
declared. Should it be static?
fs/nfs/nfs4sysctl.c:64:6: warning: symbol 'nfs4_unregister_sysctl' was not
declared. Should it be static?

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-02 08:17:03 -07:00
Stanislav Kinsbursky
3dd4f8ef7b nfs: declare nfs_xdev_mount as static
Sparse warning:
fs/nfs/super.c:2517:15: warning: symbol 'nfs_xdev_mount' was not declared.
Should it be static?

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-02 08:17:03 -07:00
Stanislav Kinsbursky
0b37d20ca2 nfs: declare nfs_callback_tcp_port in header
Sparse warning:
fs/nfs/super.c:2638:16: sparse: symbol 'nfs_callback_tcpport' was not
declared. Should it be static?

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-02 08:17:02 -07:00
Stanislav Kinsbursky
ca57ccc48f nfs: include NFSv4 header in netns.h
Build error:
fs/nfs/netns.h:27:15: error: 'NFS4_MAX_MINOR_VERSION' undeclared here (not in
a function)

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-02 08:17:02 -07:00
Andy Adamson
47b803c8d2 NFSv4.0 reclaim reboot state when re-establishing clientid
We should reclaim reboot state when the clientid is stale.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01 18:12:41 -07:00
Trond Myklebust
f9d640f3a4 NFSv4: nfs4_match_clientids is only used by NFSv4.1
Fix another compiler warning.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01 16:58:48 -07:00
Trond Myklebust
758201e2c9 NFSv4: Fix the minor version callback channel startup
The current spaghetti code confuses some versions of gcc (and just
looks ugly as hell)! Clean up...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01 16:58:39 -07:00
Trond Myklebust
9f62387d6e NFSv4: Fix up a merge conflict between migration and container changes
nfs_callback_tcpport is now per-net_namespace.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01 16:17:31 -07:00
Trond Myklebust
2afdfa5a84 Merge branch 'bugfixes' into nfs-for-next 2012-10-01 15:42:14 -07:00
Daniel Walter
7297cb682a nfs: replace strict_strto* with kstrto*
[nfs] replace strict_str* with kstr* variants

 * replace string conversions with newer kstr* functions

Signed-off-by: Daniel Walter <sahne@0x90.at>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01 15:40:05 -07:00
Yanchuan Nian
ee34e13620 NFS: Remove unnecessary semicolons (fs/nfs/client.c)
There are some unnecessary semicolons in function find_nfs_version. Just remove them.

Signed-off-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01 15:40:03 -07:00
Wei Yongjun
4e266229db pnfsblock: use list_move_tail instead of list_del/list_add_tail
Using list_move_tail() instead of list_del() + list_add_tail().

spatch with a semantic match is used to found this problem.
(http://coccinelle.lip6.fr/)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01 15:39:05 -07:00
Peng Tao
96c9eae638 pnfsblock: fix non-aligned DIO write
For DIO writes, if it is not blocksize aligned, we need to do
internal serialization. It may slow down writers anyway. So we
just bail them out and resend to MDS.

Cc: stable <stable@vger.kernel.org> [since v3.4]
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01 15:38:35 -07:00
Peng Tao
f742dc4a32 pnfsblock: fix non-aligned DIO read
For DIO read, if it is not sector aligned, we should reject it
and resend via MDS. Otherwise there might be data corruption.
Also teach bl_read_pagelist to handle partial page reads for DIO.

Cc: stable <stable@vger.kernel.org> [since v3.4]
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01 15:38:29 -07:00
Peng Tao
fe6e1e8d9f pnfsblock: fix partial page buffer wirte
If applications use flock to protect its write range, generic NFS
will not do read-modify-write cycle at page cache level. Therefore
LD should know how to handle non-sector aligned writes. Otherwise
there will be data corruption.

Cc: stable <stable@vger.kernel.org>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01 15:38:24 -07:00
Peng Tao
5d0e3a004f Revert "pnfsblock: bail out partial page IO"
This reverts commit 159e0561e3, in favor
of a more complete fix to the alignment issue.

Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01 15:38:16 -07:00
Peng Tao
dc182549d4 NFS41: fix error of setting blocklayoutdriver
After commit e38eb650 (NFS: set_pnfs_layoutdriver() from
nfs4_proc_fsinfo()), set_pnfs_layoutdriver() is called inside
nfs4_proc_fsinfo(), but pnfs_blksize is not set. It causes setting
blocklayoutdriver failure and pnfsblock mount failure.

Cc: stable <stable@vger.kernel.org> [since v3.5]
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01 15:37:39 -07:00
Peng Tao
7acdb02681 NFSv41: fix DIO write_io calculation
pnfs_within_mdsthreshold() is called inside pg_init. We need to set
read_io/write_io before that. Otherwise we fail pnfs_within_mdsthreshold()
and IO goes to MDS.
A simple test case:
dd if=foo of=/mnt/pnfs/bar bs=10M count=1 oflag=direct

Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01 15:37:34 -07:00
Chuck Lever
6f2ea7f2a3 NFS: Add nfs4_unique_id boot parameter
An optional boot parameter is introduced to allow client
administrators to specify a string that the Linux NFS client can
insert into its nfs_client_id4 id string, to make it both more
globally unique, and to ensure that it doesn't change even if the
client's nodename changes.

If this boot parameter is not specified, the client's nodename is
used, as before.

Client installation procedures can create a unique string (typically,
a UUID) which remains unchanged during the lifetime of that client
instance.  This works just like creating a UUID for the label of the
system's root and boot volumes.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01 15:33:33 -07:00
Chuck Lever
05f4c350ee NFS: Discover NFSv4 server trunking when mounting
"Server trunking" is a fancy named for a multi-homed NFS server.
Trunking might occur if a client sends NFS requests for a single
workload to multiple network interfaces on the same server.  There
are some implications for NFSv4 state management that make it useful
for a client to know if a single NFSv4 server instance is
multi-homed.  (Note this is only a consideration for NFSv4, not for
legacy versions of NFS, which are stateless).

If a client cares about server trunking, no NFSv4 operations can
proceed until that client determines who it is talking to.  Thus
server IP trunking discovery must be done when the client first
encounters an unfamiliar server IP address.

The nfs_get_client() function walks the nfs_client_list and matches
on server IP address.  The outcome of that walk tells us immediately
if we have an unfamiliar server IP address.  It invokes
nfs_init_client() in this case.  Thus, nfs4_init_client() is a good
spot to perform trunking discovery.

Discovery requires a client to establish a fresh client ID, so our
client will now send SETCLIENTID or EXCHANGE_ID as the first NFS
operation after a successful ping, rather than waiting for an
application to perform an operation that requires NFSv4 state.

The exact process for detecting trunking is different for NFSv4.0 and
NFSv4.1, so a minorversion-specific init_client callout method is
introduced.

CLID_INUSE recovery is important for the trunking discovery process.
CLID_INUSE is a sign the server recognizes the client's nfs_client_id4
id string, but the client is using the wrong principal this time for
the SETCLIENTID operation.  The SETCLIENTID must be retried with a
series of different principals until one works, and then the rest of
trunking discovery can proceed.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01 15:33:33 -07:00
Chuck Lever
e984a55a74 NFS: Use the same nfs_client_id4 for every server
Currently, when identifying itself to NFS servers, the Linux NFS
client uses a unique nfs_client_id4.id string for each server IP
address it talks with.  For example, when client A talks to server X,
the client identifies itself using a string like "AX".  The
requirements for these strings are specified in detail by RFC 3530
(and bis).

This form of client identification presents a problem for Transparent
State Migration.  When client A's state on server X is migrated to
server Y, it continues to be associated with string "AX."  But,
according to the rules of client string construction above, client
A will present string "AY" when communicating with server Y.

Server Y thus has no way to know that client A should be associated
with the state migrated from server X.  "AX" is all but abandoned,
interfering with establishing fresh state for client A on server Y.

To support transparent state migration, then, NFSv4.0 clients must
instead use the same nfs_client_id4.id string to identify themselves
to every NFS server; something like "A".

Now a client identifies itself as "A" to server X.  When a file
system on server X transitions to server Y, and client A identifies
itself as "A" to server Y, Y will know immediately that the state
associated with "A," whether it is native or migrated, is owned by
the client, and can merge both into a single lease.

As a pre-requisite to adding support for NFSv4 migration to the Linux
NFS client, this patch changes the way Linux identifies itself to NFS
servers via the SETCLIENTID (NFSv4 minor version 0) and EXCHANGE_ID
(NFSv4 minor version 1) operations.

In addition to removing the server's IP address from nfs_client_id4,
the Linux NFS client will also no longer use its own source IP address
as part of the nfs_client_id4 string.  On multi-homed clients, the
value of this address depends on the address family and network
routing used to contact the server, thus it can be different for each
server.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01 15:33:33 -07:00
Chuck Lever
896526174c NFS: Introduce "migration" mount option
Currently, the Linux client uses a unique nfs_client_id4.id string
when identifying itself to distinct NFS servers.

To support transparent state migration, the Linux client will have to
use the same nfs_client_id4 string for all servers it communicates
with (also known as the "uniform client string" approach).  Otherwise
NFS servers can not recognize that open and lock state need to be
merged after a file system transition.

Unfortunately, there are some NFSv4.0 servers currently in the field
that do not tolerate the uniform client string approach.

Thus, by default, our NFSv4.0 mounts will continue to use the current
approach, and we introduce a mount option that switches them to use
the uniform model.  Client administrators must identify which servers
can be mounted with this option.  Eventually most NFSv4.0 servers will
be able to handle the uniform approach, and we can change the default.

The first mount of a server controls the behavior for all subsequent
mounts for the lifetime of that set of mounts of that server.  After
the last mount of that server is gone, the client erases the data
structure that tracks the lease.  A subsequent lease may then honor
a different "migration" setting.

This patch adds only the infrastructure for parsing the new mount
option.  Support for uniform client strings is added in a subsequent
patch.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01 15:33:33 -07:00
Chuck Lever
ba9b584c1d SUNRPC: Introduce rpc_clone_client_set_auth()
An ULP is supposed to be able to replace a GSS rpc_auth object with
another GSS rpc_auth object using rpcauth_create().  However,
rpcauth_create() in 3.5 reliably fails with -EEXIST in this case.
This is because when gss_create() attempts to create the upcall pipes,
sometimes they are already there.  For example if a pipe FS mount
event occurs, or a previous GSS flavor was in use for this rpc_clnt.

It turns out that's not the only problem here.  While working on a
fix for the above problem, we noticed that replacing an rpc_clnt's
rpc_auth is not safe, since dereferencing the cl_auth field is not
protected in any way.

So we're deprecating the ability of rpcauth_create() to switch an
rpc_clnt's security flavor during normal operation.  Instead, let's
add a fresh API that clones an rpc_clnt and gives the clone a new
flavor before it's used.

This makes immediate use of the new __rpc_clone_client() helper.

This can be used in a similar fashion to rpcauth_create() when a
client is hunting for the correct security flavor.  Instead of
replacing an rpc_clnt's security flavor in a loop, the ULP replaces
the whole rpc_clnt.

To fix the -EEXIST problem, any ULP logic that relies on replacing
an rpc_clnt's rpc_auth with rpcauth_create() must be changed to use
this API instead.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01 15:33:33 -07:00
Chuck Lever
ffe5a83005 NFS: Slow down state manager after an unhandled error
If the state manager thread is not actually able to fully recover from
some situation, it wakes up waiters, who kick off a new state manager
thread.  Quite often the fresh invocation of the state manager is just
as successful.

This results in a livelock as the client dumps thousands of NFS
requests a second on the network in a vain attempt to recover.  Not
very friendly.

To mitigate this situation, add a delay in the state manager after
an unhandled error, so that the client sends just a few requests
every second in this case.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01 15:31:51 -07:00
Chuck Lever
8cb7f74eee NFS: nfs_parsed_mount_options can use unsigned int
fs/nfs/super.c: In function ‘nfs_compare_remount_data’:
fs/nfs/super.c:2042:18: warning: comparison between signed and
    unsigned integer expressions [-Wsign-compare]
fs/nfs/super.c:2043:18: warning: comparison between signed and
    unsigned integer expressions [-Wsign-compare]
fs/nfs/super.c:2044:20: warning: comparison between signed and
    unsigned integer expressions [-Wsign-compare]
fs/nfs/super.c:2046:21: warning: comparison between signed and
    unsigned integer expressions [-Wsign-compare]
fs/nfs/super.c:2047:21: warning: comparison between signed and
    unsigned integer expressions [-Wsign-compare]
fs/nfs/super.c:2048:21: warning: comparison between signed and
    unsigned integer expressions [-Wsign-compare]
fs/nfs/super.c:2049:21: warning: comparison between signed and
    unsigned integer expressions [-Wsign-compare]
fs/nfs/super.c:2050:18: warning: comparison between signed and
    unsigned integer expressions [-Wsign-compare]

Seen with gcc (GCC) 4.6.3 20120306 (Red Hat 4.6.3-2).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01 15:31:41 -07:00
Stanislav Kinsbursky
1dc42e04b7 NFS: add debug messages to callback down function
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01 15:26:06 -07:00
Stanislav Kinsbursky
b3d19c5172 NFS: callback per-net usage counting introduced
This patch also introduces refcount-aware nfs_callback_down_net() wrapper for
svc_shutdown_net().

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01 15:25:57 -07:00
Stanislav Kinsbursky
29dcc16a8e NFS: make nfs_callback_tcpport6 per network context
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01 15:25:51 -07:00
Stanislav Kinsbursky
bbe0a3aa4e NFS: make nfs_callback_tcpport per network context
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01 15:25:47 -07:00
Stanislav Kinsbursky
23c20ecd44 NFS: callback up - users counting cleanup
Usage coutner now increased only is the service was started sccessfully.
Even if service is running already, then goto is not required anymore, because
service creation and start will be skipped.
With this patch code looks clearer.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01 15:25:38 -07:00
Stanislav Kinsbursky
8e24614443 NFS: callback service start function introduced
This is just a code move, which from my POW makes code looks better.
I.e. now on start we have 3 different stages:
1) Service creation.
2) Service per-net data allocation.
3) Service start.

Patch also renames goto label "out_err:" into "err_start:" to reflect new
changes.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01 15:25:35 -07:00
Stanislav Kinsbursky
691c457ae6 NFS: callback up - transport backchannel cleanup
No need to assign transports backchannel server explicitly in
nfs41_callback_up() -  there is nfs_callback_bc_serv() function for this.
By using it, nfs4_callback_up() and nfs41_callback_up() can be called without
transport argument.

Note: service have to be passed to nfs_callback_bc_serv() instead of callback,
since callback link can be uninitialized.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01 15:25:29 -07:00
Stanislav Kinsbursky
c946556b87 NFS: move per-net callback thread initialization to nfs_callback_up_net()
v4:
1) Callback transport creation routine selection by version simlified.

This new function in now called before nfs_minorversion_callback_svc_setup()).

Also few small changes:
1) current network namespace in nfs_callback_up() was replaced by transport net.
2) svc_shutdown_net() was moved prior to callback usage counter decrement
(because in case of per-net data allocation faulure svc_shutdown_net() have to
be skipped).

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01 15:25:24 -07:00
Stanislav Kinsbursky
dd018428dc NFS: callback service creation function introduced
This function creates service if it's not exist, or increase usage counter of
the existent, and returns pointer to it.
Usage counter will be droppepd by svc_destroy() later in nfs_callback_up().

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01 15:25:11 -07:00
Stanislav Kinsbursky
c8ceb4124b NFS: pass net to nfs_callback_down()
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01 15:24:51 -07:00
Weston Andros Adamson
6168f62cbd NFSv4: Add ACCESS operation to OPEN compound
The OPEN operation has no way to differentiate an open for read and an
open for execution - both look like read to the server. This allowed
users to read files that didn't have READ access but did have EXEC access,
which is obviously wrong.

This patch adds an ACCESS call to the OPEN compound to handle the
difference between OPENs for reading and execution.  Since we're going
through the trouble of calling ACCESS, we check all possible access bits
and cache the results hopefully avoiding an ACCESS call in the future.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01 15:20:11 -07:00
Bryan Schumaker
57a51048da NFS: Use kzalloc() instead of kmalloc() in the idmapper
This will allocate memory that has already been zeroed, allowing us to
remove the memset later on.

Signed-off-by: Bryan Schumaker <bjchuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01 15:18:44 -07:00
Bryan Schumaker
6938867edb NFS: Remove bad delegations during open recovery
I put the client into an open recovery loop by:
	Client: Open file
		read half
	Server: Expire client (echo 0 > /sys/kernel/debug/nfsd/forget_clients)
	Client: Drop vm cache (echo 3 > /proc/sys/vm/drop_caches)
		finish reading file

This causes a loop because the client never updates the nfs4_state after
discovering that the delegation is invalid.  This means it will keep
trying to read using the bad delegation rather than attempting to re-open
the file.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
CC: stable@vger.kernel.org [3.4+]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01 15:17:25 -07:00
Bryan Schumaker
fcb6d9c6b7 NFS: Always use the open stateid when checking for expired opens
If we are reading through a delegation, and the delegation is OK then
state->stateid will still point to a delegation stateid and not an open
stateid.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-01 15:17:17 -07:00
Linus Torvalds
99dbb1632f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull the trivial tree from Jiri Kosina:
 "Tiny usual fixes all over the place"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
  doc: fix old config name of kprobetrace
  fs/fs-writeback.c: cleanup riteback_sb_inodes kerneldoc
  btrfs: fix the commment for the action flags in delayed-ref.h
  btrfs: fix trivial typo for the comment of BTRFS_FREE_INO_OBJECTID
  vfs: fix kerneldoc for generic_fh_to_parent()
  treewide: fix comment/printk/variable typos
  ipr: fix small coding style issues
  doc: fix broken utf8 encoding
  nfs: comment fix
  platform/x86: fix asus_laptop.wled_type module parameter
  mfd: printk/comment fixes
  doc: getdelays.c: remember to close() socket on error in create_nl_socket()
  doc: aliasing-test: close fd on write error
  mmc: fix comment typos
  dma: fix comments
  spi: fix comment/printk typos in spi
  Coccinelle: fix typo in memdup_user.cocci
  tmiofb: missing NULL pointer checks
  tools: perf: Fix typo in tools/perf
  tools/testing: fix comment / output typos
  ...
2012-10-01 09:06:36 -07:00
Trond Myklebust
849b286fd0 NFSv4.1: nfs4_proc_layoutreturn must always drop the plh_block_lgets count
Currently it does not do so if the RPC call failed to start. Fix is to
move the decrement of plh_block_lgets into nfs4_layoutreturn_release.

Also remove a redundant test of task->tk_status in nfs4_layoutreturn_done:
if lrp->res.lrs_present is set, then obviously the RPC call succeeded.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:18 -04:00
Trond Myklebust
65857d5768 NFSv4.1: _pnfs_return_layout() shouldn't invalidate the layout on failure
Failure of the layoutreturn allocation fails is not a good reason to
mark the pnfs_layout_hdr as having failed a layoutget or i/o. Just
exit cleanly.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:18 -04:00
Trond Myklebust
e5929f3cff NFSv4.1: Remove the NFS_LAYOUT_RETURNED state
It serves no purpose that the test for whether or not we have valid
layout segments doesn't already serve.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:17 -04:00
Trond Myklebust
173f77e9c5 NFSv4.1: Clear NFS_LAYOUT_BULK_RECALL when the layout segments are freed
Once all the affected layout segments have been freed up, clear the
NFS_LAYOUT_BULK_RECALL flag so that we can reuse the pnfs_layout_hdr

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:17 -04:00
Trond Myklebust
8006bfba36 NFSv4.1: Get rid of the NFS_LAYOUT_DESTROYED state
We already have a mechanism for blocking LAYOUTGET by means of the
plh_block_lgets counter. The only "service" that NFS_LAYOUT_DESTROYED
provides at this point is to block layoutget once the layout segment
list is empty, which basically means that you have to wait until
the pnfs_layout_hdr is destroyed before you can do pNFS on that file
again.

This patch enables the reuse of the pnfs_layout_hdr if the layout
segment list is empty.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:16 -04:00
Trond Myklebust
579342785f NFSv4.1: Remove unused 'default allocation' for pnfs_alloc_layout_hdr()
...and ditto for pnfs_free_layout_hdr()

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:16 -04:00
Trond Myklebust
a9136d4914 NFSv4.1: Get rid of pNFS spin lock debugging asserts...
These are all in static declared functions that are called only once.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:16 -04:00
Trond Myklebust
8f0d27dc5d NFSv4.1: Balance pnfs_layout_hdr refcount in pnfs_layout_(insert|remove)_lseg
Ensure that the reference count for pnfs_layout_hdr reverts to the
original value after a call to pnfs_layout_remove_lseg().

Note that the caller is expected to hold a reference to the struct
pnfs_layout_hdr.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:15 -04:00
Trond Myklebust
905ca191cf NFSv4.1: Clean up pnfs_put_lseg()
There is no longer a need to use pnfs_free_lseg_list(). Just call
pnfs_free_lseg() directly.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:15 -04:00
Trond Myklebust
9c6263819f NFSv4.1: Clean up the removal of pnfs_layout_hdr from the server list
Move the code into pnfs_free_layout_hdr(), and add checks to
get_layout_by_fh_locked to ensure that they don't reference a layout
that is being freed.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:14 -04:00
Trond Myklebust
6622c3ea05 NFSv4.1: Free the pnfs_layout_hdr outside the inode->i_lock
None of the existing pNFS layout drivers seem to require the inode
to be locked while they free the layout header.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:14 -04:00
Trond Myklebust
01d39ce82b NFSv4.1: Remove redundant reference to the pnfs_layout_hdr
Each layout segment already holds a reference to the pnfs_layout_hdr,
so there is no need to hold an extra reference that is released once
the last layout segment is freed.

Ensure that pnfs_find_alloc_layout() always returns a reference
to the pnfs_layout_hdr, which will be matched by the final call to
pnfs_put_layout_hdr() in pnfs_update_layout().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:13 -04:00
Trond Myklebust
57036a3776 NFSv4.1: Rename the pnfs_put_lseg_common to pnfs_layout_remove_lseg
The latter name is more descriptive of the actual function.
Also rename pnfs_insert_layout to pnfs_layout_insert_lseg.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:13 -04:00
Trond Myklebust
bb346f6397 NFSv4.1: reset the inode MDS threshold counters on layout destruction
Instead of resetting the inode MDS threshold counters when we mark
the layout for destruction, do it as part of freeing the layout.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:12 -04:00
Trond Myklebust
965938b83b NFSv4.1: Get rid of pNFS layout state "NFS_LAYOUT_INVALID"
In all cases where we set NFS_LAYOUT_INVALID, we also set NFS_LAYOUT_DESTROYED.
Furthermore, in all cases where we test for NFS_LAYOUT_INVALID, we should
also be testing for NFS_LAYOUT_DESTROYED, since the latter means that
we hold no valid layout segments.
Ergo the two are redundant.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:12 -04:00
Trond Myklebust
1f7977c136 NFSv4.1: Simplify the pNFS return-on-close code
Confine it to the nfs4_do_close() code.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:12 -04:00
Trond Myklebust
7fdab069b7 NFSv4.1: Fix a race in the pNFS return-on-close code
If we sleep after dropping the inode->i_lock, then we are no longer
atomic with respect to the rpc_wake_up() call in pnfs_layout_remove_lseg().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:11 -04:00
Trond Myklebust
115ce575cb NFSv4.1: pnfs_layout_io_set_failed must clear invalid lsegs
If pnfs_layout_io_test_failed() authorises a retry of the failed layoutgets,
we should clear the existing layout segments so that we start afresh. Do
this in pnfs_layout_io_set_failed().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:11 -04:00
Trond Myklebust
3e62121493 NFSv4.1: Don't drop the pnfs_layout_hdr after a layoutget failure
We want to cache the pnfs_layout_hdr after a layoutget or i/o
failure so that pnfs_update_layout() can find it and know when
it is time to retry.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:10 -04:00
Trond Myklebust
830ffb5657 NFSv4.1: Fix a reference leak in pnfs_update_layout
If we exit after the call to pnfs_find_alloc_layout(), we have to ensure
that we put the struct pnfs_layout_hdr.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:10 -04:00
Trond Myklebust
1dfed2737d NFSv4.1: pNFS data servers may be temporarily offline
In cases where the pNFS data server is just temporarily out of service,
we want to mark it as such, and then try again later. Typically that will
be in cases of network connection errors etc.
This patch allows us to mark the devices as being "unavailable" for such
transient errors, and will make them available for retries after a
2 minute timeout period.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:09 -04:00
Trond Myklebust
25c7533357 NFSv4.1: Retry pNFS after a 2 minute timeout
If we had to fall back to read/write through MDS, then assume that we should
retry pNFS after a suitable timeout period.
The following patch sets a timeout of 2 minutes.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:09 -04:00
Trond Myklebust
b9e028fd89 NFSv4.1: Add helpers for setting/reading the I/O fail bit
...and make them local to the pnfs.c file.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:09 -04:00
Trond Myklebust
f86bbcf85d NFSv4.1: Replace dprintk() in pnfs_update_layout with something less buggy
Dereferencing nfsi->layout in order to read plh_flags without holding
a spin lock is bug prone. Furthermore, the dprintk() tells you nothing
about whether or not the call succeeded.
Replace it with something that tells you about whether or not a valid
layout segment was returned for the inode in question.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:08 -04:00
Trond Myklebust
78e4e05c64 NFSv4.1: Replace get_device_info() with filelayout_get_device_info()
Fix the namespace pollution issue.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:08 -04:00
Trond Myklebust
9369a431bc NFSv4.1: Cleanup; add "pnfs_" prefix to put_lseg() and get_lseg()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:07 -04:00
Trond Myklebust
70c3bd2bdf NFSv4.1: Cleanup; add "pnfs_" prefix to get_layout_hdr() and put_layout_hdr()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:07 -04:00
Trond Myklebust
49a85061b0 NFSv4.1: Cleanup add a "pnfs_" prefix to mark_matching_lsegs_invalid
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:06 -04:00
Trond Myklebust
a0b0a6e39b NFS: Clean up the pNFS layoutget interface
Ensure that we do return errors from nfs4_proc_layoutget() and that we
don't mark the layout as having failed if the error was due to a
signal or resource problem on the client side.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:06 -04:00
Trond Myklebust
dcfc4f2546 NFS: Write the entire file if a server reboot occurs during fsync()
This is to ensure that we don't clear the NFS_CONTEXT_RESEND_WRITES
flag while there are still writes that haven't been resent.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:05 -04:00
Trond Myklebust
05990d1bf2 NFS: Fix fdatasync/fsync() when confronted with a server reboot
If the server reboots before it can commit the unstable writes to disk,
then nfs_commit_release_pages() will detect this when it compares the
verifier returned by COMMIT to the one returned by WRITE. When this
happens, the client needs to resend those writes in order to guarantee
that they make it to stable storage.

This patch adds a signalling mechanism to notify fsync() that it
needs to retry all writes before it can exit.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:05 -04:00
Trond Myklebust
795a88c968 NFSv4: Convert the nfs4_lock_state->ls_flags to a bit field
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:04 -04:00
Trond Myklebust
2a369153c8 NFS: Clean up helper function nfs4_select_rw_stateid()
We want to be able to pass on the information that the page was not
dirtied under a lock. Instead of adding a flag parameter, do this
by passing a pointer to a 'struct nfs_lock_owner' that may be NULL.

Also reuse this structure in struct nfs_lock_context to carry the
fl_owner_t and pid_t.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:04 -04:00
Trond Myklebust
b3c54de6f8 NFS: Convert nfs_get_lock_context to return an ERR_PTR on failure
We want to be able to distinguish between allocation failures, and
the case where the lock context is not needed (because there are no
locks).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:03 -04:00
Trond Myklebust
0cac120233 NFSv4: Ensure that idmap_pipe_downcall sanity-checks the downcall data
Use the idmapper upcall data to verify that the legacy idmapper daemon
is indeed responding to an upcall that we sent.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Bryan Schumaker <bjschuma@netapp.com>
2012-09-28 13:43:34 -04:00
Trond Myklebust
e9ab41b620 NFSv4: Clean up the legacy idmapper upcall
Replace the BUG_ON(idmap->idmap_key_cons != NULL) with a
WARN_ON_ONCE(). Then get rid of the ACCESS_ONCE(idmap->idmap_key_cons).

Then add helper functions for starting, finishing and aborting the
legacy upcall.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Bryan Schumaker <bjschuma@netapp.com>
2012-09-28 13:42:41 -04:00
Trond Myklebust
0e24d849c4 NFSv4: Remove BUG_ON() and ACCESS_ONCE() calls in the idmapper
The use of ACCESS_ONCE() is wrong, since the various routines that set/clear
idmap->idmap_key_cons should be strictly ordered w.r.t. each other, and
the idmap->idmap_mutex ensures that only one thread at a time may be in
an upcall situation.

Also replace the BUG_ON()s with WARN_ON_ONCE() where appropriate.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 12:27:11 -04:00
Trond Myklebust
13fe4ba1b6 NFSv4.1: decode_getdeviceinfo should check xdr_read_pages() return value
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-26 12:43:10 -04:00
NeilBrown
62d98c9354 NFS4: avoid underflow when converting error to pointer.
In nfs4_create_sec_client, 'flavor' can hold a negative error
code (returned from nfs4_negotiate_security), even though it
is an 'enum' and hence unsigned.

The code is careful to cast it to an (int) before testing if it
is negative, however it doesn't cast to an (int) before calling
ERR_PTR.

On a machine where "void*" is larger than "int", this results in
the unsigned equivalent of -1 (e.g. 0xffffffff) being converted
to a pointer.  Subsequent code determines that this is not
negative, and so  dereferences it with predictable results.

So: cast 'flavor' to a (signed) int before passing to ERR_PTR.

cc: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-25 10:38:54 -04:00
Wei Yongjun
e8d920c58d NFS: fix the return value check by using IS_ERR
In case of error, the function rpcauth_create() returns ERR_PTR()
and never returns NULL pointer. The NULL test in the return value
check should be replaced with IS_ERR().

dpatch engine is used to auto generated this patch.
(https://github.com/weiyj/dpatch)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-25 10:36:37 -04:00
Eric W. Biederman
5f3a4a28ec userns: Pass a userns parameter into posix_acl_to_xattr and posix_acl_from_xattr
- Pass the user namespace the uid and gid values in the xattr are stored
   in into posix_acl_from_xattr.

 - Pass the user namespace kuid and kgid values should be converted into
   when storing uid and gid values in an xattr in posix_acl_to_xattr.

- Modify all callers of posix_acl_from_xattr and posix_acl_to_xattr to
  pass in &init_user_ns.

In the short term this change is not strictly needed but it makes the
code clearer.  In the longer term this change is necessary to be able to
mount filesystems outside of the initial user namespace that natively
store posix acls in the linux xattr format.

Cc: Theodore Tso <tytso@mit.edu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-09-18 01:01:35 -07:00
Trond Myklebust
7b281ee026 NFS: fsync() must exit with an error if page writeback failed
We need to ensure that if the call to filemap_write_and_wait_range()
fails, then we report that error back to the application.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-11 15:38:32 -04:00
Weston Andros Adamson
01913b49cf NFS: return error from decode_getfh in decode open
If decode_getfh failed, nfs4_xdr_dec_open would return 0 since the last
decode_* call must have succeeded.

Cc: stable@vger.kernel.org
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-06 16:01:33 -04:00
Trond Myklebust
1f1ea6c2d9 NFSv4: Fix buffer overflow checking in __nfs4_get_acl_uncached
Pass the checks made by decode_getacl back to __nfs4_get_acl_uncached
so that it knows if the acl has been truncated.

The current overflow checking is broken, resulting in Oopses on
user-triggered nfs4_getfacl calls, and is opaque to the point
where several attempts at fixing it have failed.
This patch tries to clean up the code in addition to fixing the
Oopses by ensuring that the overflow checks are performed in
a single place (decode_getacl). If the overflow check failed,
we will still be able to report the acl length, but at least
we will no longer attempt to cache the acl or copy the
truncated contents to user space.

Reported-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Sachin Prabhu <sprabhu@redhat.com>
2012-09-06 11:11:53 -04:00
Trond Myklebust
21f498c2f7 NFSv4: Fix range checking in __nfs4_get_acl_uncached and __nfs4_proc_set_acl
Ensure that the user supplied buffer size doesn't cause us to overflow
the 'pages' array.

Also fix up some confusion between the use of PAGE_SIZE and
PAGE_CACHE_SIZE when calculating buffer sizes. We're not using
the page cache for anything here.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-04 14:52:43 -04:00
Trond Myklebust
872ece86ea NFS: Fix a problem with the legacy binary mount code
Apparently, am-utils is still using the legacy binary mountdata interface,
and is having trouble parsing /proc/mounts due to the 'port=' field being
incorrectly set.

The following patch should fix up the regression.

Reported-by: Marius Tolzmann <tolzmann@molgen.mpg.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2012-09-04 14:52:43 -04:00
Trond Myklebust
c3f52af3e0 NFS: Fix the initialisation of the readdir 'cookieverf' array
When the NFS_COOKIEVERF helper macro was converted into a static
inline function in commit 99fadcd764 (nfs: convert NFS_*(inode)
helpers to static inline), we broke the initialisation of the
readdir cookies, since that depended on doing a memset with an
argument of 'sizeof(NFS_COOKIEVERF(inode))' which therefore
changed from sizeof(be32 cookieverf[2]) to sizeof(be32 *).

At this point, NFS_COOKIEVERF seems to be more of an obfuscation
than a helper, so the best thing would be to just get rid of it.

Also see: https://bugzilla.kernel.org/show_bug.cgi?id=46881

Reported-by: Andi Kleen <andi@firstfloor.org>
Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2012-09-04 14:52:42 -04:00
Peter Meerwald
1856b225ca nfs: comment fix
Signed-off-by: Peter Meerwald <pmeerw@pmeerw.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-09-01 10:09:44 -07:00
J. Bruce Fields
5b444cc9a4 svcrpc: remove handling of unknown errors from svc_recv
svc_recv() returns only -EINTR or -EAGAIN.  If we really want to worry
about the case where it has a bug that causes it to return something
else, we could stick a WARN() in svc_recv.  But it's silly to require
every caller to have all this boilerplate to handle that case.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-21 17:42:00 -04:00
Trond Myklebust
0866004304 NFSv3: Ensure that do_proc_get_root() reports errors correctly
If the rpc call to NFS3PROC_FSINFO fails, then we need to report that
error so that the mount fails. Otherwise we can end up with a
superblock with completely unusable values for block sizes, maxfilesize,
etc.

Reported-by: Yuanming Chen <hikvision_linux@163.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-08-20 12:52:42 -04:00
Trond Myklebust
7653f6ff4e NFSv4: Ensure that nfs4_alloc_client cleans up on error.
Any pointer that was allocated through nfs_alloc_client() needs to be
freed via a call to nfs_free_client().

Reported-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-08-20 12:12:29 -04:00
Bryan Schumaker
12dfd08055 NFS: return -ENOKEY when the upcall fails to map the name
This allows the normal error-paths to handle the error, rather than
making a special call to complete_request_key() just for this instance.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Tested-by: William Dauchy <wdauchy@gmail.com>
Cc: stable@vger.kernel.org [>= 3.4]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-08-16 17:20:06 -04:00
Bryan Schumaker
c5066945b7 NFS: Clear key construction data if the idmap upcall fails
idmap_pipe_downcall already clears this field if the upcall succeeds,
but if it fails (rpc.idmapd isn't running) the field will still be set
on the next call triggering a BUG_ON().  This patch tries to handle all
possible ways that the upcall could fail and clear the idmap key data
for each one.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Tested-by: William Dauchy <wdauchy@gmail.com>
Cc: stable@vger.kernel.org [>= 3.4]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-08-16 17:20:02 -04:00
Trond Myklebust
cff298c721 NFSv4: Don't use private xdr_stream fields in decode_getacl
Instead of using the private field xdr->p from struct xdr_stream,
use the public xdr_stream_pos().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-08-16 16:15:51 -04:00
Trond Myklebust
b291f1b1c8 NFSv4: Fix the acl cache size calculation
Currently, we do not take into account the size of the 16 byte
struct nfs4_cached_acl header, when deciding whether or not we should
cache the acl data.  Consequently, we will end up allocating an
8k buffer in order to fit a maximum size 4k acl.

This patch adjusts the calculation so that we limit the cache size
to 4k for the acl header+data.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-08-16 16:15:50 -04:00
Trond Myklebust
519d3959e3 NFSv4: Fix pointer arithmetic in decode_getacl
Resetting the cursor xdr->p to a previous value is not a safe
practice: if the xdr_stream has crossed out of the initial iovec,
then a bunch of other fields would need to be reset too.

Fix this issue by using xdr_enter_page() so that the buffer gets
page aligned at the bitmap _before_ we decode it.

Also fix the confusion of the ACL length with the page buffer length
by not adding the base offset to the ACL length...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2012-08-16 16:15:50 -04:00
bjschuma@gmail.com
425e776d93 NFS: Alias the nfs module to nfs4
This allows distros to remove the line from their modprobe
configuration.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-08-16 16:15:49 -04:00
bjschuma@gmail.com
1ae811ee27 NFS: Fix a regression when loading the NFS v4 module
Some systems have a modprobe.d/nfs.conf file that sets an nfs4 alias
pointing to nfs.ko, rather than nfs4.ko.  This can prevent the v4 module
from loading on mount, since the kernel sees that something named "nfs4"
has already been loaded.  To work around this, I've renamed the modules
to "nfsv2.ko" "nfsv3.ko" and "nfsv4.ko".

I also had to move the nfs4_fs_type back to nfs.ko to ensure that `mount
-t nfs4` still works.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-08-16 16:15:49 -04:00
Tejun Heo
41f63c5359 workqueue: use mod_delayed_work() instead of cancel + queue
Convert delayed_work users doing cancel_delayed_work() followed by
queue_delayed_work() to mod_delayed_work().

Most conversions are straight-forward.  Ones worth mentioning are,

* drivers/edac/edac_mc.c: edac_mc_workq_setup() converted to always
  use mod_delayed_work() and cancel loop in
  edac_mc_reset_delay_period() is dropped.

* drivers/platform/x86/thinkpad_acpi.c: No need to remember whether
  watchdog is active or not.  @fan_watchdog_active and related code
  dropped.

* drivers/power/charger-manager.c: Seemingly a lot of
  delayed_work_pending() abuse going on here.
  [delayed_]work_pending() are unsynchronized and racy when used like
  this.  I converted one instance in fullbatt_handler().  Please
  conver the rest so that it invokes workqueue APIs for the intended
  target state rather than trying to game work item pending state
  transitions.  e.g. if timer should be modified - call
  mod_delayed_work(), canceled - call cancel_delayed_work[_sync]().

* drivers/thermal/thermal_sys.c: thermal_zone_device_set_polling()
  simplified.  Note that round_jiffies() calls in this function are
  meaningless.  round_jiffies() work on absolute jiffies not delta
  delay used by delayed_work.

v2: Tomi pointed out that __cancel_delayed_work() users can't be
    safely converted to mod_delayed_work().  They could be calling it
    from irq context and if that happens while delayed_work_timer_fn()
    is running, it could deadlock.  __cancel_delayed_work() users are
    dropped.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Anton Vorontsov <cbouatmailru@gmail.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Roland Dreier <roland@kernel.org>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
2012-08-13 16:27:37 -07:00
Trond Myklebust
47fbf7976e NFSv4.1: Remove a bogus BUG_ON() in nfs4_layoutreturn_done
Ever since commit 0a57cdac3f (NFSv4.1 send layoutreturn to fence
disconnected data server) we've been sending layoutreturn calls
while there is potentially still outstanding I/O to the data
servers. The reason we do this is to avoid races between replayed
writes to the MDS and the original writes to the DS.

When this happens, the BUG_ON() in nfs4_layoutreturn_done can
be triggered because it assumes that we would never call
layoutreturn without knowing that all I/O to the DS is
finished. The fix is to remove the BUG_ON() now that the
assumptions behind the test are obsolete.

Reported-by: Boaz Harrosh <bharrosh@panasas.com>
Reported-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>=3.5]
2012-08-08 16:03:13 -04:00
Boaz Harrosh
7de6e28417 pnfs-obj: Better IO pattern in case of unaligned offset
Depending on layout and ARCH, ORE has some limits on max IO sizes
which is communicated on (what else) ore_layout->max_io_length,
which is always stripe aligned.
This was considered as the pg_test boundary for splitting and starting
a new IO.

But in the case of a long IO where the start offset is not aligned
what would happen is that both end of IO[N] and start of IO[N+1]
would be unaligned, causing each IO boundary parity unit to be
calculated and written twice.

So what we do in this patch is split the very start of an unaligned
IO, up to a stripe boundary, and then next IO's can continue fully
aligned til the end.

We might be sacrificing the case where the full unaligned IO would
fit within a single max_io_length, but the sacrifice is well worth
the elimination of double calculation and parity units IO.
Actually the sacrificing is marginal and is almost unmeasurable.

TODO:
	If we know the total expected linear segment that will
	be received, at pg_init, we could use that information
	in many places:
	1. blocks-layout get_layout write segment size
	2. Better mds-threshold
	3. In above situation for a better clean split

	I will do this in future submission.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-08-02 17:42:51 -04:00
Peng Tao
f616638409 NFS41: add pg_layout_private to nfs_pageio_descriptor
To allow layout driver to pass private information around
pg_init/pg_doio.

Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-08-02 17:41:18 -04:00
Idan Kedar
21d1f58aed pnfs: nfs4_proc_layoutget returns void
since the only user of nfs4_proc_layoutget is send_layoutget, which
ignores its return value, there is no reason to return any value.

Signed-off-by: Idan Kedar <idank@tonian.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-08-02 17:39:06 -04:00
Idan Kedar
8554116e17 pnfs: defer release of pages in layoutget
we have encountered a bug whereby reading a lot of files (copying
fedora's /bin) from a pNFS mount and hitting Ctrl+C in the middle caused
a general protection fault in xdr_shrink_bufhead. this function is
called when decoding the response from LAYOUTGET. the decoding is done
by a worker thread, and the caller of LAYOUTGET waits for the worker
thread to complete.

hitting Ctrl+C caused the synchronous wait to end and the next thing the
caller does is to free the pages, so when the worker thread calls
xdr_shrink_bufhead, the pages are gone. therefore, the cleanup of these
pages has been moved to nfs4_layoutget_release.

Signed-off-by: Idan Kedar <idank@tonian.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-08-02 17:38:54 -04:00
Jeff Layton
3dd4765fce nfs: tear down caches in nfs_init_writepagecache when allocation fails
...and ensure that we tear down the nfs_commit_data cache too when
unloading the module.

Cc: Bryan Schumaker <bjschuma@netapp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-08-02 17:36:07 -04:00
Linus Torvalds
ac694dbdbc Merge branch 'akpm' (Andrew's patch-bomb)
Merge Andrew's second set of patches:
 - MM
 - a few random fixes
 - a couple of RTC leftovers

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (120 commits)
  rtc/rtc-88pm80x: remove unneed devm_kfree
  rtc/rtc-88pm80x: assign ret only when rtc_register_driver fails
  mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables
  tmpfs: distribute interleave better across nodes
  mm: remove redundant initialization
  mm: warn if pg_data_t isn't initialized with zero
  mips: zero out pg_data_t when it's allocated
  memcg: gix memory accounting scalability in shrink_page_list
  mm/sparse: remove index_init_lock
  mm/sparse: more checks on mem_section number
  mm/sparse: optimize sparse_index_alloc
  memcg: add mem_cgroup_from_css() helper
  memcg: further prevent OOM with too many dirty pages
  memcg: prevent OOM with too many dirty pages
  mm: mmu_notifier: fix freed page still mapped in secondary MMU
  mm: memcg: only check anon swapin page charges for swap cache
  mm: memcg: only check swap cache pages for repeated charging
  mm: memcg: split swapin charge function into private and public part
  mm: memcg: remove needless !mm fixup to init_mm when charging
  mm: memcg: remove unneeded shmem charge type
  ...
2012-07-31 19:25:39 -07:00
Linus Torvalds
6dbb35b0a7 NFS client updates for Linux 3.6
Features include:
 - Patches from Bryan to allow splitting of the NFSv2/v3/v4 code into
   separate modules.
 - Fix Oopses in the NFSv4 idmapper
 - Fix a deadlock whereby rpciod tries to allocate a new socket and
   ends up recursing into the NFS code due to memory reclaim.
 - Increase the number of permitted callback connections.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQGF+vAAoJEGcL54qWCgDy6ncP/jKVl/Yjz6i1b/8QtC0EmYFb
 XNwbjZicNupVM98XPLm+sfdWxvoGiHSMbwG8t/hx5Z6CteM18adeGNO1KqP9xQyg
 scPvFmqj5VJNHsHztcHIFHFYdpsuJqRKcW3TA2l8AkNOm6uBcnXArRosYN6LrXik
 hI/jWcyv8wZuooSQrLf463JK37t/s6LuUQo6jKP4582sbANHvPdLkHFCYg3yt1Sk
 2IIo2CLLs2cJUswGJnsHT76Jxfvh21NdGtvydjVjpr4H0/LY5GykBc23AAL9TWj6
 KlegDO902hLzEE93xazF59hQZrPuwBi3quEMJ3X0mjdNQWb4G96s/t2hmwpTjWyc
 mjpRsuaGlCXDxcrI5dnU52hqKa0Ju1ipbLpghxSVfilRy998xvGp5Yc6ADU+pYnt
 uP09lamCyjFEa+gDSqGt7lv4dFmdPJjWZdkXLPv0ah5sXVMYAT8NRLUfiE1+hLLu
 zKaM84PHSEvykFeJ2WvjDTgF3L55y3x6L3LN4UkKmfO5nqQf7csPcquU1NfHh25S
 jjKGq2SKGoYG1l6FwK3hemup5cnFUEX78E2QeLrmaHg92fJDGrz587i6MPc5jrSe
 sOSX/CpuCJd4VhodIo8T00me0GZSHEU7RP2KEc518ZvrPmz13avXeLeG9nYhjuxM
 cV9Ex8zh2Y4aiUXCy3uA
 =KSix
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.6-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull second wave of NFS client updates from Trond Myklebust:

 - Patches from Bryan to allow splitting of the NFSv2/v3/v4 code into
   separate modules.

 - Fix Oopses in the NFSv4 idmapper

 - Fix a deadlock whereby rpciod tries to allocate a new socket and ends
   up recursing into the NFS code due to memory reclaim.

 - Increase the number of permitted callback connections.

* tag 'nfs-for-3.6-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  nfs: explicitly reject LOCK_MAND flock() requests
  nfs: increase number of permitted callback connections.
  SUNRPC: return negative value in case rpcbind client creation error
  NFS: Convert v4 into a module
  NFS: Convert v3 into a module
  NFS: Convert v2 into a module
  NFS: Keep module parameters in the generic NFS client
  NFS: Split out remaining NFS v4 inode functions
  NFS: Pass super operations and xattr handlers in the nfs_subversion
  NFS: Only initialize the ACL client in the v3 case
  NFS: Create a try_mount rpc op
  NFS: Remove the NFS v4 xdev mount function
  NFS: Add version registering framework
  NFS: Fix a number of bugs in the idmapper
  nfs: skip commit in releasepage if we're freeing memory for fs-related reasons
  sunrpc: clarify comments on rpc_make_runnable
  pnfsblock: bail out partial page IO
2012-07-31 18:45:44 -07:00
Mel Gorman
192e501b04 nfs: prevent page allocator recursions with swap over NFS.
GFP_NOFS is _more_ permissive than GFP_NOIO in that it will initiate IO,
just not of any filesystem data.

The problem is that previously NOFS was correct because that avoids
recursion into the NFS code.  With swap-over-NFS, it is no longer correct
as swap IO can lead to this recursion.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Paris <eparis@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Neil Brown <neilb@suse.de>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Xiaotian Feng <dfeng@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31 18:42:48 -07:00
Mel Gorman
a564b8f039 nfs: enable swap on NFS
Implement the new swapfile a_ops for NFS and hook up ->direct_IO.  This
will set the NFS socket to SOCK_MEMALLOC and run socket reconnect under
PF_MEMALLOC as well as reset SOCK_MEMALLOC before engaging the protocol
->connect() method.

PF_MEMALLOC should allow the allocation of struct socket and related
objects and the early (re)setting of SOCK_MEMALLOC should allow us to
receive the packets required for the TCP connection buildup.

[jlayton@redhat.com: Restore PF_MEMALLOC task flags in all cases]
[dfeng@redhat.com: Fix handling of multiple swap files]
[a.p.zijlstra@chello.nl: Original patch]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Paris <eparis@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Neil Brown <neilb@suse.de>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Xiaotian Feng <dfeng@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31 18:42:48 -07:00
Mel Gorman
29418aa4bd nfs: disable data cache revalidation for swapfiles
The VM does not like PG_private set on PG_swapcache pages.  As suggested
by Trond in http://lkml.org/lkml/2006/8/25/348, this patch disables NFS
data cache revalidation on swap files.  as it does not make sense to have
other clients change the file while it is being used as swap.  This avoids
setting PG_private on swap pages, since there ought to be no further races
with invalidate_inode_pages2() to deal with.

Since we cannot set PG_private we cannot use page->private which is
already used by PG_swapcache pages to store the nfs_page.  Thus augment
the new nfs_page_find_request logic.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Paris <eparis@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Neil Brown <neilb@suse.de>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Xiaotian Feng <dfeng@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31 18:42:47 -07:00
Mel Gorman
d56b4ddf77 nfs: teach the NFS client how to treat PG_swapcache pages
Replace all relevant occurences of page->index and page->mapping in the
NFS client with the new page_file_index() and page_file_mapping()
functions.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Paris <eparis@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Neil Brown <neilb@suse.de>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Xiaotian Feng <dfeng@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31 18:42:47 -07:00
Linus Torvalds
08843b79fb Merge branch 'nfsd-next' of git://linux-nfs.org/~bfields/linux
Pull nfsd changes from J. Bruce Fields:
 "This has been an unusually quiet cycle--mostly bugfixes and cleanup.
  The one large piece is Stanislav's work to containerize the server's
  grace period--but that in itself is just one more step in a
  not-yet-complete project to allow fully containerized nfs service.

  There are a number of outstanding delegation, container, v4 state, and
  gss patches that aren't quite ready yet; 3.7 may be wilder."

* 'nfsd-next' of git://linux-nfs.org/~bfields/linux: (35 commits)
  NFSd: make boot_time variable per network namespace
  NFSd: make grace end flag per network namespace
  Lockd: move grace period management from lockd() to per-net functions
  LockD: pass actual network namespace to grace period management functions
  LockD: manage grace list per network namespace
  SUNRPC: service request network namespace helper introduced
  NFSd: make nfsd4_manager allocated per network namespace context.
  LockD: make lockd manager allocated per network namespace
  LockD: manage grace period per network namespace
  Lockd: add more debug to host shutdown functions
  Lockd: host complaining function introduced
  LockD: manage used host count per networks namespace
  LockD: manage garbage collection timeout per networks namespace
  LockD: make garbage collector network namespace aware.
  LockD: mark host per network namespace on garbage collect
  nfsd4: fix missing fault_inject.h include
  locks: move lease-specific code out of locks_delete_lock
  locks: prevent side-effects of locks_release_private before file_lock is initialized
  NFSd: set nfsd_serv to NULL after service destruction
  NFSd: introduce nfsd_destroy() helper
  ...
2012-07-31 14:42:28 -07:00
Jeff Layton
ad0fcd4eb6 nfs: explicitly reject LOCK_MAND flock() requests
We have no mechanism to emulate LOCK_MAND locks on NFSv4, so explicitly
return -EINVAL if someone requests it.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-31 14:42:20 -04:00
NeilBrown
b042414feb nfs: increase number of permitted callback connections.
By default a sunrpc service is limited to (N+3)*20 connections
where N is the number of threads.  This is 80 when N==1.
If this number is exceeded a warning is printed suggesting that
the number of threads be increased.  However with services which
run a single thread, this is impossible.

For such services there is a ->sv_maxconn setting that can be
used to forcibly increase the limit, and silence the message.
This is used by lockd.

The nfs client uses a sunrpc service to handle callbacks and
it too is single-threaded, so to avoid the useless messages,
and to allow a reasonable number of concurrent connections,
we need to set ->sv_maxconn.  1024 seems like a good number.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-31 12:33:29 -04:00
Linus Torvalds
1fad1e9a74 NFS client updates for Linux 3.6
Features include:
 - More preparatory patches for modularising NFSv2/v3/v4.
   Split out the various NFSv2/v3/v4-specific code into separate
   files
 - More preparation for the NFSv4 migration code
 - Ensure that OPEN(O_CREATE) observes the pNFS mds threshold parameters
 - pNFS fast failover when the data servers are down
 - Various cleanups and debugging patches
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQFwYRAAoJEGcL54qWCgDy6YAQAIZuUPFCQbuzWev4L/XA0uhg
 0xjr9YBmg57UOs8e7FhS0azJip+D/kmF2Rx5gCMX0QO/IwaEVoms4BS8zjFOr3yL
 cPIyY5ErF+WJ25f3Mz8k0wOHTzrOvMdq4/7TQf5hztsrGYFid78sSb3O9ZO9bgb4
 QxqhgA0Z4S2vIqeObJAF9BT5UOT/cXfVKV0iTd52ZQVA+ZhqJ2SDlNFsoxnVOweV
 qzKOi0FZCPsjH+Z4a/LLdieHG5AYRPhOScBdG7gTFAdkysI8DDtSNCmJ68dMHYRw
 OHtyt/X4k9TImfwauV3Eww1sw3NkDswX+dv3GOSDTlMvVBrm0WXWj2zoHZbOB+0Q
 ETI9Zpolvd9pADtyfu9c+kCYIR+NdB4lGKd15iZfdEy/CZ0CtbLAbn5Szo2YcwNR
 iVjswR0jsbklD+mZjlMeZoG4JX2d9ZRqLs20POz7QQBmdIQ8jb9/SwVj9zGvX0w0
 NyVUHTFlGaNwkpo7oeRoWi1xjZWE6OzfoncV/46DOJWb08OKZ4fR4ugMYJWGi7Xx
 I4nQrIiHtInqL+sF5Y3wlZ48dufLuIhAcHh7klxgud4GRj7t8j+a99pTrRmpHvvC
 4K2Yu69VYcIA7N2RsSLohIllGPFmMwpdar7yERdD0So8/fz/zf7wn0dFFYY11+bL
 YNVVGhvNxccMU5EU9YR4
 =Lc59
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Features include:
   - More preparatory patches for modularising NFSv2/v3/v4.  Split out
     the various NFSv2/v3/v4-specific code into separate files
   - More preparation for the NFSv4 migration code
   - Ensure that OPEN(O_CREATE) observes the pNFS mds threshold
     parameters
   - pNFS fast failover when the data servers are down
   - Various cleanups and debugging patches"

* tag 'nfs-for-3.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (67 commits)
  nfs: fix fl_type tests in NFSv4 code
  NFS: fix pnfs regression with directio writes
  NFS: fix pnfs regression with directio reads
  sunrpc: clnt: Add missing braces
  nfs: fix stub return type warnings
  NFS: exit_nfs_v4() shouldn't be an __exit function
  SUNRPC: Add a missing spin_unlock to gss_mech_list_pseudoflavors
  NFS: Split out NFS v4 client functions
  NFS: Split out the NFS v4 filesystem types
  NFS: Create a single nfs_clone_super() function
  NFS: Split out NFS v4 server creating code
  NFS: Initialize the NFS v4 client from init_nfs_v4()
  NFS: Move the v4 getroot code to nfs4getroot.c
  NFS: Split out NFS v4 file operations
  NFS: Initialize v4 sysctls from nfs_init_v4()
  NFS: Create an init_nfs_v4() function
  NFS: Split out NFS v4 inode operations
  NFS: Split out NFS v3 inode operations
  NFS: Split out NFS v2 inode operations
  NFS: Clean up nfs4_proc_setclientid() and friends
  ...
2012-07-30 19:16:57 -07:00
Bryan Schumaker
89d77c8fa8 NFS: Convert v4 into a module
This patch exports symbols needed by the v4 module.  In addition, I also
switch over to using IS_ENABLED() to check if CONFIG_NFS_V4 or
CONFIG_NFS_V4_MODULE are set.

The module (nfs4.ko) will be created in the same directory as nfs.ko and
will be automatically loaded the first time you try to mount over NFS v4.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30 19:06:52 -04:00
Bryan Schumaker
1c606fb74c NFS: Convert v3 into a module
This patch exports symbols and moves over the final structures needed by
the v3 module.  In addition, I also switch over to using IS_ENABLED() to
check if CONFIG_NFS_V3 or CONFIG_NFS_V3_MODULE are set.

The module (nfs3.ko) will be created in the same directory as nfs.ko and
will be automatically loaded the first time you try to mount over NFS v3.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30 19:06:46 -04:00
Bryan Schumaker
ddda8e0aa8 NFS: Convert v2 into a module
The module (nfs2.ko) will be created in the same directory as nfs.ko and
will be automatically loaded the first time you try to mount over NFS v2.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30 19:06:41 -04:00
Bryan Schumaker
fac1e8e4ef NFS: Keep module parameters in the generic NFS client
Otherwise we break backwards compatibility when v4 becomes a modules.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30 19:06:31 -04:00
Bryan Schumaker
19d87ca362 NFS: Split out remaining NFS v4 inode functions
Somehow I missed this in my previous patch series, but these functions
are only needed by the v4 code and should be moved to a v4-only file.  I
wasn't exactly sure where I should put these functions, so I moved them
into nfs4super.c where I could make them static.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30 19:06:20 -04:00
Bryan Schumaker
6a74490dca NFS: Pass super operations and xattr handlers in the nfs_subversion
I can set all variables in the nfs_fill_super() function, allowing me to
remove the nfs4_fill_super() function.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30 19:06:05 -04:00
Bryan Schumaker
1179acc6a3 NFS: Only initialize the ACL client in the v3 case
v2 and v4 don't use it, so I create two new nfs_rpc_ops functions to
initialize the ACL client only when we are using v3.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30 19:05:54 -04:00
Bryan Schumaker
ff9099f266 NFS: Create a try_mount rpc op
I'm already looking up the nfs subversion in nfs_fs_mount(), so I have
easy access to rpc_ops that used to be difficult to reach.  This allows
me to set up a different mount path for NFS v2/3 and NFS v4.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30 19:04:53 -04:00
Bryan Schumaker
e8f25e6d6d NFS: Remove the NFS v4 xdev mount function
I can now share this code with the v2 and v3 code by using the NFS
subversion structure.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30 19:04:45 -04:00
Bryan Schumaker
ab7017a3a0 NFS: Add version registering framework
This patch adds in the code to track multiple versions of the NFS
protocol.  I created default structures for v2, v3 and v4 so that each
version can continue to work while I convert them into kernel modules.
I also removed the const parameter from the rpc_version array so that I
can change it at runtime.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30 19:04:17 -04:00
David Howells
a427b9ec4e NFS: Fix a number of bugs in the idmapper
Fix a number of bugs in the NFS idmapper code:

 (1) Only registered key types can be passed to the core keys code, so
     register the legacy idmapper key type.

     This is a requirement because the unregister function cleans up keys
     belonging to that key type so that there aren't dangling pointers to the
     module left behind - including the key->type pointer.

 (2) Rename the legacy key type.  You can't have two key types with the same
     name, and (1) would otherwise require that.

 (3) complete_request_key() must be called in the error path of
     nfs_idmap_legacy_upcall().

 (4) There is one idmap struct for each nfs_client struct.  This means that
     idmap->idmap_key_cons is shared without the use of a lock.  This is a
     problem because key_instantiate_and_link() - as called indirectly by
     idmap_pipe_downcall() - releases anyone waiting for the key to be
     instantiated.

     What happens is that idmap_pipe_downcall() running in the rpc.idmapd
     thread, releases the NFS filesystem in whatever thread that is running in
     to continue.  This may then make another idmapper call, overwriting
     idmap_key_cons before idmap_pipe_downcall() gets the chance to call
     complete_request_key().

     I *think* that reading idmap_key_cons only once, before
     key_instantiate_and_link() is called, and then caching the result in a
     variable is sufficient.

Bug (4) is the cause of:

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<          (null)>]           (null)
PGD 0
Oops: 0010 [#1] SMP
CPU 1
Modules linked in: ppdev parport_pc lp parport ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack nfs fscache xt_CHECKSUM auth_rpcgss iptable_mangle nfs_acl bridge stp llc lockd be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi snd_hda_codec_realtek snd_usb_audio snd_hda_intel snd_hda_codec snd_seq snd_pcm snd_hwdep snd_usbmidi_lib snd_rawmidi snd_timer uvcvideo videobuf2_core videodev media videobuf2_vmalloc snd_seq_device videobuf2_memops e1000e vhost_net iTCO_wdt joydev coretemp snd soundcore macvtap macvlan i2c_i801 snd_page_alloc tun iTCO_vendor_support microcode kvm_intel kvm sunrpc hid_logitech_dj usb_storage i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
Pid: 1229, comm: rpc.idmapd Not tainted 3.4.2-1.fc16.x86_64 #1 Gateway DX4710-UB801A/G33M05G1
RIP: 0010:[<0000000000000000>]  [<          (null)>]           (null)
RSP: 0018:ffff8801a3645d40  EFLAGS: 00010246
RAX: ffff880077707e30 RBX: ffff880077707f50 RCX: ffff8801a18ccd80
RDX: 0000000000000006 RSI: ffff8801a3645e75 RDI: ffff880077707f50
RBP: ffff8801a3645d88 R08: ffff8801a430f9c0 R09: ffff8801a3645db0
R10: 000000000000000a R11: 0000000000000246 R12: ffff8801a18ccd80
R13: ffff8801a3645e75 R14: ffff8801a430f9c0 R15: 0000000000000006
FS:  00007fb6fb51a700(0000) GS:ffff8801afc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000001a49b0000 CR4: 00000000000027e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rpc.idmapd (pid: 1229, threadinfo ffff8801a3644000, task ffff8801a3bf9710)
Stack:
 ffffffff81260878 ffff8801a3645db0 ffff8801a3645db0 ffff880077707a90
 ffff880077707f50 ffff8801a18ccd80 0000000000000006 ffff8801a3645e75
 ffff8801a430f9c0 ffff8801a3645dd8 ffffffff81260983 ffff8801a3645de8
Call Trace:
 [<ffffffff81260878>] ? __key_instantiate_and_link+0x58/0x100
 [<ffffffff81260983>] key_instantiate_and_link+0x63/0xa0
 [<ffffffffa057062b>] idmap_pipe_downcall+0x1cb/0x1e0 [nfs]
 [<ffffffffa0107f57>] rpc_pipe_write+0x67/0x90 [sunrpc]
 [<ffffffff8117f833>] vfs_write+0xb3/0x180
 [<ffffffff8117fb5a>] sys_write+0x4a/0x90
 [<ffffffff81600329>] system_call_fastpath+0x16/0x1b
Code:  Bad RIP value.
RIP  [<          (null)>]           (null)
 RSP <ffff8801a3645d40>
CR2: 0000000000000000

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>= 3.4]
2012-07-30 18:57:39 -04:00
Jeff Layton
5cf02d09b5 nfs: skip commit in releasepage if we're freeing memory for fs-related reasons
We've had some reports of a deadlock where rpciod ends up with a stack
trace like this:

    PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
     #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
     #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
     #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
     #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
     #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
     #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
     #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
     #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
     #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
     #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
    #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
    #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
    #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
    #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
    #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
    #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
    #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
    #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
    #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
    #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
    #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
    #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
    #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
    #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
    #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
    #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca

rpciod is trying to allocate memory for a new socket to talk to the
server. The VM ends up calling ->releasepage to get more memory, and it
tries to do a blocking commit. That commit can't succeed however without
a connected socket, so we deadlock.

Fix this by setting PF_FSTRANS on the workqueue task prior to doing the
socket allocation, and having nfs_release_page check for that flag when
deciding whether to do a commit call. Also, set PF_FSTRANS
unconditionally in rpc_async_schedule since that function can also do
allocations sometimes.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2012-07-30 18:55:59 -04:00
Peng Tao
159e0561e3 pnfsblock: bail out partial page IO
Current block layout driver read/write code assumes page
aligned IO in many places. Add a checker to validate the assumption.
Otherwise there would be data corruption like when application does
open(O_WRONLY) and page unaliged write.

Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30 18:52:06 -04:00
Jeff Layton
f44106e217 nfs: fix fl_type tests in NFSv4 code
fl_type is not a bitmap.

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30 18:09:13 -04:00
Fred Isaman
c95908e4c5 NFS: fix pnfs regression with directio writes
Commit 57208fa7e5 "NFS: Create an write_pageio_init() function"
did not modify the calls in direct.c, preventing direct io from
using pnfs.  This reintroduces that capability.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30 18:09:06 -04:00
Fred Isaman
59948db3be NFS: fix pnfs regression with directio reads
Commit 1abb50886a "NFS: Create an read_pageio_init() function"
did not modify the call in direct.c, preventing direct io from
using pnfs.  This reintroduces that capability.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30 18:08:53 -04:00
Randy Dunlap
0add3e8567 nfs: fix stub return type warnings
Fix numerous repeated warnings by making the stub function
void instead of non-void:

fs/nfs/nfs4_fs.h: In function 'nfs4_unregister_sysctl':
fs/nfs/nfs4_fs.h:385:1: warning: no return statement in function returning non-void

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc:	Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30 17:30:24 -04:00
Stanislav Kinsbursky
9695c7057f SUNRPC: service request network namespace helper introduced
This is a cleanup patch - makes code looks simplier.
It replaces widely used rqstp->rq_xprt->xpt_net by introduced SVC_NET(rqstp).

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-27 16:49:21 -04:00
Linus Torvalds
a66d2c8f7e Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull the big VFS changes from Al Viro:
 "This one is *big* and changes quite a few things around VFS.  What's in there:

   - the first of two really major architecture changes - death to open
     intents.

     The former is finally there; it was very long in making, but with
     Miklos getting through really hard and messy final push in
     fs/namei.c, we finally have it.  Unlike his variant, this one
     doesn't introduce struct opendata; what we have instead is
     ->atomic_open() taking preallocated struct file * and passing
     everything via its fields.

     Instead of returning struct file *, it returns -E...  on error, 0
     on success and 1 in "deal with it yourself" case (e.g.  symlink
     found on server, etc.).

     See comments before fs/namei.c:atomic_open().  That made a lot of
     goodies finally possible and quite a few are in that pile:
     ->lookup(), ->d_revalidate() and ->create() do not get struct
     nameidata * anymore; ->lookup() and ->d_revalidate() get lookup
     flags instead, ->create() gets "do we want it exclusive" flag.

     With the introduction of new helper (kern_path_locked()) we are rid
     of all struct nameidata instances outside of fs/namei.c; it's still
     visible in namei.h, but not for long.  Come the next cycle,
     declaration will move either to fs/internal.h or to fs/namei.c
     itself.  [me, miklos, hch]

   - The second major change: behaviour of final fput().  Now we have
     __fput() done without any locks held by caller *and* not from deep
     in call stack.

     That obviously lifts a lot of constraints on the locking in there.
     Moreover, it's legal now to call fput() from atomic contexts (which
     has immediately simplified life for aio.c).  We also don't need
     anti-recursion logics in __scm_destroy() anymore.

     There is a price, though - the damn thing has become partially
     asynchronous.  For fput() from normal process we are guaranteed
     that pending __fput() will be done before the caller returns to
     userland, exits or gets stopped for ptrace.

     For kernel threads and atomic contexts it's done via
     schedule_work(), so theoretically we might need a way to make sure
     it's finished; so far only one such place had been found, but there
     might be more.

     There's flush_delayed_fput() (do all pending __fput()) and there's
     __fput_sync() (fput() analog doing __fput() immediately).  I hope
     we won't need them often; see warnings in fs/file_table.c for
     details.  [me, based on task_work series from Oleg merged last
     cycle]

   - sync series from Jan

   - large part of "death to sync_supers()" work from Artem; the only
     bits missing here are exofs and ext4 ones.  As far as I understand,
     those are going via the exofs and ext4 trees resp.; once they are
     in, we can put ->write_super() to the rest, along with the thread
     calling it.

   - preparatory bits from unionmount series (from dhowells).

   - assorted cleanups and fixes all over the place, as usual.

  This is not the last pile for this cycle; there's at least jlayton's
  ESTALE work and fsfreeze series (the latter - in dire need of fixes,
  so I'm not sure it'll make the cut this cycle).  I'll probably throw
  symlink/hardlink restrictions stuff from Kees into the next pile, too.
  Plus there's a lot of misc patches I hadn't thrown into that one -
  it's large enough as it is..."

* 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (127 commits)
  ext4: switch EXT4_IOC_RESIZE_FS to mnt_want_write_file()
  btrfs: switch btrfs_ioctl_balance() to mnt_want_write_file()
  switch dentry_open() to struct path, make it grab references itself
  spufs: shift dget/mntget towards dentry_open()
  zoran: don't bother with struct file * in zoran_map
  ecryptfs: don't reinvent the wheels, please - use struct completion
  don't expose I_NEW inodes via dentry->d_inode
  tidy up namei.c a bit
  unobfuscate follow_up() a bit
  ext3: pass custom EOF to generic_file_llseek_size()
  ext4: use core vfs llseek code for dir seeks
  vfs: allow custom EOF in generic_file_llseek code
  vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes
  vfs: Remove unnecessary flushing of block devices
  vfs: Make sys_sync writeout also block device inodes
  vfs: Create function for iterating over block devices
  vfs: Reorder operations during sys_sync
  quota: Move quota syncing to ->sync_fs method
  quota: Split dquot_quota_sync() to writeback and cache flushing part
  vfs: Move noop_backing_dev_info check from sync into writeback
  ...
2012-07-23 12:27:27 -07:00
Linus Torvalds
ce9f8d6b39 Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd
Pull pnfs/ore fixes from Boaz Harrosh:
 "These are catastrophic fixes to the pnfs objects-layout that were just
  discovered.  They are also destined for @stable.

  I have found these and worked on them at around RC1 time but
  unfortunately went to the hospital for kidney stones and had a very
  slow recovery.  I refrained from sending them as is, before proper
  testing, and surly I have found a bug just yesterday.

  So now they are all well tested, and have my sign-off.  Other then
  fixing the problem at hand, and assuming there are no bugs at the new
  code, there is low risk to any surrounding code.  And in anyway they
  affect only these paths that are now broken.  That is RAID5 in pnfs
  objects-layout code.  It does also affect exofs (which was not broken)
  but I have tested exofs and it is lower priority then objects-layout
  because no one is using exofs, but objects-layout has lots of users."

* 'for-linus' of git://git.open-osd.org/linux-open-osd:
  pnfs-obj: Fix __r4w_get_page when offset is beyond i_size
  pnfs-obj: don't leak objio_state if ore_write/read fails
  ore: Unlock r4w pages in exact reverse order of locking
  ore: Remove support of partial IO request (NFS crash)
  ore: Fix NFS crash by supporting any unaligned RAID IO
2012-07-20 11:43:53 -07:00
Boaz Harrosh
c999ff6802 pnfs-obj: Fix __r4w_get_page when offset is beyond i_size
It is very common for the end of the file to be unaligned on
stripe size. But since we know it's beyond file's end then
the XOR should be preformed with all zeros.

Old code used to just read zeros out of the OSD devices, which is a great
waist. But what scares me more about this situation is that, we now have
pages attached to the file's mapping that are beyond i_size. I don't
like the kind of bugs this calls for.

Fix both birds, by returning a global zero_page, if offset is beyond
i_size.

TODO:
	Change the API to ->__r4w_get_page() so a NULL can be
	returned without being considered as error, since XOR API
	treats NULL entries as zero_pages.

[Bug since 3.2. Should apply the same way to all Kernels since]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2012-07-20 11:50:31 +03:00
Boaz Harrosh
9909d45a85 pnfs-obj: don't leak objio_state if ore_write/read fails
[Bug since 3.2 Kernel]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2012-07-20 11:50:30 +03:00
Bryan Schumaker
bb6e071f84 NFS: exit_nfs_v4() shouldn't be an __exit function
... yet.  Right now, init_nfs() is calling this function if an error is
encountered when loading the nfs module.  An __exit function can't be
called from one declared as __init.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-17 17:02:57 -04:00
Bryan Schumaker
ec409897e7 NFS: Split out NFS v4 client functions
These functions are only needed by NFS v4, so they can be moved into a
v4 specific file.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-17 13:33:56 -04:00
Bryan Schumaker
fbdefd6442 NFS: Split out the NFS v4 filesystem types
This allows me to move the v4 mounting and unmounting functions out of
the generic client and into a file that is only compiled when CONFIG_NFS_V4
is enabled.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-17 13:33:55 -04:00
Bryan Schumaker
3cadf4b864 NFS: Create a single nfs_clone_super() function
v2 and v3 shared a function for this, but v4 implemented something only
slightly different.  Might as well share code whenever possible...

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-17 13:33:54 -04:00
Bryan Schumaker
fcf10398f6 NFS: Split out NFS v4 server creating code
These functions are specific to NFS v4 and can be moved to nfs4client.c
to keep them out of the generic client.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-17 13:33:53 -04:00
Bryan Schumaker
428360d77c NFS: Initialize the NFS v4 client from init_nfs_v4()
And split these functions out of the generic client into a v4 specific
file.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-17 13:33:52 -04:00
Bryan Schumaker
a38a9eac75 NFS: Move the v4 getroot code to nfs4getroot.c
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-17 13:33:51 -04:00
Bryan Schumaker
ce4ef7c0a8 NFS: Split out NFS v4 file operations
This patch moves the NFS v4 file functions into a new file that is only
compiled when CONFIG_NFS_V4 is enabled.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-17 13:33:50 -04:00
Bryan Schumaker
466bfe7f4a NFS: Initialize v4 sysctls from nfs_init_v4()
And split them out of the generic client into their own file.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-17 13:33:18 -04:00
Bryan Schumaker
129d1977ed NFS: Create an init_nfs_v4() function
I want to initialize all of NFS v4 in a single function that will
eventually be used as the v4 module init function.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-17 13:33:13 -04:00
Bryan Schumaker
73a79706d7 NFS: Split out NFS v4 inode operations
The NFS v4 file inode operations are already already in nfs4proc.c, so
this patch just needs to move the directory operations to the same file.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-17 13:33:05 -04:00
Bryan Schumaker
ab96291ea1 NFS: Split out NFS v3 inode operations
This patch moves the NFS v3 file and directory inode functions into
files that are only compiled whet CONFIG_NFS_V3 is enabled.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-17 13:33:03 -04:00
Bryan Schumaker
597d92891b NFS: Split out NFS v2 inode operations
This patch moves the NFS v2 file and directory inode functions into
files that are only compiled whet CONFIG_NFS_V2 is enabled.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-17 13:32:55 -04:00
Chuck Lever
6bbb4ae8ff NFS: Clean up nfs4_proc_setclientid() and friends
Add documenting comments and appropriate debugging messages.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-16 15:12:16 -04:00
Chuck Lever
de73483122 NFS: Treat NFS4ERR_CLID_INUSE as a fatal error
For NFSv4 minor version 0, currently the cl_id_uniquifier allows the
Linux client to generate a unique nfs_client_id4 string whenever a
server replies with NFS4ERR_CLID_INUSE.

This implementation seems to be based on a flawed reading of RFC
3530.  NFS4ERR_CLID_INUSE actually means that the client has presented
this nfs_client_id4 string with a different principal at some time in
the past, and that lease is still in use on the server.

For a Linux client this might be rather difficult to achieve: the
authentication flavor is named right in the nfs_client_id4.id
string.  If we change flavors, we change strings automatically.

So, practically speaking, NFS4ERR_CLID_INUSE means there is some other
client using our string.  There is not much that can be done to
recover automatically.  Let's make it a permanent error.

Remove the recovery logic in nfs4_proc_setclientid(), and remove the
cl_id_uniquifier field from the nfs_client data structure.  And,
remove the authentication flavor from the nfs_client_id4 string.

Keeping the authentication flavor in the nfs_client_id4.id string
means that we could have a separate lease for each authentication
flavor used by mounts on the client.  But we want just one lease for
all the mounts on this client.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-16 15:12:16 -04:00
Chuck Lever
46a87b8a7b NFS: When state recovery fails, waiting tasks should exit
NFSv4 state recovery is not always successful.  Failure is signalled
by setting the nfs_client.cl_cons_state to a negative (errno) value,
then waking waiters.

Currently this can happen only during mount processing.  I'm about to
add an explicit case where state recovery failure during normal
operation should force all NFS requests waiting on that state recovery
to exit.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-16 15:12:15 -04:00
Chuck Lever
6a1a1e34dc SUNRPC: Add rpcauth_list_flavors()
The gss_mech_list_pseudoflavors() function provides a list of
currently registered GSS pseudoflavors.  This list does not include
any non-GSS flavors that have been registered with the RPC client.
nfs4_find_root_sec() currently adds these extra flavors by hand.

Instead, nfs4_find_root_sec() should be looking at the set of flavors
that have been explicitly registered via rpcauth_register().  And,
other areas of code will soon need the same kind of list that
contains all flavors the kernel currently knows about (see below).

Rather than cloning the open-coded logic in nfs4_find_root_sec() to
those new places, introduce a generic RPC function that generates a
full list of registered auth flavors and pseudoflavors.

A new rpc_authops method is added that lists a flavor's
pseudoflavors, if it has any.  I encountered an interesting module
loader loop when I tried to get the RPC client to invoke
gss_mech_list_pseudoflavors() by name.

This patch is a pre-requisite for server trunking discovery, and a
pre-requisite for fixing up the in-kernel mount client to do better
automatic security flavor selection.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-16 15:12:15 -04:00
Chuck Lever
56d08fef23 NFS: nfs_getaclargs.acl_len is a size_t
Squelch compiler warnings:

fs/nfs/nfs4proc.c: In function ‘__nfs4_get_acl_uncached’:
fs/nfs/nfs4proc.c:3811:14: warning: comparison between signed and
	unsigned integer expressions [-Wsign-compare]
fs/nfs/nfs4proc.c:3818:15: warning: comparison between signed and
	unsigned integer expressions [-Wsign-compare]

Introduced by commit bf118a34 "NFSv4: include bitmap in nfsv4 get
acl data", Dec 7, 2011.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-16 14:53:43 -04:00
Chuck Lever
38527b153a NFS: Clean up TEST_STATEID and FREE_STATEID error reporting
As a finishing touch, add appropriate documenting comments and some
debugging printk's.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-16 14:53:34 -04:00
Chuck Lever
3e60ffdd36 NFS: Clean up nfs41_check_expired_stateid()
Clean up: Instead of open-coded flag manipulation, use test_bit() and
clear_bit() just like all other accessors of the state->flag field.
This also eliminates several unnecessary implicit integer type
conversions.

To make it absolutely clear what is going on, a number of comments
are introduced.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-16 14:49:40 -04:00
Chuck Lever
eb64cf964d NFS: State reclaim clears OPEN and LOCK state
The "state->flags & flags" test in nfs41_check_expired_stateid()
allows the state manager to squelch a TEST_STATEID operation when
it is known for sure that a state ID is no longer valid.  If the
lease was purged, for example, the client already knows that state
ID is now defunct.

But open recovery is still needed for that inode.

To force a call to nfs4_open_expired(), change the default return
value for nfs41_check_expired_stateid() to force open recovery, and
the default return value for nfs41_check_locks() to force lock
recovery, if the requested flags are clear.  Fix suggested by Bryan
Schumaker.

Also, the presence of a delegation state ID must not prevent normal
open recovery.  The delegation state ID must be cleared if it was
revoked, but once cleared I don't think it's presence or absence has
any bearing on whether open recovery is still needed.  So the logic
is adjusted to ignore the TEST_STATEID result for the delegation
state ID.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-16 14:48:53 -04:00
Chuck Lever
89af273958 NFS: Don't free a state ID the server does not recognize
The result of a TEST_STATEID operation can indicate a few different
things:

  o If NFS_OK is returned, then the client can continue using the
    state ID under test, and skip recovery.

  o RFC 5661 says that if the state ID was revoked, then the client
    must perform an explicit FREE_STATEID before trying to re-open.

  o If the server doesn't recognize the state ID at all, then no
    FREE_STATEID is needed, and the client can immediately continue
    with open recovery.

Let's err on the side of caution: if the server clearly tells us the
state ID is unknown, we skip the FREE_STATEID.  For any other error,
we issue a FREE_STATEID.  Sometimes that FREE_STATEID will be
unnecessary, but leaving unused state IDs on the server needlessly
ties up resources.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-16 14:48:10 -04:00
Chuck Lever
377e507d15 NFS: Fix up TEST_STATEID and FREE_STATEID return code handling
The TEST_STATEID and FREE_STATEID operations can return
-NFS4ERR_BAD_STATEID, -NFS4ERR_OLD_STATEID, or -NFS4ERR_DEADSESSION.

nfs41_{test,free}_stateid() should not pass these errors to
nfs4_handle_exception() during state recovery, since that will
recursively kick off state recovery again, resulting in a deadlock.

In particular, when the TEST_STATEID operation returns NFS4_OK,
res.status can contain one of these errors.  _nfs41_test_stateid()
replaces NFS4_OK with the value in res.status, which is then returned
to callers.

But res.status is not passed through nfs4_stat_to_errno(), and thus is
a positive NFS4ERR value.  Currently callers are only interested in
!NFS4_OK, and nfs4_handle_exception() ignores positive values.

Thus the res.status values are currently ignored by
nfs4_handle_exception() and won't cause the deadlock above.  Thanks to
this missing negative, it is only when these operations fail (which
is very rare) that a deadlock can occur.

Bryan agrees the original intent was to return res.status as a
negative NFS4ERR value to callers of nfs41_test_stateid().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-16 14:47:52 -04:00
Andy Adamson
293b3b065c NFSv4.1 do not send LAYOUTRETURN on emtpy plh_segs list
mark_matching_lsegs_invalid() resets the mds_threshold counters and can
dereference the layout hdr on an initial empty plh_segs list. It returns 0 both
in the case of an initial empty list and in a non-emtpy list that was cleared
by calls to mark_lseg_invalid.

Don't send a LAYOUTRETURN if the list was initially empty.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-16 14:39:00 -04:00
Andy Adamson
366d50521c NFSv4.1 mark layout when already returned
When the file layout driver is fencing a DS, _pnfs_return_layout can be
called mulitple times per inode due to in-flight i/o referencing lsegs on it's
plh_segs list.

Remember that LAYOUTRETURN has been called, and do not call it again.
Allow LAYOUTRETURNs after a subsequent LAYOUTGET.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-16 14:37:25 -04:00
Andy Adamson
baf6c2a44a NFSv4.1 don't send LAYOUTCOMMIT if data resent through MDS
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-16 14:37:00 -04:00
Andy Adamson
82c7c7a5a9 NFSv4.1 return the LAYOUT for each file with failed DS connection I/O
First mark the deviceid invalid to prevent any future use. Then fence all
files involved in I/O to a DS with a connection error by sending a
LAYOUTRETURN.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-16 14:36:52 -04:00
Trond Myklebust
8626e4a426 Merge commit '9249e17fe094d853d1ef7475dd559a2cc7e23d42' into nfs-for-3.6
Resolve conflicts with the VFS atomic open and sget changes.

Conflicts:
	fs/nfs/nfs4proc.c
2012-07-16 12:01:42 -04:00
David Howells
9249e17fe0 VFS: Pass mount flags to sget()
Pass mount flags to sget() so that it can use them in initialising a new
superblock before the set function is called.  They could also be passed to the
compare function.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:38:34 +04:00
Al Viro
ebfc3b49a7 don't pass nameidata to ->create()
boolean "does it have to be exclusive?" flag is passed instead;
Local filesystem should just ignore it - the object is guaranteed
not to be there yet.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:47 +04:00
Al Viro
00cd8dd3bf stop passing nameidata to ->lookup()
Just the flags; only NFS cares even about that, but there are
legitimate uses for such argument.  And getting rid of that
completely would require splitting ->lookup() into a couple
of methods (at least), so let's leave that alone for now...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:32 +04:00
Al Viro
0b728e1911 stop passing nameidata * to ->d_revalidate()
Just the lookup flags.  Die, bastard, die...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:14 +04:00
Al Viro
fa3c56bbda fs/nfs/dir.c: switch to passing nd->flags instead of nd wherever possible
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:07 +04:00
Al Viro
facc3530fb nfs_lookup_verify_inode() - nd is *always* non-NULL here
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:02 +04:00
Al Viro
93420b40bb switch nfs_lookup_check_intent() away from nameidata
just pass the flags

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:57 +04:00
Al Viro
e45198a6ac make finish_no_open() return int
namely, 1 ;-)  That's what we want to return from ->atomic_open()
instances after finish_no_open().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:45 +04:00
Al Viro
30d9049474 kill struct opendata
Just pass struct file *.  Methods are happier that way...
There's no need to return struct file * from finish_open() now,
so let it return int.  Next: saner prototypes for parts in
namei.c

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:39 +04:00
Al Viro
d95852777b make ->atomic_open() return int
Change of calling conventions:
old		new
NULL		1
file		0
ERR_PTR(-ve)	-ve

Caller *knows* that struct file *; no need to return it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:35 +04:00
Al Viro
47237687d7 ->atomic_open() prototype change - pass int * instead of bool *
... and let finish_open() report having opened the file via that sucker.
Next step: don't modify od->filp at all.

[AV: FILE_CREATE was already used by cifs; Miklos' fix folded]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:31 +04:00
Miklos Szeredi
eda72afb9e nfs: don't use intents for checking atomic open
is_atomic_open() is now only used by nfs4_lookup_revalidate() to check whether
it's okay to skip normal revalidation.

It does a racy check for mount read-onlyness and falls back to normal
revalidation if the open would fail.  This makes little sense now that this
function isn't used for determining whether to actually open the file or not.

The d_mountpoint() check still makes sense since it is an indication that we
might be following a mount and so open may not revalidate the dentry.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:12 +04:00
Miklos Szeredi
50de348c36 nfs: don't use nd->intent.open.flags
Instead check LOOKUP_EXCL in nd->flags, which is basically what the open intent
flags were used for.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:10 +04:00
Miklos Szeredi
8867fe5899 nfs: clean up ->create in nfs_rpc_ops
Don't pass nfs_open_context() to ->create().  Only the NFS4 implementation
needed that and only because it wanted to return an open file using open
intents.  That task has been replaced by ->atomic_open so it is not necessary
anymore to pass the context to the create rpc operation.

Despite nfs4_proc_create apparently being okay with a NULL context it Oopses
somewhere down the call chain.  So allocate a context here.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:08 +04:00
Miklos Szeredi
0dd2b474d0 nfs: implement i_op->atomic_open()
Replace NFS4 specific ->lookup implementation with ->atomic_open impelementation
and use the generic nfs_lookup for other lookups.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:33:06 +04:00
Al Viro
b3d9b7a3c7 vfs: switch i_dentry/d_alias to hlist
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:32:55 +04:00
Trond Myklebust
f1daf666dd NFSv4: Fix an NFSv4 mount regression
The helper nfs_fs_mount() will always call nfs4_try_mount with the
mount_info->fill_super argument pointing to nfs_fill_super, which is
NFSv2/v3 only.
Fix is to have nfs4_try_mount replace it with nfs4_fill_super.

The regression was introduced by commit c40f8d1d (NFS: Create a common
fs_mount() function)

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-10 13:25:39 -04:00
Trond Myklebust
4035c2487f NFS: Fix list manipulation snafus in fs/nfs/direct.c
Fix 2 bugs in nfs_direct_write_reschedule:

 - The request needs to be removed from the 'reqs' list before it can
   be added to 'failed'.
 - Fix an infinite loop if the 'failed' list is non-empty.

Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-08 10:32:08 -04:00
Bryan Schumaker
a8d8f02cf0 NFS: Create custom NFS v4 write_inode() function
This gives pnfs a chance to do a layout commit inside the v4 code.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-29 11:46:47 -04:00
Bryan Schumaker
57208fa7e5 NFS: Create an write_pageio_init() function
pNFS needs to select a write function based on the layout driver
currently in use, so I let each NFS version decide how to best handle
initializing writes.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-29 11:46:46 -04:00
Bryan Schumaker
1abb50886a NFS: Create an read_pageio_init() function
pNFS needs to select a read function based on the layout driver
currently in use, so I let each NFS version decide how to best handle
initializing reads.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-29 11:46:46 -04:00
Bryan Schumaker
6663ee7f81 NFS: Create an alloc_client rpc_op
This gives NFS v4 a way to set up callbacks and sessions without v2 or
v3 having to do them as well.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-29 11:46:46 -04:00
Bryan Schumaker
cdb7ecedec NFS: Create a free_client rpc_op
NFS v4 needs a way to shut down callbacks and sessions, but v2 and v3
don't.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-29 11:46:45 -04:00
Bryan Schumaker
57ec14c55d NFS: Create a return_delegation rpc op
Delegations are a v4 feature, so push return_delegation out of the
generic client by creating a new rpc_op and renaming the old function to
be in the nfs v4 "namespace"

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-29 11:46:45 -04:00
Bryan Schumaker
011e2a7fd5 NFS: Create a have_delegation rpc_op
Delegations are a v4 feature, so push them out of the generic code.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-29 11:46:44 -04:00
Bryan Schumaker
a5c58892b4 NFS: Create a v4-specific fsync function
v2 and v3 don't need to worry about doing a pnfs layoutcommit.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-29 11:46:44 -04:00
Bryan Schumaker
eeebf91675 NFS: Use nfs4_destroy_server() to clean up NFS v4
I can use this function to return delegations and unset the pnfs layout
driver rather than continuing to do these things in the generic client.
With this change, we no longer need an nfs4_kill_super().

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-29 11:46:44 -04:00
Bryan Schumaker
e38eb6506f NFS: set_pnfs_layoutdriver() from nfs4_proc_fsinfo()
The generic client doesn't need to know about pnfs layout drivers, so
this should be done in the v4 code.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-29 11:46:43 -04:00
Andy Adamson
6e5b587d2f NFSv4.1 handle OPEN O_CREATE mdsthreshold
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-29 11:33:43 -04:00
Trond Myklebust
e3074507d9 NFS: Simplify NFSv4.1 Kconfig
Convert the pNFS file layout to use the same system as the
object and block layout.
Remove unnecessary dependencies on NFS_FS

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-28 17:20:51 -04:00
Andy Adamson
05bf14adca NFSv4.1: Use session max response size for GETDEVICEINFO gdia_maxcount
We prepare for the largest possible GETDEVICEINFO response, which
can not be greater than the negotiated session maximum response size.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-28 17:20:50 -04:00
Trond Myklebust
2f2c63bc22 NFS: Cleanup - only store the write verifier in struct nfs_page
The 'committed' field is not needed once we have put the struct nfs_page
on the right list.

Also correct the type of the verifier: it is not an array of __be32, but
simply an 8 byte long opaque array.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-28 17:20:50 -04:00
Trond Myklebust
98d9452448 NFSv4: Decode getdevicelist should use nfs4_verifier
The verifier returned by the GETDEVICELIST operation is not a write
verifier, but a nfs4_verifier.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-28 17:20:50 -04:00
Trond Myklebust
140150dbb1 SUNRPC: Remove unused function xdr_encode_pages
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-28 17:20:49 -04:00
Trond Myklebust
b42353ff8d NFSv4.1: Clean up nfs4_reclaim_lease
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-28 17:20:49 -04:00
Trond Myklebust
1a47e7a666 NFSv4.1: Cleanup - move nfs4_has_session tests out of state manager loop
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-28 17:20:48 -04:00
Trond Myklebust
60f00153d9 NFSv4.1: Clean up nfs4_recall_slot()
Move the test for nfs4_has_session out of the nfs4_state_manager()

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-28 17:20:48 -04:00
Trond Myklebust
1a2dd948e2 NFSv4.1: Handle slot recalls before doing state recovery
Handling a slot recall situation should always takes precedence over
state recovery to allow the server to manage its resources.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-28 17:20:47 -04:00
Weston Andros Adamson
8ed27d4fb1 NFS: add more context to state manager error mesgs
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-28 17:20:47 -04:00
Trond Myklebust
1aecca3e83 NFSv3: Don't open code stream position calculation in decode_getacl3resok
Use the new xdr_stream_pos() helper instead.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-28 17:20:45 -04:00
Trond Myklebust
256e48bb47 NFSv4: Simplify the GETATTR attribute length calculation
Use the xdr_stream position counter as the basis for the calculation
instead of assuming that we can calculate an offset to the start
of the iovec.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-28 17:20:44 -04:00
Trond Myklebust
64bd577ea0 NFS: Let xdr_read_pages() check for buffer overflows
xdr_read_pages will already do all of the buffer overflow checks that are
currently being open-coded in the various callers. This patch simplifies
the existing code by replacing the open coded checks.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-28 17:20:43 -04:00
Trond Myklebust
db3a3bcf08 NFSv2/v3: Remove incorrect dprintks from the readdir reply code
The actual size of the directory is unknown to the client, so it is
always requesting the maximum number it can handle. If the server
is replying with fewer entries than was requested, then that will
usually reflect the fact that we've hit the end of the directory.
Flagging it as an error is therefore incorrect.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-24 16:20:07 -04:00
Bryan Schumaker
b1027439df NFS: Force the legacy idmapper to be single threaded
It was initially coded under the assumption that there would only be one
request at a time, so use a lock to enforce this requirement..

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
CC: stable@vger.kernel.org [3.4+]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-20 14:38:11 -04:00
Trond Myklebust
1a0de48ae5 NFS: Initialise commit_info.rpc_out when !defined(CONFIG_NFS_V4)
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
2012-06-19 18:42:28 -04:00
Trond Myklebust
5a695da263 NFS: Fix a refcounting issue in O_DIRECT
In nfs_direct_write_reschedule(), the requests from nfs_scan_commit_list
have a refcount of 2, whereas the operations in
nfs_direct_write_completion_ops expect them to have a refcount of 1.

This patch adds a call to release the extra references.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
2012-06-19 18:42:14 -04:00
Trond Myklebust
0a9c63fae7 NFSv4.1: Fix a race in set_pnfs_layoutdriver
The call to try_module_get() dereferences ld_type outside the
spin locks, which means that it may be pointing to garbage if
a module unload was in progress.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-19 13:32:45 -04:00
Trond Myklebust
2a4c8994ee NFSv4.1: Fix umount when filelayout DS is also the MDS
Currently there is a 'chicken and egg' issue when the DS is also the mounted
MDS. The nfs_match_client() reference from nfs4_set_ds_client bumps the
cl_count, the nfs_client is not freed at umount, and nfs4_deviceid_purge_client
is not called to dereference the MDS usage of a deviceid which holds a
reference to the DS nfs_client.  The result is the umount program returns,
but the nfs_client is not freed, and the cl_session hearbeat continues.

The MDS (and all other nfs mounts) lose their last nfs_client reference in
nfs_free_server when the last nfs_server (fsid) is umounted.
The file layout DS lose their last nfs_client reference in destroy_ds
when the last deviceid referencing the data server is put and destroy_ds is
called. This is triggered by a call to nfs4_deviceid_purge_client which
removes references to a pNFS deviceid used by an MDS mount.

The fix is to track how many pnfs enabled filesystems are mounted from
this server, and then to purge the device id cache once that count reaches
zero.

Reported-by: Jorge Mora <Jorge.Mora@netapp.com>
Reported-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-18 08:45:16 -04:00
Linus Torvalds
873b779d99 NFS client bugfixes for Linux 3.5
Highlights include:
 
  - Fix a couple of mount regressions due to the recent cleanups.
  - Fix an Oops in the open recovery code
  - Fix an rpc_pipefs upcall hang that results from some of the
    net namespace work from 3.4.x (stable kernel candidate).
  - Fix a couple of write and o_direct regressions that were found
    at last weeks Bakeathon testing event in Ann Arbor.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJP2gmaAAoJEGcL54qWCgDyrBMP/RY/T++He8y5k3M9aEqiIv0q
 D8ZVMwzID6f4Zgw4xRg96aYr02sBTw0q+0mP5x1EZmg8mK29rnBiVeKHE1iwSfXq
 10/SYISlpIjhJC4I4kHXGd2KClgj7qRRCbDKFRWwoIIwYU+kJn8MRnPa9XqdL8kP
 q68lrtayW8THSJDR8bk1GQn+ARxGeoY++qzHxm3vpQCbZVVb19VqKMWAWSN4VKqb
 epWehOSAzB3iA7HrLRbf8Y8/sDdXewxCQpr9CC/wxuu++l5ifPphR0ToX+k9VZXI
 BKFLUojCUZHTMAgCxuxjrFYehMeyClbzL2lLkz5Pgj0gQhOX6Myj+WMXoEg/uWfo
 XNf51FH3yBbnfayTaOUs6Y50iuU+dQO7TUTAoWTPpW9V/iT5z/fWAKUVJhDtrPk5
 DVDkR6SEgb4P1RqkehZKLq5k5GSAcTR+MZr452eDrFYXJrY8ORDE6o6kP4Rr3Nnd
 n8gap0gHxzIYlhBghem6+nLN+HhpZQopWeD8mNub20VuXsChRDr9/+XWuMCSJaZF
 2kleVdt2+rTDzi9bJTRYlsX397oaThL0NbRvshHAwnXIDtIQrzxx6+dUyOsEWMEu
 go/EdSUUESXGNlsWTqewCBsOjPeE4L5ijI/QglfDkF+CzD5dDjrxl+5i57iMKVfc
 Ydste3pQJkS7PiZu1sWA
 =unbu
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.5-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 "Highlights include:

   - Fix a couple of mount regressions due to the recent cleanups.
   - Fix an Oops in the open recovery code
   - Fix an rpc_pipefs upcall hang that results from some of the net
     namespace work from 3.4.x (stable kernel candidate).
   - Fix a couple of write and o_direct regressions that were found at
     last weeks Bakeathon testing event in Ann Arbor."

* tag 'nfs-for-3.5-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: add an endian notation for sparse
  NFSv4.1: integer overflow in decode_cb_sequence_args()
  rpc_pipefs: allow rpc_purge_list to take a NULL waitq pointer
  NFSv4 do not send an empty SETATTR compound
  NFSv2: EOF incorrectly set on short read
  NFS: Use the NFS_DEFAULT_VERSION for v2 and v3 mounts
  NFS: fix directio refcount bug on commit
  NFSv4: Fix unnecessary delegation returns in nfs4_do_open
  NFSv4.1: Convert another trivial printk into a dprintk
  NFS4: Fix open bug when pnfs module blacklisted
  NFS: Remove incorrect BUG_ON in nfs_found_client
  NFS: Map minor mismatch error to protocol not support error.
  NFS: Fix a commit bug
  NFS4: Set parsed mount data version to 4
  NFSv4.1: Ensure we clear session state flags after a session creation
  NFSv4.1: Convert a trivial printk into a dprintk
  NFSv4: Fix up decode_attr_mdsthreshold
  NFSv4: Fix an Oops in the open recovery code
  NFSv4.1: Fix a request leak on the back channel
2012-06-15 17:37:23 -07:00
Stanislav Kinsbursky
12918b10d5 NFS: hard-code init_net for NFS callback transports
In case of destroying mount namespace on child reaper exit, nsproxy is zeroed
to the point already. So, dereferencing of it is invalid.
This patch hard-code "init_net" for all network namespace references for NFS
callback services. This will be fixed with proper NFS callback
containerization.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-06-14 13:53:43 -04:00
Dan Carpenter
e216c8c771 NFS: add an endian notation for sparse
This is supposed to be a __be32 value.  Sparse complains a lot:

fs/nfs/callback_xdr.c:699:30: warning: incorrect type in initializer (different base types)
fs/nfs/callback_xdr.c:699:30:    expected unsigned int [unsigned] status
fs/nfs/callback_xdr.c:699:30:    got restricted __be32 const [usertype] csr_status
fs/nfs/callback_xdr.c:715:9: warning: cast to restricted __be32
fs/nfs/callback_xdr.c:716:16: warning: incorrect type in return expression (different base types)
fs/nfs/callback_xdr.c:716:16:    expected restricted __be32
fs/nfs/callback_xdr.c:716:16:    got unsigned int [unsigned] status

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-12 09:54:40 -04:00
Dan Carpenter
0439f31c35 NFSv4.1: integer overflow in decode_cb_sequence_args()
This seems like it could overflow on 32 bits.  Use kmalloc_array() which
has overflow protection built in.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-12 09:54:36 -04:00
Andy Adamson
2669940db8 NFSv4 do not send an empty SETATTR compound
Commit 536e43d12b ATTR_OPEN check can result in
an ia_valid with only ATTR_FILE set, and no NFS_VALID_ATTRS attributes to
request from the server.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-11 17:25:53 -04:00
Sachin Prabhu
64f9a83665 NFSv2: EOF incorrectly set on short read
In cases where the server returns fewer bytes then those requested, we
can incorrectly set the eof flag for the file. Fixing this allows the
request to be retried with updated offset and count arguments.

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-11 17:25:00 -04:00
Bryan Schumaker
c5afc8da5b NFS: Use the NFS_DEFAULT_VERSION for v2 and v3 mounts
Older versions of nfs utils don't always pass a "vers=" mount option for
NFS.  This chould lead to attempts at using NFS v0 due to a zeroed out
nfs_parsed_mount_data struct.  I solve this by setting the default NFS
version to NFS_DEFAULT_VERSION in the v2 and v3 cases (v4 has already been
taken care of by a similar patch).

Reported-by: Joerg Roedel <joro@&bytes.org>
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-09 14:38:59 -04:00
Fred Isaman
906369e43c NFS: fix directio refcount bug on commit
This reverts a hunk from commit 0427708657
"NFS: Clean up - Simplify reference counting in fs/nfs/direct.c"

The cleanups in that patch affect the write path, but by the time
processing hits commit the removed reference has been added back by
nfs_scan_commit_list().  Without this reversion, any page that is
sent to commit holds on to an unbalanced reference that is never
freed.  The immediate effect is an imbalance over the wire between
OPENs and CLOSEs.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-09 14:32:45 -04:00
Trond Myklebust
2d0dbc6ae8 NFSv4: Fix unnecessary delegation returns in nfs4_do_open
While nfs4_do_open() expects the fmode argument to be restricted to
combinations of FMODE_READ and FMODE_WRITE, both nfs4_atomic_open()
and nfs4_proc_create will pass the nfs_open_context->mode,
which contains the full fmode_t.

This patch ensures that nfs4_do_open strips the other fmode_t bits,
fixing a problem in which the nfs4_do_open call would result in an
unnecessary delegation return.

Reported-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2012-06-08 11:08:42 -04:00
Trond Myklebust
02c67525cf NFSv4.1: Convert another trivial printk into a dprintk
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-07 13:45:53 -04:00
Fred Isaman
fb47ddc9d5 NFS4: Fix open bug when pnfs module blacklisted
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-07 13:44:24 -04:00
Trond Myklebust
f07936f2c4 NFS: Remove incorrect BUG_ON in nfs_found_client
It is perfectly valid for nfs_get_client() to return a nfs_client that
is in the process of setting up the NFSv4.1 session.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-07 10:21:03 -04:00
Steve Dickson
f25efd851c NFS: Map minor mismatch error to protocol not support error.
Sservers that only have NFSv4.1 support the
NFS4ERR_MINOR_VERS_MISMATCH error is return on
v4.0 mounts. Mapping that error to EPROTONOSUPPORT
will cause the mount to back off to v3 instead of
failing.

Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-06 14:32:40 -04:00
Trond Myklebust
9bce008bae NFS: Fix a commit bug
The new commit code fails to copy the verifier into the wb_verf field
of _all_ the nfs_page structures; it only copies it into the first entry.
The consequence is that most requests end up failing to match in
nfs_commit_release.

Fix is to copy the verifier into the req->wb_verf field in
nfs_write_completion.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
2012-06-05 18:38:47 -04:00
Bryan Schumaker
cdf66442fa NFS4: Set parsed mount data version to 4
This patch only affects mounting through "-t nfs4" since it doesn't set
up an nfs version to use in the mount data.  The nfs client was trying
to mount using NFS v0, causing either a BUG() or a protocol not
supported message.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-05 15:50:47 -04:00
Trond Myklebust
bda197f5d7 NFSv4.1: Ensure we clear session state flags after a session creation
Both nfs4_reset_session and nfs41_init_clientid need to clear all the
session related state flags on success.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-05 10:22:14 -04:00
Trond Myklebust
08106ac7c8 NFSv4.1: Convert a trivial printk into a dprintk
There is no need to bug the user about the server returning an error
on destroy_session. The error will be handled by the state manager,
without any need for further input from anyone else.
So convert that printk into a debugging dprintk.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-05 10:08:24 -04:00
Trond Myklebust
029c534737 NFSv4: Fix up decode_attr_mdsthreshold
Fix an incorrect use of 'likely()'. The FATTR4_WORD2_MDSTHRESHOLD
bit is only expected in NFSv4.1 OPEN calls, and so is actually
rather _unlikely_.

decode_attr_mdsthreshold needs to clear FATTR4_WORD2_MDSTHRESHOLD
from the attribute bitmap after it has decoded the data.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Andy Adamson <andros@netapp.com>
2012-06-05 10:00:47 -04:00
Trond Myklebust
1549210fcc NFSv4: Fix an Oops in the open recovery code
The open recovery code does not need to request a new value for the
mdsthreshold, and so does not allocate a struct nfs4_threshold.
The problem is that encode_getfattr_open() will still request an
mdsthreshold, and so we end up Oopsing in decode_attr_mdsthreshold.

This patch fixes encode_getfattr_open so that it doesn't request an
mdsthreshold when the caller isn't asking for one. It also fixes
decode_attr_mdsthreshold so that it errors if the server returns
an mdsthreshold that we didn't ask for (instead of Oopsing).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Andy Adamson <andros@netapp.com>
2012-06-05 10:00:14 -04:00
Linus Torvalds
1193755ac6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs changes from Al Viro.
 "A lot of misc stuff.  The obvious groups:
   * Miklos' atomic_open series; kills the damn abuse of
     ->d_revalidate() by NFS, which was the major stumbling block for
     all work in that area.
   * ripping security_file_mmap() and dealing with deadlocks in the
     area; sanitizing the neighborhood of vm_mmap()/vm_munmap() in
     general.
   * ->encode_fh() switched to saner API; insane fake dentry in
     mm/cleancache.c gone.
   * assorted annotations in fs (endianness, __user)
   * parts of Artem's ->s_dirty work (jff2 and reiserfs parts)
   * ->update_time() work from Josef.
   * other bits and pieces all over the place.

  Normally it would've been in two or three pull requests, but
  signal.git stuff had eaten a lot of time during this cycle ;-/"

Fix up trivial conflicts in Documentation/filesystems/vfs.txt (the
'truncate_range' inode method was removed by the VM changes, the VFS
update adds an 'update_time()' method), and in fs/btrfs/ulist.[ch] (due
to sparse fix added twice, with other changes nearby).

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (95 commits)
  nfs: don't open in ->d_revalidate
  vfs: retry last component if opening stale dentry
  vfs: nameidata_to_filp(): don't throw away file on error
  vfs: nameidata_to_filp(): inline __dentry_open()
  vfs: do_dentry_open(): don't put filp
  vfs: split __dentry_open()
  vfs: do_last() common post lookup
  vfs: do_last(): add audit_inode before open
  vfs: do_last(): only return EISDIR for O_CREAT
  vfs: do_last(): check LOOKUP_DIRECTORY
  vfs: do_last(): make ENOENT exit RCU safe
  vfs: make follow_link check RCU safe
  vfs: do_last(): use inode variable
  vfs: do_last(): inline walk_component()
  vfs: do_last(): make exit RCU safe
  vfs: split do_lookup()
  Btrfs: move over to use ->update_time
  fs: introduce inode operation ->update_time
  reiserfs: get rid of resierfs_sync_super
  reiserfs: mark the superblock as dirty a bit later
  ...
2012-06-01 10:34:35 -07:00
Miklos Szeredi
0ef97dcfce nfs: don't open in ->d_revalidate
NFSv4 can't do reliable opens in d_revalidate, since it cannot know whether a
mount needs to be followed or not.  It does check d_mountpoint() on the dentry,
which can result in a weird error if the VFS found that the mount does not in
fact need to be followed, e.g.:

  # mount --bind /mnt/nfs /mnt/nfs-clone
  # echo something > /mnt/nfs/tmp/bar
  # echo x > /tmp/file
  # mount --bind /tmp/file /mnt/nfs-clone/tmp/bar
  # cat  /mnt/nfs/tmp/bar
  cat: /mnt/nfs/tmp/bar: Not a directory

Which should, by any sane filesystem, result in "something" being printed.

So instead do the open in f_op->open() and in the unlikely case that the cached
dentry turned out to be invalid, drop the dentry and return EOPENSTALE to let
the VFS retry.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 12:12:02 -04:00
Linus Torvalds
419f431949 Merge branch 'for-3.5' of git://linux-nfs.org/~bfields/linux
Pull the rest of the nfsd commits from Bruce Fields:
 "... and then I cherry-picked the remainder of the patches from the
  head of my previous branch"

This is the rest of the original nfsd branch, rebased without the
delegation stuff that I thought really needed to be redone.

I don't like rebasing things like this in general, but in this situation
this was the lesser of two evils.

* 'for-3.5' of git://linux-nfs.org/~bfields/linux: (50 commits)
  nfsd4: fix, consolidate client_has_state
  nfsd4: don't remove rebooted client record until confirmation
  nfsd4: remove some dprintk's and a comment
  nfsd4: return "real" sequence id in confirmed case
  nfsd4: fix exchange_id to return confirm flag
  nfsd4: clarify that renewing expired client is a bug
  nfsd4: simpler ordering of setclientid_confirm checks
  nfsd4: setclientid: remove pointless assignment
  nfsd4: fix error return in non-matching-creds case
  nfsd4: fix setclientid_confirm same_cred check
  nfsd4: merge 3 setclientid cases to 2
  nfsd4: pull out common code from setclientid cases
  nfsd4: merge last two setclientid cases
  nfsd4: setclientid/confirm comment cleanup
  nfsd4: setclientid remove unnecessary terms from a logical expression
  nfsd4: move rq_flavor into svc_cred
  nfsd4: stricter cred comparison for setclientid/exchange_id
  nfsd4: move principal name into svc_cred
  nfsd4: allow removing clients not holding state
  nfsd4: rearrange exchange_id logic to simplify
  ...
2012-06-01 08:32:58 -07:00
J. Bruce Fields
03a4e1f6dd nfsd4: move principal name into svc_cred
Instead of keeping the principal name associated with a request in a
structure that's private to auth_gss and using an accessor function,
move it to svc_cred.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:55 -04:00
Stanislav Kinsbursky
786185b5f8 SUNRPC: move per-net operations from svc_destroy()
The idea is to separate service destruction and per-net operations,
because these are two different things and the mix looks ugly.

Notes:

1) For NFS server this patch looks ugly (sorry for that). But these
place will be rewritten soon during NFSd containerization.

2) LockD per-net counter increase int lockd_up() was moved prior to
make_socks() to make lockd_down_net() call safe in case of error.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:40 -04:00
Stanislav Kinsbursky
9793f7c889 SUNRPC: new svc_bind() routine introduced
This new routine is responsible for service registration in a specified
network context.

The idea is to separate service creation from per-net operations.

Note also: since registering service with svc_bind() can fail, the
service will be destroyed and during destruction it will try to
unregister itself from rpcbind. In this case unregistration has to be
skipped.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:39 -04:00
Trond Myklebust
1d59d61f60 NFS: Ensure that setattr and getattr wait for O_DIRECT write completion
Use the same mechanism as the block devices are using, but move the
helper functions from fs/direct-io.c into fs/inode.c to remove the
dependency on CONFIG_BLOCK.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31 11:41:36 -07:00
Linus Torvalds
53f2c4a8fd NFS client updates for Linux 3.5
New features include:
 - Rewrite the O_DIRECT code so that it can share the same coalescing and
   pNFS functionality as the page cache code.
 - Allow the server to provide hints as to when we should use pNFS, and
   when it is more efficient to read and write through the metadata
   server.
 - NFS cache consistency updates:
   - Use the ctime to emulate a change attribute for NFSv2/v3 so that
     all NFS versions can share the same cache management code.
   - New cache management code will only look at the change attribute
     and size attribute when deciding whether or not our cached data
     is still valid or not.
   - Don't request NFSv4 post-op attributes on writes in cases such as
     O_DIRECT, where we don't care about data cache consistency, or
     when we have a write delegation, and know that our cache is
     still consistent.
   - Don't request NFSv4 post-op attributes on operations such as
     COMMIT, where there are no expected metadata updates.
   - Don't request NFSv4 directory post-op attributes in cases where
     the operations themselves already return change attribute updates:
     i.e.  operations such as OPEN, CREATE, REMOVE, LINK and RENAME.
 - Speed up 'ls' and friends by using READDIR rather than READDIRPLUS
   if we detect no attempts to lookup filenames.
 - Improve the code sharing between NFSv2/v3 and v4 mounts
 - NFSv4.1 state management efficiency improvements
 - More patches in preparation for NFSv4/v4.1 migration functionality.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJPw/MNAAoJEGcL54qWCgDyxU8P/2kKqhAlhoLEArBqo9FT3/OK
 YrNs5uO/erTgnCG8L0XQvTKjHB9F7TAeFXqTmBZuPlb1afRpHHt2vzPqzIvUCeOC
 ZXm8vzZf4nxWZgEFoTDdUBvqQi9lLdIzCRhSaVCKcRnNwiuaKDd/iwykbWGcHqmv
 jtR4lzXPllJdKCUL3yb3juVrpq6Vvn254ID2pqdnYcEtIJIHgaRZpwdp4Iz9+8b5
 Moishiw2rgCBJIhf+VCYd8B2oYfMgSDPxG1o3etkwY46qo+4s+CIls9Vu/6YzGXK
 3+NdLatRDqKhQpLm0/R+dI3rntnTZ8x6LgWnTGxUsiqb6pAaHZPK284rf2eh/s7M
 Q4G4203r0uw539kIt6eKOGqC9c8kZAPCHlQSPCaImZyCJsz+6OMShNlGB5bZpFPr
 tbdxaxudrhCF7UVKXicJCWgv2nIHtek6fNwey1jqFoYgZP5ipiBKymvXQC5WAMBw
 7RHJor/JEC+UJkVg/7Mkpg0UNw3E36CTYLeRJKlNCS6YO9NJQseCDxhhMNAy/ab7
 RGO8DVMkUsOUH20S+a19LyeFQtveWFIE0DiDqRn0KnNGhGwHrv2t4xFukjlrf4Sw
 8FQUBRdtFxfmspfA1IdoTY49XZQda5eagvTy1MyaWEh+jPSJ4G5j3sSjFiaKAJqw
 79iQKFGkxPOSHx2yCdAF
 =suVW
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "New features include:
   - Rewrite the O_DIRECT code so that it can share the same coalescing
     and pNFS functionality as the page cache code.
   - Allow the server to provide hints as to when we should use pNFS,
     and when it is more efficient to read and write through the
     metadata server.
   - NFS cache consistency updates:
     * Use the ctime to emulate a change attribute for NFSv2/v3 so that
       all NFS versions can share the same cache management code.
     * New cache management code will only look at the change attribute
       and size attribute when deciding whether or not our cached data
       is still valid or not.
     * Don't request NFSv4 post-op attributes on writes in cases such as
       O_DIRECT, where we don't care about data cache consistency, or
       when we have a write delegation, and know that our cache is still
       consistent.
     * Don't request NFSv4 post-op attributes on operations such as
       COMMIT, where there are no expected metadata updates.
     * Don't request NFSv4 directory post-op attributes in cases where
       the operations themselves already return change attribute
       updates: i.e. operations such as OPEN, CREATE, REMOVE, LINK and
       RENAME.
   - Speed up 'ls' and friends by using READDIR rather than READDIRPLUS
     if we detect no attempts to lookup filenames.
   - Improve the code sharing between NFSv2/v3 and v4 mounts
   - NFSv4.1 state management efficiency improvements
   - More patches in preparation for NFSv4/v4.1 migration functionality."

Fix trivial conflict in fs/nfs/nfs4proc.c that was due to the dcache
qstr name initialization changes (that made the length/hash a 64-bit
union)

* tag 'nfs-for-3.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (146 commits)
  NFSv4: Add debugging printks to state manager
  NFSv4: Map NFS4ERR_SHARE_DENIED into an EACCES error instead of EIO
  NFSv4: update_changeattr does not need to set NFS_INO_REVAL_PAGECACHE
  NFSv4.1: nfs4_reset_session should use nfs4_handle_reclaim_lease_error
  NFSv4.1: Handle other occurrences of NFS4ERR_CONN_NOT_BOUND_TO_SESSION
  NFSv4.1: Handle NFS4ERR_CONN_NOT_BOUND_TO_SESSION in the state manager
  NFSv4.1: Handle errors in nfs4_bind_conn_to_session
  NFSv4.1: nfs4_bind_conn_to_session should drain the session
  NFSv4.1: Don't clobber the seqid if exchange_id returns a confirmed clientid
  NFSv4.1: Add DESTROY_CLIENTID
  NFSv4.1: Ensure we use the correct credentials for bind_conn_to_session
  NFSv4.1: Ensure we use the correct credentials for session create/destroy
  NFSv4.1: Move NFSPROC4_CLNT_BIND_CONN_TO_SESSION to the end of the operations
  NFSv4.1: Handle NFS4ERR_SEQ_MISORDERED when confirming the lease
  NFSv4: When purging the lease, we must clear NFS4CLNT_LEASE_CONFIRM
  NFSv4: Clean up the error handling for nfs4_reclaim_lease
  NFSv4.1: Exchange ID must use GFP_NOFS allocation mode
  nfs41: Use BIND_CONN_TO_SESSION for CB_PATH_DOWN*
  nfs4.1: add BIND_CONN_TO_SESSION operation
  NFSv4.1 test the mdsthreshold hint parameters
  ...
2012-05-29 10:43:51 -07:00
Trond Myklebust
cc0a984368 NFSv4: Add debugging printks to state manager
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-28 17:21:57 -04:00
Trond Myklebust
fb13bfa7e1 NFSv4: Map NFS4ERR_SHARE_DENIED into an EACCES error instead of EIO
If a file OPEN is denied due to a share lock, the resulting
NFS4ERR_SHARE_DENIED is currently mapped to the default EIO.
This patch adds a more appropriate mapping, and brings Linux
into line with what Solaris 10 does.

See https://bugzilla.kernel.org/show_bug.cgi?id=43286

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2012-05-28 17:21:48 -04:00
Linus Torvalds
90324cc1b1 avoid iput() from flusher thread
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPw2J/AAoJECvKgwp+S8Ja5jkP/3uMxkhf8XQpXCI3O1QVfaQr
 uZFfM8sINqIPDVm1dtFjFj7f8Bw9mhE2KAnnJ1rKT8tQwqq9yAse1QPlhCG1ZqoP
 +AnMDDXHtx7WmQZXhBvS9b+unpZ7Jr6r6pO5XrmTL2kRL3YJPUhZ2+xbTT5belTB
 KoAu4WqORZRxfXoC76S7U8K+D4NcAGhAOxCClsIjmY+oocCiCag4FZOyzYIFViqc
 ghUN/+rLQ3fqGGv2yO7Ylx1gUM7sxIwkZQ/h962jFAtxz9czImr2NmRoMliOaOkS
 tvcnIf+E3u0n/zIjzFvzhxKgHJPP8PkcPMk60d3jKmFngBkqFTzNUeVTP8md7HrV
 4DlXisWr+z7YVyWUCFaNcJLmjiWSwQ8DV/clRLobeBf9EJKan5F1PjFgl6PLJM5F
 Qr1+LHMNaetdulBwMRTyveZTzYqw9RmDnD9dWMo4mX/kTpvtC4jTPVV7hkRD+Qlv
 5vTRR+VXL3Q50yClLf0AQMSKTnH2gBuepM/b+7cShLGfsMln8DtUjmbigv+niL63
 BibcCIbIlP2uWGnl37VhsC34AT+RKt3lggrBOpn/7XJMq/wKR7IRP/7V9TfYgaUN
 NBa+wtnLDa1pZEn/X7izdcQP62PzDtmB+ObvYT0Yb40A4+2ud3qF/lB53c1A1ewF
 /9c4zxxekjHZnn2oooEa
 =oLXf
 -----END PGP SIGNATURE-----

Merge tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux

Pull writeback tree from Wu Fengguang:
 "Mainly from Jan Kara to avoid iput() in the flusher threads."

* tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
  writeback: Avoid iput() from flusher thread
  vfs: Rename end_writeback() to clear_inode()
  vfs: Move waiting for inode writeback from end_writeback() to evict_inode()
  writeback: Refactor writeback_single_inode()
  writeback: Remove wb->list_lock from writeback_single_inode()
  writeback: Separate inode requeueing after writeback
  writeback: Move I_DIRTY_PAGES handling
  writeback: Move requeueing when I_SYNC set to writeback_sb_inodes()
  writeback: Move clearing of I_SYNC into inode_sync_complete()
  writeback: initialize global_dirty_limit
  fs: remove 8 bytes of padding from struct writeback_control on 64 bit builds
  mm: page-writeback.c: local functions should not be exposed globally
2012-05-28 09:54:45 -07:00
Trond Myklebust
359d7d1c97 NFSv4: update_changeattr does not need to set NFS_INO_REVAL_PAGECACHE
We're already invalidating the data cache, and setting the new change
attribute. Since directories don't care about the i_size field, there
is no need to be forcing any extra revalidation of the page cache.

We do keep the NFS_INO_INVALID_ATTR flag, in order to force an
attribute cache revalidation on stat() calls since we do not
update the mtime and ctime fields.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-28 10:05:47 -04:00
Trond Myklebust
f2c1b5100d NFSv4.1: nfs4_reset_session should use nfs4_handle_reclaim_lease_error
The results from a call to nfs4_proc_create_session() should always
be fed into nfs4_handle_reclaim_lease_error, so that we can
handle errors such as NFS4ERR_SEQ_MISORDERED correctly.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-27 14:50:44 -04:00
Trond Myklebust
9f594791dd NFSv4.1: Handle other occurrences of NFS4ERR_CONN_NOT_BOUND_TO_SESSION
Let nfs4_schedule_session_recovery() handle the details of choosing
between resetting the session, and other session related recovery.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-27 14:33:07 -04:00
Trond Myklebust
7c5d725684 NFSv4.1: Handle NFS4ERR_CONN_NOT_BOUND_TO_SESSION in the state manager
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-27 14:33:07 -04:00
Trond Myklebust
bf674c8228 NFSv4.1: Handle errors in nfs4_bind_conn_to_session
Ensure that we handle NFS4ERR_DELAY errors separately, and then
let nfs4_recovery_handle_error() handle all other cases.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-27 14:32:06 -04:00
Trond Myklebust
43ac544cb3 NFSv4.1: nfs4_bind_conn_to_session should drain the session
In order to avoid races with other RPC calls that end up setting the
NFS4CLNT_BIND_CONN_TO_SESSION flag.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-27 14:00:08 -04:00
Trond Myklebust
32b0131069 NFSv4.1: Don't clobber the seqid if exchange_id returns a confirmed clientid
If the EXCHGID4_FLAG_CONFIRMED_R flag is set, the client is in theory
supposed to already know the correct value of the seqid, in which case
RFC5661 states that it should ignore the value returned.

Also ensure that if the sanity check in nfs4_check_cl_exchange_flags
fails, then we must not change the nfs_client fields.

Finally, clean up the code: we don't need to retest the value of
'status' unless it can change.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-26 14:17:31 -04:00
Trond Myklebust
6624553910 NFSv4.1: Add DESTROY_CLIENTID
Ensure that we destroy our lease on last unmount

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-26 14:17:30 -04:00
Trond Myklebust
2cf047c994 NFSv4.1: Ensure we use the correct credentials for bind_conn_to_session
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Weston Andros Adamson <dros@netapp.com>
2012-05-25 18:02:10 -04:00
Trond Myklebust
848f5bda54 NFSv4.1: Ensure we use the correct credentials for session create/destroy
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-25 18:02:09 -04:00
Trond Myklebust
ad24ecfbcd NFSv4.1: Move NFSPROC4_CLNT_BIND_CONN_TO_SESSION to the end of the operations
For backward compatibility with nfs-utils.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Weston Andros Adamson <dros@netapp.com>
2012-05-25 18:02:09 -04:00
Trond Myklebust
89a217360e NFSv4.1: Handle NFS4ERR_SEQ_MISORDERED when confirming the lease
Apparently the patch "NFS: Always use the same SETCLIENTID boot verifier"
is tickling a Linux nfs server bug, and causing a regression: the server
can get into a situation where it keeps replying NFS4ERR_SEQ_MISORDERED
to our CREATE_SESSION request even when we are sending the correct
sequence ID.

Fix this by purging the lease and then retrying.

Reported-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-25 16:17:42 -04:00
Trond Myklebust
be0bfed002 NFSv4: When purging the lease, we must clear NFS4CLNT_LEASE_CONFIRM
Otherwise we can end up not sending a new exchange-id/setclientid

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-25 16:17:13 -04:00
Trond Myklebust
2a6ee6aa2f NFSv4: Clean up the error handling for nfs4_reclaim_lease
Try to consolidate the error handling for nfs4_reclaim_lease into
a single function instead of doing a bit here, and a bit there...

Also ensure that NFS4CLNT_PURGE_STATE handles errors correctly.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-25 16:17:13 -04:00
Trond Myklebust
bbafffd293 NFSv4.1: Exchange ID must use GFP_NOFS allocation mode
Exchange ID can be called in a lease reclaim situation, so it
will deadlock if it then tries to write out dirty NFS pages.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-24 16:31:39 -04:00
Weston Andros Adamson
a9e64442f1 nfs41: Use BIND_CONN_TO_SESSION for CB_PATH_DOWN*
The state manager can handle SEQ4_STATUS_CB_PATH_DOWN* flags with a
BIND_CONN_TO_SESSION instead of destroying the session and creating a new one.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-24 16:26:21 -04:00
Weston Andros Adamson
7c44f1ae4a nfs4.1: add BIND_CONN_TO_SESSION operation
This patch adds the BIND_CONN_TO_SESSION operation which is needed for
upcoming SP4_MACH_CRED work and useful for recovering from broken connections
without destroying the session.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-24 16:22:19 -04:00
Andy Adamson
d23d61c8d3 NFSv4.1 test the mdsthreshold hint parameters
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-24 16:15:49 -04:00
Andy Adamson
2701d086db NFSv4.1 add nfs_inode book keeping for mdsthreshold
Keep track of the number of bytes read or written via buffered, direct, and
mem-mapped i/o for use by mdsthreshold size_io hints.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-24 16:15:48 -04:00
Andy Adamson
82be417aa3 NFSv4.1 cache mdsthreshold values on OPEN
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-24 16:15:48 -04:00
Andy Adamson
88034c3d88 NFSv4.1 mdsthreshold attribute xdr
We only support one layout type per file system, so one threshold_item4 per
mdsthreshold4.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-24 16:15:47 -04:00
Trond Myklebust
54ac471c83 NFS: Add memory barriers to the nfs_client->cl_cons_state initialisation
Ensure that a process that uses the nfs_client->cl_cons_state test
for whether the initialisation process is finished does not read
stale data.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-23 15:24:59 -04:00
Trond Myklebust
4697bd5e94 NFSv4: Fix a race in the net namespace mount notification
Since the struct nfs_client gets added to the global nfs_client_list
before it is initialised, it is possible that rpc_pipefs_event can
end up trying to create idmapper entries on such a thing.

The solution is to have the mount notification wait for the
initialisation of each nfs_client to complete, and then to
skip any entries for which the it failed.

Reported-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
2012-05-23 15:21:13 -04:00
Trond Myklebust
7b38c3682c NFSv4.1: Fix session initialisation races
Session initialisation is not complete until the lease manager
has run. We need to ensure that both nfs4_init_session and
nfs4_init_ds_session do so, and that they check for any resulting
errors in clp->cl_cons_state.

Only after this is done, can nfs4_ds_connect check the contents
of clp->cl_exchange_flags.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Andy Adamson <andros@netapp.com>
2012-05-23 15:20:57 -04:00
Chuck Lever
acdeb69d9c NFS: EXCHANGE_ID should save the server major and minor ID
Save the server major and minor ID results from EXCHANGE_ID, as they
are needed for detecting server trunking.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:45:48 -04:00
Chuck Lever
4bf590e08f NFS: Add nfs_client behavior flags
"noresvport" and "discrtry" can be passed to nfs_create_rpc_client()
by setting flags in the passed-in nfs_client.  This change makes it
easy to add new flags.

Note that these settings are now "sticky" over the lifetime of a
struct nfs_client, and may even be copied when an nfs_client is
cloned.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:45:47 -04:00
Chuck Lever
8cab4c390b NFS: Refactor nfs_get_client(): initialize nfs_client
Clean up: Continue to rationalize the locking in nfs_get_client() by
moving the logic that handles the case where a matching server IP
address is not found.

When we support server trunking detection, client initialization may
return a different nfs_client struct than was passed to it.  Change
the synopsis of the init_client methods to return an nfs_client.

The client initialization logic in nfs_get_client() is not much more
than a wrapper around ->init_client.  It's simpler to keep the little
bits of error handling in the version-specific init_client methods.

No behavior change is expected.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:45:47 -04:00
Chuck Lever
f411703adc NFS: Refactor nfs_get_client(): add nfs_found_client()
Clean up: Code that takes and releases nfs_client_lock remains in
nfs_get_client().  Logic that handles a pre-existing nfs_client is
moved to a separate function.

No behavior change is expected.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:45:46 -04:00
Chuck Lever
f092075dd3 NFS: Always use the same SETCLIENTID boot verifier
Currently our NFS client assigns a unique SETCLIENTID boot verifier
for each server IP address it knows about.  It's set to CURRENT_TIME
when the struct nfs_client for that server IP is created.

During the SETCLIENTID operation, our client also presents an
nfs_client_id4 string to servers, as an identifier on which the server
can hang all of this client's NFSv4 state.  Our client's
nfs_client_id4 string is unique for each server IP address.

An NFSv4 server is obligated to wipe all NFSv4 state associated with
an nfs_client_id4 string when the client presents the same
nfs_client_id4 string along with a changed SETCLIENTID boot verifier.

When our client unmounts the last of a server's shares, it destroys
that server's struct nfs_client.  The next time the client mounts that
NFS server, it creates a fresh struct nfs_client with a fresh boot
verifier.  On seeing the fresh verifer, the server wipes any previous
NFSv4 state associated with that nfs_client_id4.

However, NFSv4.1 clients are supposed to present the same
nfs_client_id4 string to all servers.  And, to support Transparent
State Migration, the same nfs_client_id4 string should be presented
to all NFSv4.0 servers so they recognize that migrated state for this
client belongs with state a server may already have for this client.
(This is known as the Uniform Client String model).

If the nfs_client_id4 string is the same but the boot verifier changes
for each server IP address, SETCLIENTID and EXCHANGE_ID operations
from such a client could unintentionally result in a server wiping a
client's previously obtained lease.

Thus, if our NFS client is going to use a fixed nfs_client_id4 string,
either for NFSv4.0 or NFSv4.1 mounts, our NFS client should use a
boot verifier that does not change depending on server IP address.
Replace our current per-nfs_client boot verifier with a per-nfs_net
boot verifier.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:45:46 -04:00
Chuck Lever
2c820d9a97 NFS: Force server to drop NFSv4 state
nfs4_reset_all_state() refreshes the boot verifier a server sees to
trigger that server to wipe this client's state.  This function is
invoked when an NFSv4.1 server reports that it has revoked some or
all of a client's NFSv4 state.

To facilitate server trunking discovery, we will eventually want to
move the cl_boot_time field to a more global structure.  The Uniform
Client String model (and specifically, server trunking detection)
requires that all servers see the same boot verifier until the client
actually does reboot, and not a fresh verifier every time the client
unmounts and remounts the server.

Without the cl_boot_time field, however, nfs4_reset_all_state() will
have to find some other way to force the server to purge the client's
NFSv4 state.

Because these verifiers are opaque (ie, the server doesn't know or
care that they happen to be timestamps), we can force the server
to wipe NFSv4 state by updating the boot verifier as we do now, then
immediately afterwards establish a fresh client ID using the old boot
verifier again.

Hopefully there are no extra paranoid server implementations that keep
track of the client's boot verifiers and prevent clients from reusing
a previous one.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:45:45 -04:00
Chuck Lever
ce1c8fc12d NFS: Remove nfs_unique_id
Clean up:  this structure is unused.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:45:45 -04:00
Chuck Lever
177313f149 NFS: Clean up return code checking in nfs4_proc_exchange_id()
Clean up: update to use matching types in "if" expressions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:45:44 -04:00
Chuck Lever
73ea666c2b NFS: Use proper naming conventions for the nfs_client.net field
Clean up:  When naming fields and data types, follow established
conventions to facilitate accurate grep/cscope searches.

Introduced by commit e50a7a1a "NFS: make NFS client allocated per
network namespace context," Tue Jan 10, 2012.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:45:44 -04:00
Chuck Lever
591555465e NFS: Use proper naming conventions for nfs_client.impl_id field
Clean up:  When naming fields and data types, follow established
conventions to facilitate accurate grep/cscope searches.

Additionally, for consistency, move the impl_id field into the NFSv4-
specific part of the nfs_client, and free that memory in the logic
that shuts down NFSv4 nfs_clients.

Introduced by commit 7d2ed9ac "NFSv4: parse and display server
implementation ids," Fri Feb 17, 2012.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:45:43 -04:00
Chuck Lever
79d4e1f0d8 NFS: Use proper naming conventions for NFSv4.1 server scope fields
Clean up:  When naming fields and data types, follow established
conventions to facilitate accurate grep/cscope searches.

Additionally, for consistency, move the scope field into the NFSv4-
specific part of the nfs_client, and free that memory in the logic
that shuts down NFSv4 nfs_clients.

Introduced by commit 99fe60d0 "nfs41: exchange_id operation", April
1 2009.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:45:43 -04:00
Chuck Lever
e3c0fb7ef5 NFS: Add NFSDBG_STATE
fs/nfs/nfs4state.c does not yet have any dprintk() call sites, and I'm
about to introduce some.  We will need a new flag for enabling them.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:45:42 -04:00
Chuck Lever
c3607282b4 NFS: Don't swap bytes in nfs4_construct_boot_verifier()
The SETCLIENTID boot verifier is opaque to NFSv4 servers, thus there
is no requirement for byte swapping before the client puts the
verifier on the wire.

This treatment is similar to other timestamp-based verifiers.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:45:42 -04:00
Bryan Schumaker
497826af60 NFS: Fix compiler warnings
The "struct inode *inode" was only used in a dprintk, so compiling with
CONFIG_SUNRPC_DEBUG off triggers a warning.  To get around this, I
remove the "struct inode *inode" variable and instead change the
dprintk()s to use hdr->inode instead.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:43:04 -04:00
Andy Adamson
bd4aeffb5b NFSv4.1 skip rpc_call_done only on disconnected DS slot_table_waitq tasks
We reset all I/O on a disconnected data server through the pgio layer indicated
by the NFS_IOHDR_REDO flag.

Differentiate between on-the-wire tasks returning with an error which must
call rpc_call_done and tasks woken from the data server slot_table_waitq
waiting for a session slot with a status of zero which call rpc_exit in
rpc_prepare and need to skip rpc_call_done.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:42:42 -04:00
Andy Adamson
996074cb8c NFSv4.1 Just use nfs_put_client in filelayout release
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:42:37 -04:00
Andy Adamson
d42e78737c NFSv4.1 fix null state reference in filelayout_async_handle_error
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:42:28 -04:00
Trond Myklebust
53b8ee3464 NFSv4.1: Fix a bad reference count issue in the pNFS commit code
filelayout_scan_commit_lists needs to bump the reference count on
the struct nfs_page just like nfs_scan_commit_list().

Reported-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-22 16:36:27 -04:00
Trond Myklebust
b3f87b98aa Merge branch 'bugfixes' into nfs-for-next 2012-05-21 10:12:39 -04:00
Andy Adamson
041245c88a NFSv4.1 resend LAYOUTGET on data server invalid layout errors
The "invalid layout" class of errors is handled by destroying the layout and
getting a new layout from the server.  Currently, the layout must be
destroyed before a new layout can be obtained.

This means that all references (e.g.lsegs) to the "to be destroyed" layout
header must be dropped before it can be destroyed. This in turn means waiting
for all in flight RPC's using the old layout as well as draining the data
server session slot table wait queue.

Set the NFS_LAYOUT_INVALID flag to redirect I/O to the MDS while waiting for
the old layout to be destroyed.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-19 17:55:33 -04:00
Andy Adamson
b4a2967e52 NFSv4.1 dereference a disconnected data server client record
When the last DS io is processed, the data server client record will be
freed.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-19 17:55:32 -04:00
Andy Adamson
3a7936c3fc NFSv4.1 ref count nfs_client across filelayout data server io
Prepare to put a dis-connected DS client record.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-19 17:55:32 -04:00
Andy Adamson
0a57cdac3f NFSv4.1 send layoutreturn to fence disconnected data server
Let the MDS know that you are redirecting I/O from pNFS to MDS.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-19 17:55:31 -04:00
Andy Adamson
671fb89695 NFSv4.1 wake up all tasks on un-connected DS slot table waitq
The DS has a connection error (invalid deviceid). Drain the fore channel
slot table waitq.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-19 17:55:31 -04:00
Andy Adamson
0ad2f378e1 NFSv4.1 Check invalid deviceid upon slot table waitq wakeup
Tasks sleeping on the slot table waitq wake to the rpc_prepare_task state.
Reset the task for io through the MDS if the deviceid is invalid.

The reset functions put the io pages through the pageio layer which has the
advantage of re-coalescing which allows for the MDS and DS having different
r/wsizes. Exit the awakened task without executing the rpc_call_done routine.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-19 17:55:31 -04:00
Andy Adamson
a033a09189 NFSv4.1 remove nfs4_reset_write and nfs4_reset_read
Replaced by filelayout_reset_write and filelayout_reset_read

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-19 17:54:59 -04:00
Andy Adamson
e7dd79af01 NFSv4.1: mark deviceid invalid on filelayout DS connection errors
This prevents the use of any layout for i/o that references the deviceid.
I/O is redirected through the MDS.

Redirect the unhandled failed I/O to the MDS without marking either the
layout or the deviceid invalid.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-19 17:54:20 -04:00
Andy Adamson
98fc685ae2 NFSv4.1 data server timeo and retrans module parameters
Set the recovery parameters for data servers.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-19 17:54:20 -04:00
Andy Adamson
9f0ec176b3 NFSv4.1 set RPC_TASK_SOFTCONN for filelayout DS RPC calls
RPC_TASK_SOFTCONN returns connection errors to the caller which allows the pNFS
file layout to quickly try the MDS or perhaps another DS.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-19 17:54:19 -04:00
Andy Adamson
90fecfcb34 NFSv4.1 cleanup filelayout invalid layout handling
The invalid layout bits are should only be used to block LAYOUTGETs.

Do not invalidate a layout on deviceid invalidation.
Do not invalidate a layout on un-handled READ, WRITE, COMMIT errors.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-19 17:54:19 -04:00
Andy Adamson
554d458d79 NFSv4.1: cleanup filelayout invalid deviceid handling
Move the invalid deviceid test into nfs4_fl_prepare_ds, called by the
filelayout read, write, and commit routines. NFS4_DEVICE_ID_NEG_ENTRY
is no longer needed.
Remove redundant printk's - filelayout_mark_devid_invalid prints a KERN_WARNING.

An invalid device prevents pNFS io.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-19 17:54:18 -04:00
Matthew Treinish
e73e6c9e85 Fixed goto readability in nfs_update_inode.
Simplified error gotos to make it slightly easier to read,
it doesn't affect the functionality of the routine.

Signed-off-by: Matthew Treinish <treinish@linux.vnet.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-19 17:10:10 -04:00
Benny Halevy
5f23eff381 NFS: fix unsigned comparison in nfs4_create_sec_client
fs/nfs/nfs4namespace.c: In function ‘nfs4_create_sec_client’:
fs/nfs/nfs4namespace.c:171:2: error: comparison of unsigned expression < 0 is always false [-Werror=type-limits]

Introduced by commit 72de53ec4b
"NFS: Do secinfo as part of lookup"

Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-16 10:36:50 -07:00
Trond Myklebust
39ffb9218e NFS: Fix a compile issue when CONFIG_NFS_FSCACHE was undefined
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-16 10:24:20 -07:00
Dan Carpenter
5abc03cd91 NFS: kmalloc() doesn't return an ERR_PTR()
Obviously we should check for NULL here instead of IS_ERR().

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable@vger.kernel.org [3.4]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-14 17:44:01 -07:00
Bryan Schumaker
981f9face8 NFS: Turn v3 on by default
Most users will use NFS v3 or possibly v4 so this makes it easier for
them.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-14 17:42:22 -07:00
Bryan Schumaker
2ba68002a7 NFS: Make v2 configurable
With this patch NFS v2 can be disabled during Kconfig.  I default the
option to "y" to match the current behavior.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-14 17:42:22 -07:00
Bryan Schumaker
5e7e5a0da2 NFS: Create an NFS v3 stat_to_errno()
In theory, NFS v3 can have different error versions than NFS v2. v4 is
already using its own nfs4_stat_to_errno() to map error codes, so
rather than create something in the generic client for v2 and v3 to
share I instead give v3 its own function.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-14 17:42:21 -07:00
Bryan Schumaker
87c7083dc3 NFS: Pass mntfh as part of the nfs_mount_info structure
This allows me to use the filehandle allocated in nfs_fs_mount() for nfs
v4 mounts instead of allocating a new one.  Rather than change
nfs4_mount() to look almost exactly like nfs_fs_mount(), I instead
remove the function.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-14 17:30:31 -07:00
Bryan Schumaker
46058d46d3 NFS: Allocate parsed mount data directly to the nfs_mount_info structure
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-14 17:30:31 -07:00
Bryan Schumaker
d72c727cd9 NFS: Create a single nfs_validate_mount_data() function
This new function chooses between the v2/3 parser and the v4 parser by
filesystem type.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-14 17:30:30 -07:00
Bryan Schumaker
b72e4f42a3 NFS: Create a single function for text mount data
The v2/3 and v4 cases were very similar, with just a few parameters
changed.  This makes it easy to share code.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-14 17:30:30 -07:00
Bryan Schumaker
486aa699ff NFS: Create a new nfs_try_mount()
This function returns the same same return type as nfs4_try_mount() so
they two can be more easily substituted.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-14 17:30:29 -07:00
Bryan Schumaker
db83335191 NFS: Let mount data parsing set the NFS version
This field is unconditionally set while parsing mount data, so there is
no need to fill it in here.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-14 17:30:29 -07:00
Bryan Schumaker
21e4b82e13 NFS: Use nfs_fs_mount_common() for remote referral mounts
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-14 17:30:28 -07:00
Bryan Schumaker
3d176e3fe4 NFS: Use nfs_fs_mount_common() for xdev mounts
At this point, there are only a few small differences between these two
functions.  I can set a few function pointers in the nfs_mount_info
struct to get around these differences.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-14 17:30:28 -07:00
Bryan Schumaker
8c958e0c4c NFS: Create a common xdev_mount() function
The only difference between nfs_xdev_mount() and nfs4_xdev_mount() is the
clone_super() function called to clone the super block.  I can combine
these two functions by using the fill_super field in the mount_info
structure.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-14 17:30:28 -07:00
Bryan Schumaker
c40f8d1d35 NFS: Create a common fs_mount() function
The nfs4_remote_mount() function was only slightly different from the
nfs_fs_mount() function used by the generic client.  I created a new
nfs_mount_info structure to set different parameters to help combine
these functions.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-14 17:30:27 -07:00
Bryan Schumaker
586f95cd4f NFS: Remove NFS4_MOUNT_UNSHARED
This flag is numerically equivalent to NFS_MOUNT_UNSHARED, so I can
remove it to make collapsing functions more straightforward.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-14 17:30:27 -07:00
Bryan Schumaker
2311b9439c NFS: Don't pass mount data to nfs_fscache_get_super_cookie()
I intend on creating a single nfs_fs_mount() function used by all our
mount paths.  To avoid checking between new mounts and clone mounts, I
instead pass both structures to a new function in super.c that finds the
cache key and then looks up the super cookie.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-14 17:30:26 -07:00
Bryan Schumaker
bae36241be NFS: Create a single nfs_get_root()
This patch splits out the NFS v4 specific functionality of
nfs4_get_root() into its own rpc_op called by the generic client, and
leaves nfs4_proc_get_rootfh() as its own stand alone function.  This
also allows me to change nfs4_remote_mount(), nfs4_xdev_mount() and
nfs4_remote_referral_mount() to use the generic client's nfs_get_root()
function.  Later patches in this series will collapse these functions
into one common function, so using the same get_root() function
everywhere simplifies future changes.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-14 17:30:26 -07:00
Bryan Schumaker
3028eb2b32 NFS: Rename nfs4_proc_get_root()
This function is really getting the root filehandle and not the root
dentry of the filesystem.  I also removed the rpc_ops lookup from
nfs4_get_rootfh() under the assumption that if we reach this function
then we already know we are using NFS v4.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-14 17:30:25 -07:00
Linus Torvalds
26fe575028 vfs: make it possible to access the dentry hash/len as one 64-bit entry
This allows comparing hash and len in one operation on 64-bit
architectures.  Right now only __d_lookup_rcu() takes advantage of this,
since that is the case we care most about.

The use of anonymous struct/unions hides the alternate 64-bit approach
from most users, the exception being a few cases where we initialize a
'struct qstr' with a static initializer.  This makes the problematic
cases use a new QSTR_INIT() helper function for that (but initializing
just the name pointer with a "{ .name = xyzzy }" initializer remains
valid, as does just copying another qstr structure).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-10 19:54:35 -07:00
Trond Myklebust
0427708657 NFS: Clean up - Simplify reference counting in fs/nfs/direct.c
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
2012-05-09 15:17:49 -04:00
Trond Myklebust
1d1afcbc29 NFS: Clean up - Rename nfs_unlock_request and nfs_unlock_request_dont_release
Function rename to ensure that the functionality of nfs_unlock_request()
mirrors that of nfs_lock_request(). Then let nfs_unlock_and_release_request()
do the work of what used to be called nfs_unlock_request()...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
2012-05-09 15:17:43 -04:00
Trond Myklebust
7ad84aa944 NFS: Clean up - simplify nfs_lock_request()
We only have two places where we need to grab a reference when trying
to lock the nfs_page. We're better off making that explicit.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
2012-05-09 15:17:34 -04:00
Trond Myklebust
d1182b33ed NFS: nfs_set_page_writeback no longer needs to reference the page
We now hold a reference to the nfs_page across the calls to
nfs_set_page_writeback and nfs_end_page_writeback, and that
means we already have a reference to the struct page.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
2012-05-09 15:17:28 -04:00
Trond Myklebust
3aff4ebb95 NFS: Prevent a deadlock in the new writeback code
We have to unlock the nfs_page before we call nfs_end_page_writeback
to avoid races with functions that expect the page to be unlocked
when PG_locked and PG_writeback are not set.
The problem is that nfs_unlock_request also releases the nfs_page,
causing a deadlock if the release of the nfs_open_context
triggers an iput() while the PG_writeback flag is still set...

The solution is to separate the unlocking and release of the nfs_page,
so that we can do the former before nfs_end_page_writeback and the
latter after.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
2012-05-09 15:16:07 -04:00
Trond Myklebust
dc327ed4cd NFSv4: nfs_client_return_marked_delegations can't flush data
Since even filemap_flush() needs to lock pages that are dirty, we
cannot risk calling it from the state manager context. Therefore,
we need to move the call to filemap_flush() to
nfs_async_inode_return_delegation().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-08 12:53:21 -04:00
Trond Myklebust
c57d1bc5e0 NFS: nfs_inode_return_delegation() should always flush dirty data
The assumption is that if you are in a situation where you need to
return the delegation, then you should probably stop caching the
data anyway.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-08 12:53:21 -04:00
Trond Myklebust
14546c3375 NFS: Don't do a full flush to disk on close() if we hold a delegation
If we hold a delegation then we know that it should be safe to continue
to cache the data beyond the close(). However since the process that wrote
the data may die after close(), we may still want to send the data to
server before those RPCSEC_GSS credentials expire. We therefore compromise
by starting writeback to the server, but don't wait for completion.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-08 12:53:21 -04:00
Jan Kara
dbd5768f87 vfs: Rename end_writeback() to clear_inode()
After we moved inode_sync_wait() from end_writeback() it doesn't make sense
to call the function end_writeback() anymore. Rename it to clear_inode()
which well says what the function really does - set I_CLEAR flag.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
2012-05-06 13:43:41 +08:00
Trond Myklebust
1385b81173 NFS: Fix sparse warnings
Fix the following sparse warnings:

fs/nfs/direct.c:221:6: warning: symbol 'nfs_direct_readpage_release' was
not declared. Should it be static?
fs/nfs/read.c:38:43: warning: non-ANSI function declaration of function
'nfs_readhdr_alloc'
fs/nfs/objlayout/objio_osd.c:214:5: warning: symbol '__alloc_objio_seg'
was not declared. Should it be static?

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
Cc: Boaz Harrosh <bharrosh@panasas.com>
2012-05-04 14:59:51 -04:00
Trond Myklebust
bf5fc4028e NFS: Fix O_DIRECT compile warnings
Fix the following compile warnings:
fs/nfs/direct.c: In function 'nfs_direct_read_schedule_segment':
fs/nfs/direct.c:325:11: warning: comparison of distinct pointer types
lacks a cast [enabled by default]
fs/nfs/direct.c:325:11: warning: comparison of distinct pointer types
lacks a cast [enabled by default]
fs/nfs/direct.c:325:11: warning: comparison of distinct pointer types
lacks a cast [enabled by default]
fs/nfs/direct.c:352:27: warning: comparison of distinct pointer types
lacks a cast [enabled by default]
fs/nfs/direct.c: In function 'nfs_direct_write_schedule_segment':
fs/nfs/direct.c:622:11: warning: comparison of distinct pointer types
lacks a cast [enabled by default]
fs/nfs/direct.c:622:11: warning: comparison of distinct pointer types
lacks a cast [enabled by default]
fs/nfs/direct.c:622:11: warning: comparison of distinct pointer types
lacks a cast [enabled by default]
fs/nfs/direct.c:650:27: warning: comparison of distinct pointer types
lacks a cast [enabled by default]

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
2012-05-04 14:08:09 -04:00
Trond Myklebust
d69ee9b855 NFS: Adapt readdirplus to application usage patterns
While the use of READDIRPLUS is significantly more efficient than
READDIR followed by many LOOKUP calls, it is still less efficient
than just READDIR if the attributes are not required.

This patch tracks when lookups are attempted on the directory,
and uses that information to selectively disable READDIRPLUS
on that directory.
The first 'readdir' call is always served using READDIRPLUS.
Subsequent calls only use READDIRPLUS if there was a successful
lookup or revalidation on a child in the mean time.

Credit for the original idea should go to Neil Brown. See:
      http://www.spinics.net/lists/linux-nfs/msg19996.html
However, the implementation in this patch differs from Neil's
in that it focuses on tracking lookups rather than calls to
stat().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Neil Brown <neilb@suse.de>
2012-05-01 19:16:43 -04:00
Trond Myklebust
8582715e73 NFSv4: COMMIT does not need post-op attributes
No attributes are supposed to change during a COMMIT call, so there
is no need to request post-op attributes.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-01 15:42:45 -04:00
Trond Myklebust
5a37f85131 NFSv4: Don't request cache consistency attributes on some writes
We don't need cache consistency information when we're doing O_DIRECT
writes. Ditto for the case of delegated writes.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-01 15:42:45 -04:00
Trond Myklebust
778d28172f NFSv4: Simplify the NFSv4 REMOVE, LINK and RENAME compounds
Get rid of the post-op GETATTR on the directory in order to reduce
the amount of processing done on the server.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-01 15:42:44 -04:00
Trond Myklebust
7c317fcfba NFSv4: Simplify the NFSv4 CREATE compound
Get rid of the post-op GETATTR on the directory in order to reduce
the amount of processing done on the server.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-01 15:42:44 -04:00
Trond Myklebust
90ff0c548d NFSv4: Simplify the NFSv4 OPEN compound
Get rid of the post-op GETATTR on the directory in order to reduce
the amount of processing done on the server.

The cost is that if we later need to stat() the directory, then we
know that the ctime and mtime are likely to be invalid.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-01 15:42:43 -04:00
Trond Myklebust
fee7fe196c NFS: Simplify the cache invalidation code
Now that NFSv2 and NFSv3 have simulated change attributes,
instead of using all three of mtime, ctime and change attribute to
manage data cache consistency, we can simplify the code to just use
the change attribute.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-01 15:42:43 -04:00
Trond Myklebust
3a1556e866 NFSv2/v3: Simulate the change attribute
Use the ctime to simulate a change attribute for NFSv2 and NFSv3.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-01 15:42:43 -04:00
Trond Myklebust
6a4506c0b5 NFS: Change attribute updates should set NFS_INO_REVAL_PAGECACHE
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-01 15:42:42 -04:00
Trond Myklebust
4124bbc521 NFS: Simplify nfs_fhget()
If the inode is being initialised, there is no point in
setting flags such as NFS_INO_INVALID_ACCESS,
NFS_INO_INVALID_ACL or NFS_INO_INVALID_DATA since there are
no cached access calls, acls or data caches to invalidate.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-01 15:42:42 -04:00
Trond Myklebust
8d197a568f NFS: Always trust the PageUptodate flag when we have a delegation
We can always use the optimal full page write if we know that we
hold a delegation.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-01 15:42:41 -04:00
Trond Myklebust
01da47bde7 NFS: Optimise away nfs_check_inode_attributes() when holding a delegation
We already know that the attribute cache is valid.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-01 15:42:41 -04:00
Trond Myklebust
b4b1eadf7c NFS: Don't force page cache revalidations when holding a delegation
If we're holding a delegation, then we already know that our
page cache is valid.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-01 15:42:40 -04:00
Trond Myklebust
e144cbcc25 NFSv4: Retrieve attributes _before_ calling delegreturn
In order to retrieve cache consistency attributes before
anyone else has a chance to change the inode, we need to
put the GETATTR op _before_ the DELEGRETURN op.

We can then use that as part of a 'nfs_post_op_update_inode_force_wcc()'
call, to ensure that we update the attributes without clearing our
cached data.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-01 15:42:40 -04:00
Trond Myklebust
9e907fec6e NFSv4: Delegreturn only needs the cache consistency bitmask
In order to do close-to-open cache consistency checking after
a delegreturn, we don't need to retrieve the full set of
attributes.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-01 15:42:40 -04:00
Trond Myklebust
a9f6991b6c NFSv4: Fix a typo in NFS4_enc_link_sz
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-01 15:42:39 -04:00
Trond Myklebust
4bd8b01013 NFS: Simplify the nfs_read_completion functions
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
2012-05-01 15:42:22 -04:00
Trond Myklebust
25b11dcdbf NFS: Clean up nfs read and write error paths
Move the error handling for nfs_generic_pagein() into a single function.
Ditto for nfs_generic_flush().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
2012-05-01 13:48:13 -04:00
Trond Myklebust
9146ab5055 NFS: Read cleanups
Remove unused variables, and reformat some code.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
2012-05-01 13:48:13 -04:00
Trond Myklebust
292f3eeef0 NFS: Use kmem_cache_zalloc() in nfs_direct_req_alloc
Simplify the initialisation of O_DIRECT requests.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
2012-04-30 18:31:49 -04:00
Trond Myklebust
6d74743b08 NFS: Simplify O_DIRECT page referencing
The O_DIRECT code shouldn't need to hold 2 references to each page. The
reference held by the struct nfs_page should suffice.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
2012-04-30 14:33:52 -04:00
Trond Myklebust
3e9e0ca3f1 NFS: O_DIRECT pgio_completion_ops error_cleanup must unlock the request
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
2012-04-30 14:33:51 -04:00
Trond Myklebust
71e8cc00c6 NFS: Ensure that we break out of read/write_schedule_segment on error
Currently we do break out of the for() loop, but we also need to
break out of the enclosing do {} while()...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
2012-04-30 14:33:51 -04:00
Bryan Schumaker
68cd6fa4f3 NFS: Define dummy nfs_init_cinfo() and nfs_init_cinfo_from_inode()
These are needed when v3 and v4 are not enabled.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-30 14:32:36 -04:00
Bryan Schumaker
24fc9211f4 NFS: Define nfs_direct_write_schedule_work() when v3 and v4 are disabled
v2 doesn't have commits, so this function can be a no-op.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-30 14:19:14 -04:00
Bryan Schumaker
b58fee2189 NFS: pnfs_pageio_init_read() and init_write() need an extra argument
This is only when CONFIG_NFS_V4_1 isn't enabled.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Acked-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-30 14:06:46 -04:00
Trond Myklebust
3617e5031b NFSv4.1: Use the correct hostname in the client identifier string
We need to use the hostname of the process that created the nfs_client.
That hostname is now stored in the rpc_client->cl_nodename.

Also remove the utsname()->domainname component. There is no reason
to include the NIS/YP domainname in a client identifier string.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-30 12:04:58 -04:00
Stanislav Kinsbursky
71dfc5fa51 NFS: get module in idmap PipeFS notifier callback
This is bug fix.
Notifier callback is called from SUNRPC module. So before dereferencing NFS
module we have to make sure, that it's alive.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-28 13:22:19 -04:00
Trond Myklebust
9b5415b536 NFS: Fix a use-before-initialised warning in fs/nfs/write.c and fs/nfs/pnfs.c
If the allocation of nfs_write_header fails, the list of nfs_pages that
needs to be cleaned up is still on desc->pg_list...

Reported-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Fred Isaman <iisaman@netapp.com>
2012-04-27 15:03:51 -04:00
Bryan Schumaker
80a16b21a8 NFS: Remove extra rpc_clnt argument to proc_lookup
Now that I'm doing secinfo automatically in the v4 code this extra
argument isn't needed.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:39 -04:00
Bryan Schumaker
281cad46b3 NFS: Create a submount rpc_op
This simplifies the code for v2 and v3 and gives v4 a chance to decide
on referrals without needing to modify the generic client.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:39 -04:00
Bryan Schumaker
2671bfc3be NFS: Remove secinfo knowledge out of the generic client
And also remove the unneeded rpc_op.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:39 -04:00
Fred Isaman
df0117481c NFS: Prevent garbage cinfo->ds from leaking out
This is a bugfix that applies on top of the previous directio patches,
that fixes a bug introduced in "NFS: create struct nfs_commit_info".

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:39 -04:00
Fred Isaman
1763da1234 NFS: rewrite directio write to use async coalesce code
This also has the advantage that it allows directio to use pnfs.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:39 -04:00
Fred Isaman
56f9cd684d NFS: avoid some stat gathering for direct io
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:39 -04:00
Fred Isaman
b359f9d09b NFS: add dreq to nfs_commit_info
Need this to pass into nfs_commitdata_init, in order to keep data->dreq
accurate.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:38 -04:00
Fred Isaman
f453a54a01 NFS: create nfs_commit_completion_ops
Factors out the code that needs to change when directio
starts using these code paths.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:38 -04:00
Fred Isaman
ea2cf2282b NFS: create struct nfs_commit_info
It is COMMIT that is handled the most differently between
the paged and direct paths.  Create a structure that encapsulates
everything either path needs to know about the commit state.

We could use void to hide some of the layout driver stuff, but
Trond suggests pulling it out to ensure type checking, given the
huge changes being made, and the fact that it doesn't interfere
with other drivers.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:38 -04:00
Fred Isaman
84c53ab5c0 NFS: create nfs_generic_commit_list
Simple refactoring.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:38 -04:00
Fred Isaman
584aa810b6 NFS: rewrite directio read to use async coalesce code
This also has the advantage that it allows directio to use pnfs.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:38 -04:00
Fred Isaman
1825a0d08f NFS: prepare coalesce testing for directio
The coalesce code made assumptions that will no longer be true once
non-page aligned io occurs.  This introduces no change in
current behavior, but allows for more general situations to come.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:38 -04:00
Fred Isaman
9533da2979 NFS: remove unused wb_complete field from struct nfs_page
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:38 -04:00
Fred Isaman
061ae2edb7 NFS: create completion structure to pass into page_init functions
Factors out the code that will need to change when directio
starts using these code paths.  This will allow directio to use
the generic pagein and flush routines

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:38 -04:00
Fred Isaman
6c75dc0d49 NFS: merge _full and _partial write rpc_ops
Decouple nfs_pgio_header and nfs_write_data, and have (possibly
multiple) nfs_write_datas each take a refcount on nfs_pgio_header.

For the moment keeps nfs_write_header as a way to preallocate a single
nfs_write_data with the nfs_pgio_header.  The code doesn't need this,
and would be prettier without, but given the amount of churn I am
already introducing I didn't want to play with tuning new mempools.

This also fixes bug in pnfs_ld_handle_write_error.  In the case of
desc->pg_bsize < PAGE_CACHE_SIZE, the pages list was empty, causing
replay attempt to do nothing.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:37 -04:00
Fred Isaman
4db6e0b74c NFS: merge _full and _partial read rpc_ops
Decouple nfs_pgio_header and nfs_read_data, and have (possibly
multiple) nfs_read_datas each take a refcount on nfs_pgio_header.

For the moment keeps nfs_read_header as a way to preallocate a single
nfs_read_data with the nfs_pgio_header.  The code doesn't need this,
and would be prettier without, but given the amount of churn I am
already introducing I didn't want to play with tuning new mempools.

This also fixes bug in pnfs_ld_handle_read_error.  In the case of
desc->pg_bsize < PAGE_CACHE_SIZE, the pages list was empty, causing
replay attempt to do nothing.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:37 -04:00
Fred Isaman
30dd374f6f NFS: create struct nfs_page_array
Both nfs_read_data and nfs_write_data devote several fields which
can be combined into a single shared struct.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:37 -04:00
Fred Isaman
cd841605f7 NFS: create common nfs_pgio_header for both read and write
In order to avoid duplicating all the data in nfs_read_data whenever we
split it up into multiple RPC calls (either due to a short read result
or due to rsize < PAGE_SIZE), we split out the bits that are the same
per RPC call into a separate "header" structure.

The goal this patch moves towards is to have a single header
refcounted by several rpc_data structures.  Thus, want to always refer
from rpc_data to the header, and not the other way.  This patch comes
close to that ideal, but the directio code currently needs some
special casing, isolated in the nfs_direct_[read_write]hdr_release()
functions.  This will be dealt with in a future patch.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:37 -04:00
Fred Isaman
b554284976 NFS: use req_offset where appropriate
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:37 -04:00
Fred Isaman
cd12ae326f NFS: remove unnecessary casts of void pointers in nfs4filelayout.c
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:37 -04:00
Fred Isaman
c5996c4efb NFS: reverse arg order in nfs_initiate_[read|write]
Make it consistent with nfs_initiate_commit.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:37 -04:00
Fred Isaman
31f6852a4c NFS: dprintks in directio code were referencing task after put
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:37 -04:00
Fred Isaman
0b7c01533a NFS: add a struct nfs_commit_data to replace nfs_write_data in commits
Commits don't need the vectors of pages, etc. that writes do. Split out
a separate structure for the commit operation.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:37 -04:00
Fred Isaman
799ba8d53d NFS4.1: Add lseg to struct nfs4_fl_commit_bucket
Also create a commit_info structure to hold the bucket array and push
it up from the lseg to the layout where it really belongs.

While we are at it, fix a refcounting bug due to an (incorrect)
implicit assumption that filelayout_scan_ds_commit_list always
completely emptied the src list.

This clarifies refcounting, removes the ugly find_only_write_lseg
functions, and pushes the file layout commit code along on the path to
supporting multiple lsegs.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:36 -04:00
Fred Isaman
1acbbb4e16 NFS4.1: make pnfs_ld_[read|write]_done consistent
The two functions had diverged quite a bit, with the write function
being a bit more robust than the read.

However, these still break badly in the desc->pg_bsize < PAGE_CACHE_SIZE case,
as then there is nothing hanging on the data->pages list, and the resend
ends up doing nothing.  This will be fixed in a patch later in the series.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:36 -04:00
Fred Isaman
a20c6bec0b NFS: grab open context in direct read
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:36 -04:00
Bryan Schumaker
e245d4250d NFS: Remove unused function nfs_lookup_with_sec()
This fixes a compiler warning.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:03 -04:00
Bryan Schumaker
7e6eb683d2 NFS: Honor the authflavor set in the clone mount data
The authflavor is set in an nfs_clone_mount structure and passed to the
xdev_mount() functions where it was promptly ignored.  Instead, use it
to initialize an rpc_clnt for the cloned server.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:03 -04:00
Bryan Schumaker
f05d147f7e NFS: Fix following referral mount points with different security
I create a new proc_lookup_mountpoint() to use when submounting an NFS
v4 share.  This function returns an rpc_clnt to use for performing an
fs_locations() call on a referral's mountpoint.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:02 -04:00
Bryan Schumaker
72de53ec4b NFS: Do secinfo as part of lookup
Whenever lookup sees wrongsec do a secinfo and retry the lookup to find
attributes of the file or directory, such as "is this a referral
mountpoint?".  This also allows me to remove handling -NFS4ERR_WRONSEC
as part of getattr xdr decoding.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:02 -04:00
Bryan Schumaker
db0a9593d5 NFS: Handle exceptions coming out of nfs4_proc_fs_locations()
We don't want to return -NFS4ERR_WRONGSEC to the VFS because it could
cause the kernel to oops.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:01 -04:00
Bryan Schumaker
31e4dda474 NFS: Fix SECINFO_NO_NAME
I was using the same decoder function for SECINFO and SECINFO_NO_NAME,
so it was returning an error when it tried to decode an OP_SECINFO_NO_NAME
header as OP_SECINFO.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:01 -04:00
Sachin Prabhu
5794d21ef4 Avoid beyond bounds copy while caching ACL
When attempting to cache ACLs returned from the server, if the bitmap
size + the ACL size is greater than a PAGE_SIZE but the ACL size itself
is smaller than a PAGE_SIZE, we can read past the buffer page boundary.

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reported-by: Jian Li <jiali@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:09:53 -04:00
Sachin Prabhu
5a00689930 Avoid reading past buffer when calling GETACL
Bug noticed in commit
bf118a342f

When calling GETACL, if the size of the bitmap array, the length
attribute and the acl returned by the server is greater than the
allocated buffer(args.acl_len), we can Oops with a General Protection
fault at _copy_from_pages() when we attempt to read past the pages
allocated.

This patch allocates an extra PAGE for the bitmap and checks to see that
the bitmap + attribute_length + ACLs don't exceed the buffer space
allocated to it.

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reported-by: Jian Li <jiali@redhat.com>
[Trond: Fixed a size_t vs unsigned int printk() warning]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 13:15:07 -04:00
Jim Rees
10bd295a0b fix page number calculation bug for block layout decode buffer
Signed-off-by: Jim Rees <rees@umich.edu>
Suggested-by: Andy Adamson <andros@netapp.com>
Suggested-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-26 12:23:23 -04:00
Andy Adamson
e5265a0c58 NFSv4.1 fix page number calculation bug for filelayout decode buffers
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-26 12:23:23 -04:00
Sachin Bhamare
9526b2b6d6 pnfs-obj: Remove unused variable from objlayout_get_deviceinfo()
Local variable 'sb' was not being used in objlayout_get_deviceinfo().

Signed-off-by: Sachin Bhamare <sbhamare@panasas.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-26 12:15:51 -04:00
Weston Andros Adamson
1aba156763 nfs4: fix referrals on mounts that use IPv6 addrs
All referrals (IPv4 addr, IPv6 addr, and DNS) are broken on mounts of
IPv6 addresses, because validation code uses a path that is parsed
from the dev_name ("<server>:<path>") by splitting on the first colon and
colons are used in IPv6 addrs.
This patch ignores colons within IPv6 addresses that are escaped by '[' and ']'.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-26 12:11:29 -04:00
Trond Myklebust
7bf97bc273 NFSv4: Keep dropped state owners on the LRU list for a while
To ensure that we don't reuse their identifiers.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-21 13:01:00 -04:00
Trond Myklebust
c77365c963 NFSv4: Ensure that we don't drop a state owner more than once
Retest the RB_EMPTY_NODE() condition under the spin lock
to ensure that we don't call rb_erase() more than once on the
same state owner.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-21 12:31:05 -04:00
Trond Myklebust
95b72eb0bd NFSv4: Ensure we do not reuse open owner names
The NFSv4 spec is ambiguous about whether or not it is permissible
to reuse open owner names, so play it safe. This patch adds a timestamp
to the state_owner structure, and combines that with the IDA based
uniquifier.
Fixes a regression whereby the Linux server returns NFS4ERR_BAD_SEQID.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-20 23:14:28 -04:00
Jan Kara
98a2139f4f nfs: Enclose hostname in brackets when needed in nfs_do_root_mount
When hostname contains colon (e.g. when it is an IPv6 address) it needs
to be enclosed in brackets to make parsing of NFS device string possible.
Fix nfs_do_root_mount() to enclose hostname properly when needed. NFS code
actually does not need this as it does not parse the string passed by
nfs_do_root_mount() but the device string is exposed to userspace in
/proc/mounts.

CC: Josh Boyer <jwboyer@redhat.com>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-20 17:59:01 -04:00
Fred Isaman
8ccd271f7a NFS: put open context on error in nfs_flush_multi
Cc: <stable@vger.kernel.org>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-20 14:57:30 -04:00
Fred Isaman
73fb7bc7c5 NFS: put open context on error in nfs_pagein_multi
Cc: <stable@vger.kernel.org>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-20 14:54:48 -04:00
Trond Myklebust
451146be93 NFSv4: Fix open(O_TRUNC) and ftruncate() error handling
If the file wasn't opened for writing, then truncate and ftruncate
need to report the appropriate errors.

Reported-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2012-04-19 13:23:09 -04:00
Trond Myklebust
55725513b5 NFSv4: Ensure that we check lock exclusive/shared type against open modes
Since we may be simulating flock() locks using NFS byte range locks,
we can't rely on the VFS having checked the file open mode for us.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2012-04-19 13:23:08 -04:00
Trond Myklebust
05ffe24f52 NFSv4: Ensure that the LOCK code sets exception->inode
All callers of nfs4_handle_exception() that need to handle
NFS4ERR_OPENMODE correctly should set exception->inode

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2012-04-19 13:23:00 -04:00
Fred Isaman
ca138f368a NFS: check for req==NULL in nfs_try_to_update_request cleanup
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-18 11:05:49 -04:00
Linus Torvalds
58df9b387c NFS client bugfixes for Linux 3.4
Highlights include:
 - Fix infinite loops in the mount code
 - Fix a userspace buffer overflow in __nfs4_get_acl_uncached
 - Fix a memory leak due to a double reference count in rpcb_getport_async()
 
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJPc8DPAAoJEGcL54qWCgDyxAkP/2YMxAnZ+8kzuhKi5J073y/K
 G9H8v6zV6WaMut9YeX+8sX/4xJrFPSZJoRALSdEbI9wpIGMMHjIxQD8RuY4imt1L
 +YwB3rAke+mr2YAvW/2Q6HVTV75/00Kui+Jkgsa2wm/4Fyz8PfCHe71bBLG4UPJg
 ZgauN9rCIHIQpT/sbuN5mPU5C4jbOZa959CogWyUKeMnbuts1Z3qbS+aFB0wcxAW
 279E88oqDISYLo5XQiC/NJzLKmKJ3iDl1UbjqS6T2g74i1zX+lyxl0K9MdTFmAmm
 9UQ1pjIFIMySKJI/LahY6ZXM76hGLW4TbwO0q+fmhuv4cFBM3LWZPUm3ggPyzAp1
 O8TEeMNIfkNVGRYVnC7GcPFjnUlt9ahiPzk0lQ7l0RzWfXiDt8y/EUOwhqdxULqx
 S5SwLmSKDw9bQXFiHJtqe7lnB9hrVWNUvHqE1iTC20YTtQYovYGWaT8YLP7/y/Iz
 4jSLv8VeOzA6BP1jfQkacsBucxpn0fhPjwuCOZl2ooa+8jldsr8in8nwo0nVgrJF
 PGzJvyvGSR2BT2BL+QSUPX0Pn8qe7eAFePlS8zFS/p3X8rkDNxYI51VwuyXOQQl3
 lJHUyTbdTd/G4H5ObEU+ClIhxIvqgDrawIdXrbiMonoShsaFoVkuX7IFviPiiPDE
 s7KGNtpLh45+GZsBVe6M
 =yiL7
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.4-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes for Linux 3.4 from Trond Myklebust

Highlights include:
- Fix infinite loops in the mount code
- Fix a userspace buffer overflow in __nfs4_get_acl_uncached
- Fix a memory leak due to a double reference count in rpcb_getport_async()

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

* tag 'nfs-for-3.4-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4: Minor cleanups for nfs4_handle_exception and nfs4_async_handle_error
  NFSv4.1: Fix layoutcommit error handling
  NFSv4: Fix two infinite loops in the mount code
  SUNRPC: Use the already looked-up xprt in rpcb_getport_async()
  NFS4.1: remove duplicate variable declaration in filelayout_clear_request_commit
  Fix length of buffer copied in __nfs4_get_acl_uncached
2012-03-28 19:02:35 -07:00
Linus Torvalds
0195c00244 Disintegrate and delete asm/system.h
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIVAwUAT3NKzROxKuMESys7AQKElw/+JyDxJSlj+g+nymkx8IVVuU8CsEwNLgRk
 8KEnRfLhGtkXFLSJYWO6jzGo16F8Uqli1PdMFte/wagSv0285/HZaKlkkBVHdJ/m
 u40oSjgT013bBh6MQ0Oaf8pFezFUiQB5zPOA9QGaLVGDLXCmgqUgd7exaD5wRIwB
 ZmyItjZeAVnDfk1R+ZiNYytHAi8A5wSB+eFDCIQYgyulA1Igd1UnRtx+dRKbvc/m
 rWQ6KWbZHIdvP1ksd8wHHkrlUD2pEeJ8glJLsZUhMm/5oMf/8RmOCvmo8rvE/qwl
 eDQ1h4cGYlfjobxXZMHqAN9m7Jg2bI946HZjdb7/7oCeO6VW3FwPZ/Ic75p+wp45
 HXJTItufERYk6QxShiOKvA+QexnYwY0IT5oRP4DrhdVB/X9cl2MoaZHC+RbYLQy+
 /5VNZKi38iK4F9AbFamS7kd0i5QszA/ZzEzKZ6VMuOp3W/fagpn4ZJT1LIA3m4A9
 Q0cj24mqeyCfjysu0TMbPtaN+Yjeu1o1OFRvM8XffbZsp5bNzuTDEvviJ2NXw4vK
 4qUHulhYSEWcu9YgAZXvEWDEM78FXCkg2v/CrZXH5tyc95kUkMPcgG+QZBB5wElR
 FaOKpiC/BuNIGEf02IZQ4nfDxE90QwnDeoYeV+FvNj9UEOopJ5z5bMPoTHxm4cCD
 NypQthI85pc=
 =G9mT
 -----END PGP SIGNATURE-----

Merge tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system

Pull "Disintegrate and delete asm/system.h" from David Howells:
 "Here are a bunch of patches to disintegrate asm/system.h into a set of
  separate bits to relieve the problem of circular inclusion
  dependencies.

  I've built all the working defconfigs from all the arches that I can
  and made sure that they don't break.

  The reason for these patches is that I recently encountered a circular
  dependency problem that came about when I produced some patches to
  optimise get_order() by rewriting it to use ilog2().

  This uses bitops - and on the SH arch asm/bitops.h drags in
  asm-generic/get_order.h by a circuituous route involving asm/system.h.

  The main difficulty seems to be asm/system.h.  It holds a number of
  low level bits with no/few dependencies that are commonly used (eg.
  memory barriers) and a number of bits with more dependencies that
  aren't used in many places (eg.  switch_to()).

  These patches break asm/system.h up into the following core pieces:

    (1) asm/barrier.h

        Move memory barriers here.  This already done for MIPS and Alpha.

    (2) asm/switch_to.h

        Move switch_to() and related stuff here.

    (3) asm/exec.h

        Move arch_align_stack() here.  Other process execution related bits
        could perhaps go here from asm/processor.h.

    (4) asm/cmpxchg.h

        Move xchg() and cmpxchg() here as they're full word atomic ops and
        frequently used by atomic_xchg() and atomic_cmpxchg().

    (5) asm/bug.h

        Move die() and related bits.

    (6) asm/auxvec.h

        Move AT_VECTOR_SIZE_ARCH here.

  Other arch headers are created as needed on a per-arch basis."

Fixed up some conflicts from other header file cleanups and moving code
around that has happened in the meantime, so David's testing is somewhat
weakened by that.  We'll find out anything that got broken and fix it..

* tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system: (38 commits)
  Delete all instances of asm/system.h
  Remove all #inclusions of asm/system.h
  Add #includes needed to permit the removal of asm/system.h
  Move all declarations of free_initmem() to linux/mm.h
  Disintegrate asm/system.h for OpenRISC
  Split arch_align_stack() out from asm-generic/system.h
  Split the switch_to() wrapper out of asm-generic/system.h
  Move the asm-generic/system.h xchg() implementation to asm-generic/cmpxchg.h
  Create asm-generic/barrier.h
  Make asm-generic/cmpxchg.h #include asm-generic/cmpxchg-local.h
  Disintegrate asm/system.h for Xtensa
  Disintegrate asm/system.h for Unicore32 [based on ver #3, changed by gxt]
  Disintegrate asm/system.h for Tile
  Disintegrate asm/system.h for Sparc
  Disintegrate asm/system.h for SH
  Disintegrate asm/system.h for Score
  Disintegrate asm/system.h for S390
  Disintegrate asm/system.h for PowerPC
  Disintegrate asm/system.h for PA-RISC
  Disintegrate asm/system.h for MN10300
  ...
2012-03-28 15:58:21 -07:00
David Howells
9ffc93f203 Remove all #inclusions of asm/system.h
Remove all #inclusions of asm/system.h preparatory to splitting and killing
it.  Performed with the following command:

perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *`

Signed-off-by: David Howells <dhowells@redhat.com>
2012-03-28 18:30:03 +01:00
Trond Myklebust
14977489ff NFSv4: Minor cleanups for nfs4_handle_exception and nfs4_async_handle_error
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-27 21:53:14 -04:00
Trond Myklebust
e59d27e05a NFSv4.1: Fix layoutcommit error handling
Firstly, task->tk_status will always return negative error values,
so the current tests for 'NFS4ERR_DELEG_REVOKED' etc. are all being
ignored.
Secondly, clean up the code so that we only need to test
task->tk_status once!

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2012-03-27 21:53:14 -04:00
Trond Myklebust
05e9cfb408 NFSv4: Fix two infinite loops in the mount code
We can currently loop forever in nfs4_lookup_root() and in
nfs41_proc_secinfo_no_name(), if the first iteration returns a
NFS4ERR_DELAY or something else that causes exception.retry to get
set.

Reported-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2012-03-27 21:53:14 -04:00
Fred Isaman
d44fc38799 NFS4.1: remove duplicate variable declaration in filelayout_clear_request_commit
inode is declared twice for no good reason

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-24 14:33:26 -04:00
Sachin Prabhu
20e0fa98b7 Fix length of buffer copied in __nfs4_get_acl_uncached
_copy_from_pages() used to copy data from the temporary buffer to the
user passed buffer is passed the wrong size parameter when copying
data. res.acl_len contains both the bitmap and acl lenghts while
acl_len contains the acl length after adjusting for the bitmap size.

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-24 14:33:26 -04:00
Linus Torvalds
f63d395d47 NFS client updates for Linux 3.4
New features include:
 - Add NFS client support for containers.
   This should enable most of the necessary functionality, including
   lockd support, and support for rpc.statd, NFSv4 idmapper and
   RPCSEC_GSS upcalls into the correct network namespace from
   which the mount system call was issued.
 - NFSv4 idmapper scalability improvements
   Base the idmapper cache on the keyring interface to allow concurrent
   access to idmapper entries. Start the process of migrating users from
   the single-threaded daemon-based approach to the multi-threaded
   request-key based approach.
 - NFSv4.1 implementation id.
   Allows the NFSv4.1 client and server to mutually identify each other
   for logging and debugging purposes.
 - Support the 'vers=4.1' mount option for mounting NFSv4.1 instead of
   having to use the more counterintuitive 'vers=4,minorversion=1'.
 - SUNRPC tracepoints.
   Start the process of adding tracepoints in order to improve debugging
   of the RPC layer.
 - pNFS object layout support for autologin.
 
 Important bugfixes include:
 - Fix a bug in rpc_wake_up/rpc_wake_up_status that caused them to fail
   to wake up all tasks when applied to priority waitqueues.
 - Ensure that we handle read delegations correctly, when we try to
   truncate a file.
 - A number of fixes for NFSv4 state manager loops (mostly to do with
   delegation recovery).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJPalZbAAoJEGcL54qWCgDyCi4P+QHcmzQhJO7HWx3Pzjs67bFT
 xMSYaKHGWS4AJKUBVl5OKBxUExfrMHBNbElV3IKUIwBlDx8RVtnwfptKSe146iki
 dn4TrRO5es8nmI4hRDcGMlzJDZq4y0Qg//qiUFmojiNW/Avw0ljfMoVUejJJ09FV
 oeDk4EGtcxkEyH+g48ZjYbyspRnG8qtD3atf70Z3lYE0ELdG/B5Dyzw1RDrA5p73
 xJX3lqy8p/4ROzw/dmNoxdAXOrr3Q4/T58Bvp/lUglPy/EHyPmWzFoH0MU0C/PFu
 5VnAl6QDbNCTcIw9FvJlX/mIyErpNG9eKzUskUc9L9SA+B+J/i4rIap4KATRN3nH
 7QhE5qUacPuJnvxml7MPmlQTuft3fkAQ7NhKIWrbRi1QS9FmJC5NxctIb8loqlFn
 yIXdKeLfMshB+NyuFS9uzStX7SmV3eMgVd+5ZxRjYxm+PKJLw2KXeudArL6M5mHK
 3QeKZpqwaYQ3RfaTNpvAp0doiXHCO5UbWfI0Pe8xQs/QcMCNReffqV2G4IJKFAu6
 WpoN2UDQC9LCBifLw2nS7kku8+ZVXLQU8OC1NVl3TG15xD9cNLXuk3/y5llPGq4O
 odo52uLFpJohbDaHMj5RTKOfchTQCm2iyuVmxZEeAySypMSiAXmW7COSKHs/HxI1
 VBm+EI00Pvmm5+fUjIlp
 =LuHE
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.4-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates for Linux 3.4 from Trond Myklebust:
 "New features include:
   - Add NFS client support for containers.

     This should enable most of the necessary functionality, including
     lockd support, and support for rpc.statd, NFSv4 idmapper and
     RPCSEC_GSS upcalls into the correct network namespace from which
     the mount system call was issued.

   - NFSv4 idmapper scalability improvements

     Base the idmapper cache on the keyring interface to allow
     concurrent access to idmapper entries.  Start the process of
     migrating users from the single-threaded daemon-based approach to
     the multi-threaded request-key based approach.

   - NFSv4.1 implementation id.

     Allows the NFSv4.1 client and server to mutually identify each
     other for logging and debugging purposes.

   - Support the 'vers=4.1' mount option for mounting NFSv4.1 instead of
     having to use the more counterintuitive 'vers=4,minorversion=1'.

   - SUNRPC tracepoints.

     Start the process of adding tracepoints in order to improve
     debugging of the RPC layer.

   - pNFS object layout support for autologin.

  Important bugfixes include:

   - Fix a bug in rpc_wake_up/rpc_wake_up_status that caused them to
     fail to wake up all tasks when applied to priority waitqueues.

   - Ensure that we handle read delegations correctly, when we try to
     truncate a file.

   - A number of fixes for NFSv4 state manager loops (mostly to do with
     delegation recovery)."

* tag 'nfs-for-3.4-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (224 commits)
  NFS: fix sb->s_id in nfs debug prints
  xprtrdma: Remove assumption that each segment is <= PAGE_SIZE
  xprtrdma: The transport should not bug-check when a dup reply is received
  pnfs-obj: autologin: Add support for protocol autologin
  NFS: Remove nfs4_setup_sequence from generic rename code
  NFS: Remove nfs4_setup_sequence from generic unlink code
  NFS: Remove nfs4_setup_sequence from generic read code
  NFS: Remove nfs4_setup_sequence from generic write code
  NFS: Fix more NFS debug related build warnings
  SUNRPC/LOCKD: Fix build warnings when CONFIG_SUNRPC_DEBUG is undefined
  nfs: non void functions must return a value
  SUNRPC: Kill compiler warning when RPC_DEBUG is unset
  SUNRPC/NFS: Add Kbuild dependencies for NFS_DEBUG/RPC_DEBUG
  NFS: Use cond_resched_lock() to reduce latencies in the commit scans
  NFSv4: It is not safe to dereference lsp->ls_state in release_lockowner
  NFS: ncommit count is being double decremented
  SUNRPC: We must not use list_for_each_entry_safe() in rpc_wake_up()
  Try using machine credentials for RENEW calls
  NFSv4.1: Fix a few issues in filelayout_commit_pagelist
  NFSv4.1: Clean ups and bugfixes for the pNFS read/writeback/commit code
  ...
2012-03-23 08:53:47 -07:00
Linus Torvalds
e2a0883e40 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile 1 from Al Viro:
 "This is _not_ all; in particular, Miklos' and Jan's stuff is not there
  yet."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (64 commits)
  ext4: initialization of ext4_li_mtx needs to be done earlier
  debugfs-related mode_t whack-a-mole
  hfsplus: add an ioctl to bless files
  hfsplus: change finder_info to u32
  hfsplus: initialise userflags
  qnx4: new helper - try_extent()
  qnx4: get rid of qnx4_bread/qnx4_getblk
  take removal of PF_FORKNOEXEC to flush_old_exec()
  trim includes in inode.c
  um: uml_dup_mmap() relies on ->mmap_sem being held, but activate_mm() doesn't hold it
  um: embed ->stub_pages[] into mmu_context
  gadgetfs: list_for_each_safe() misuse
  ocfs2: fix leaks on failure exits in module_init
  ecryptfs: make register_filesystem() the last potential failure exit
  ntfs: forgets to unregister sysctls on register_filesystem() failure
  logfs: missing cleanup on register_filesystem() failure
  jfs: mising cleanup on register_filesystem() failure
  make configfs_pin_fs() return root dentry on success
  configfs: configfs_create_dir() has parent dentry in dentry->d_parent
  configfs: sanitize configfs_create()
  ...
2012-03-21 13:36:41 -07:00
Linus Torvalds
3556485f15 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates for 3.4 from James Morris:
 "The main addition here is the new Yama security module from Kees Cook,
  which was discussed at the Linux Security Summit last year.  Its
  purpose is to collect miscellaneous DAC security enhancements in one
  place.  This also marks a departure in policy for LSM modules, which
  were previously limited to being standalone access control systems.
  Chromium OS is using Yama, and I believe there are plans for Ubuntu,
  at least.

  This patchset also includes maintenance updates for AppArmor, TOMOYO
  and others."

Fix trivial conflict in <net/sock.h> due to the jumo_label->static_key
rename.

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (38 commits)
  AppArmor: Fix location of const qualifier on generated string tables
  TOMOYO: Return error if fails to delete a domain
  AppArmor: add const qualifiers to string arrays
  AppArmor: Add ability to load extended policy
  TOMOYO: Return appropriate value to poll().
  AppArmor: Move path failure information into aa_get_name and rename
  AppArmor: Update dfa matching routines.
  AppArmor: Minor cleanup of d_namespace_path to consolidate error handling
  AppArmor: Retrieve the dentry_path for error reporting when path lookup fails
  AppArmor: Add const qualifiers to generated string tables
  AppArmor: Fix oops in policy unpack auditing
  AppArmor: Fix error returned when a path lookup is disconnected
  KEYS: testing wrong bit for KEY_FLAG_REVOKED
  TOMOYO: Fix mount flags checking order.
  security: fix ima kconfig warning
  AppArmor: Fix the error case for chroot relative path name lookup
  AppArmor: fix mapping of META_READ to audit and quiet flags
  AppArmor: Fix underflow in xindex calculation
  AppArmor: Fix dropping of allowed operations that are force audited
  AppArmor: Add mising end of structure test to caps unpacking
  ...
2012-03-21 13:25:04 -07:00
Vivek Trivedi
5a7c9eec9f NFS: fix sb->s_id in nfs debug prints
NFS bdi flush thread in ps output is printed like "flush-<major number
in decimal>:<minor number in decimal>"
For example:
$ ps aux | grep flush
 2079 root         0 SW   [flush-0:18]
                                 ^^^^

nfs_bdi_register()
==> bdi_register_dev()
==> bdi_register(bdi, NULL, "%u:%u", MAJOR(dev), MINOR(dev));
                             ^^^^^

However, NFS sb->s_id store major:minor number in hex:

nfs_initialise_sb()
==>         snprintf(sb->s_id, sizeof(sb->s_id),
                 "%x:%x", MAJOR(sb->s_dev), MINOR(sb->s_dev));
                  ^^^^^

If we enable nfs debug prints using command:
$ rpcdebug -m nfs -s all

write to a file:
$ dd if=/dev/zero of=<NFS Mount>/testfile.txt bs=32768 count=1

Without Patch:
[ 2431.032000] NFS:     0 initiated write call (req 0:12/40, 32768 bytes
@ offset 0)                                         ^^^^

With Patch:
[ 2431.032000] NFS:     0 initiated write call (req 0:18/40, 32768 bytes
@ offset 0)                                         ^^^^

We should store NFS "s->s_id" in decimal to avoid confusion between NFS
flush thread name(in ps output) and NFS debug prints.

Signed-off-by: Vivek Trivedi <vtrivedi018@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-21 11:45:28 -04:00
Sachin Bhamare
18d98f6c04 pnfs-obj: autologin: Add support for protocol autologin
The pnfs-objects protocol mandates that we autologin into devices not
present in the system, according to information specified in the
get_device_info returned from the server.

The Protocol specifies two login hints.
1. An IP address:port combination
2. A string URI which is constructed as a URL with a protocol prefix
   followed by :// and a string as address. For each  protocol prefix
   the string-address format might be different.

We only support the second option. The first option is just redundant
to the second one.
NOTE: The Kernel part of autologin does not parse the URI string. It
just channels it to a user-mode script. So any new login protocols should
only update the user-mode script which is a part of the nfs-utils package,
but the Kernel need not change.

We implement the autologin by using the call_usermodehelper() API.
(Thanks to Steve Dickson <steved@redhat.com> for pointing it out)
So there is no running daemon needed, and/or special setup.

We Add the osd_login_prog Kernel module parameters which defaults to:
	/sbin/osd_login

Kernel try's to upcall the program specified in osd_login_prog. If the file is
not found or the execution fails Kernel will disable any farther upcalls, by
zeroing out  osd_login_prog, Until Admin re-enables it by setting the
osd_login_prog parameter to a proper program.

Also add text about the osd_login program command line API to:
	Documentation/filesystems/nfs/pnfs.txt
and documentation of the new  osd_login_prog  module parameter to:
	Documentation/kernel-parameters.txt

TODO: Add timeout option in the case osd_login program gets
              stuck

Signed-off-by: Sachin Bhamare <sbhamare@panasas.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-21 09:31:47 -04:00
Bryan Schumaker
c6bfa1a163 NFS: Remove nfs4_setup_sequence from generic rename code
This is an NFS v4 specific operation, so it belongs in the NFS v4 code
and not the generic client.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-21 09:31:46 -04:00
Bryan Schumaker
34e137cc7e NFS: Remove nfs4_setup_sequence from generic unlink code
This is an NFS v4 specific operation, so it belongs in the NFS v4 code
and not the generic client.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-21 09:31:46 -04:00
Bryan Schumaker
ea7c330362 NFS: Remove nfs4_setup_sequence from generic read code
This is an NFS v4 specific operation, so it belongs in the NFS v4 code
and not the generic client.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-21 09:31:45 -04:00
Bryan Schumaker
c6cb80d00b NFS: Remove nfs4_setup_sequence from generic write code
This is an NFS v4 specific operation, so it belongs in the NFS v4 code
and not the generic client.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-21 09:31:45 -04:00
Trond Myklebust
6f00866ddd NFS: Fix more NFS debug related build warnings
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-21 09:31:44 -04:00
Al Viro
48fde701af switch open-coded instances of d_make_root() to new helper
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-20 21:29:35 -04:00
Trond Myklebust
e27d359e9b SUNRPC/NFS: Add Kbuild dependencies for NFS_DEBUG/RPC_DEBUG
This allows us to turn on/off the dprintk() debugging interfaces for
those distributions that don't ship the 'rpcdebug' utility.
It also allows us to add Kbuild dependencies. Specifically, we already
know that dprintk() in general relies on CONFIG_SYSCTL. Now it turns out
that the NFS dprintks depend on CONFIG_CRC32 after we added support
for the filehandle hash.

Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-20 13:08:26 -04:00
Trond Myklebust
3b3be88d67 NFS: Use cond_resched_lock() to reduce latencies in the commit scans
Ensure that we conditionally drop the inode->i_lock when it is safe
to do so in the commit loops.
We do so after locking the nfs_page, but before removing it from the
commit list. We can then use list_safe_reset_next to recover the loop
after the lock is retaken.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-20 13:08:26 -04:00
Trond Myklebust
5ae67c4fee NFSv4: It is not safe to dereference lsp->ls_state in release_lockowner
It is quite possible for the release_lockowner RPC call to race with the
close RPC call, in which case, we cannot dereference lsp->ls_state in
order to find the nfs_server.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-20 13:08:25 -04:00
Fred Isaman
c4f1b62a4b NFS: ncommit count is being double decremented
The decrement is handled by each call to nfs_request_remove_commit_list,
no need to do it again in nfs_scan_commit.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-20 13:05:08 -04:00
Cong Wang
2b86ce2db3 nfs: remove the second argument of k[un]map_atomic()
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-03-20 21:48:24 +08:00
Sachin Prabhu
e49a29bd0e Try using machine credentials for RENEW calls
Using user credentials for RENEW calls will fail when the user
credentials have expired.

To avoid this, try using the machine credentials when making RENEW
calls. If no machine credentials have been set, fall back to using user
credentials as before.

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-17 11:17:42 -04:00
Trond Myklebust
9390f42546 NFSv4.1: Fix a few issues in filelayout_commit_pagelist
- Fix a race in which NFS_I(inode)->commits_outstanding could potentially
  go to zero (triggering a call to nfs_commit_clear_lock()) before we're
  done sending out all the commit RPC calls.

- If nfs_commitdata_alloc fails, there is no reason why we shouldn't
  try to send off all the commits-to-ds.

- Simplify the error handling.

- Change pnfs_commit_list() to always return either
  PNFS_ATTEMPTED or PNFS_NOT_ATTEMPTED.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
2012-03-17 11:17:42 -04:00
Trond Myklebust
8dd3775889 NFSv4.1: Clean ups and bugfixes for the pNFS read/writeback/commit code
Move more pnfs-isms out of the generic commit code.

Bugfixes:

- filelayout_scan_commit_lists doesn't need to get/put the lseg.
  In fact since it is run under the inode->i_lock, the lseg_put()
  can deadlock.

- Ensure that we distinguish between what needs to be done for
  commit-to-data server and what needs to be done for commit-to-MDS
  using the new flag PG_COMMIT_TO_DS. Otherwise we may end up calling
  put_lseg() on a bucket for a struct nfs_page that got written
  through the MDS.

- Fix a case where we were using list_del() on an nfs_page->wb_list
  instead of list_del_init().

- filelayout_initiate_commit needs to call filelayout_commit_release
  on error instead of the mds_ops->rpc_release(). Otherwise it won't
  clear the commit lock.

Cleanups:

- Let the files layout manage the commit lists for the pNFS case.
  Don't expose stuff like pnfs_choose_commit_list, and the fact
  that the commit buckets hold references to the layout segment
  in common code.

- Cast out the put_lseg() calls for the struct nfs_read/write_data->lseg
  into the pNFS layer from whence they came.

- Let the pNFS layer manage the NFS_INO_PNFS_COMMIT bit.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
2012-03-17 11:09:33 -04:00
Trond Myklebust
95a13f7b33 NFS: Fix a compile error when !defined NFS_DEBUG
We should use the 'ifdebug' wrapper rather than trying to inline
tests of nfs_debug, so that the code compiles correctly when we
don't define NFS_DEBUG.

Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-14 21:55:01 -04:00
William Dauchy
96dcadc2fd NFSv4: Rate limit the state manager for lock reclaim warning messages
Adding rate limit on `Lock reclaim failed` messages since it could fill
up system logs
Signed-off-by: William Dauchy <wdauchy@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-14 09:25:26 -04:00
Boaz Harrosh
5318a29c19 pnfs-obj: Uglify objio_segment allocation for the sake of the principle :-(
At some past instance Linus Trovalds wrote:
> From: Linus Torvalds <torvalds@linux-foundation.org>
> commit a84a79e4d3 upstream.
>
> The size is always valid, but variable-length arrays generate worse code
> for no good reason (unless the function happens to be inlined and the
> compiler sees the length for the simple constant it is).
>
> Also, there seems to be some code generation problem on POWER, where
> Henrik Bakken reports that register r28 can get corrupted under some
> subtle circumstances (interrupt happening at the wrong time?).  That all
> indicates some seriously broken compiler issues, but since variable
> length arrays are bad regardless, there's little point in trying to
> chase it down.
>
> "Just don't do that, then".

Since then any use of "variable length arrays" has become blasphemous.
Even in perfectly good, beautiful, perfectly safe code like the one
below where the variable length arrays are only used as a sizeof()
parameter, for type-safe dynamic structure allocations. GCC is not
executing any stack allocation code.

I have produced a small file which defines two functions main1(unsigned numdevs)
and main2(unsigned numdevs). main1 uses code as before with call to malloc
and main2 uses code as of after this patch. I compiled it as:
	gcc -O2 -S see_asm.c
and here is what I get:

<see_asm.s>
main1:
.LFB7:
	.cfi_startproc
	mov	%edi, %edi
	leaq	4(%rdi,%rdi), %rdi
	salq	$3, %rdi
	jmp	malloc
	.cfi_endproc
.LFE7:
	.size	main1, .-main1
	.p2align 4,,15
	.globl	main2
	.type	main2, @function
main2:
.LFB8:
	.cfi_startproc
	mov	%edi, %edi
	addq	$2, %rdi
	salq	$4, %rdi
	jmp	malloc
	.cfi_endproc
.LFE8:
	.size	main2, .-main2
	.section	.text.startup,"ax",@progbits
	.p2align 4,,15
</see_asm.s>

*Exact* same code !!!

So please seriously consider not accepting this patch and leave the
perfectly good code intact.

CC: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-13 23:47:59 -04:00
Dan Carpenter
e138ead73f NFS: null dereference in dev_remove()
In commit 5ffaf85541 "NFS: replace global bl_wq with per-net one" we
made "msg" a pointer instead of a struct stored in stack memory.  But we
forgot to change the memset() here so we're still clearing stack memory
instead clearing the struct like we intended.  It will lead to a kernel
crash.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-13 15:33:08 -04:00
Trond Myklebust
9a3ba43233 NFSv4: Rate limit the state manager warning messages
Prevent the state manager from filling up system logs when recovery
fails on the server.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2012-03-12 18:15:22 -04:00
Bryan Schumaker
11588f493a NFS: Check return value from rpc_queue_upcall()
This function could fail to queue the upcall if rpc.idmapd is not running,
causing a warning message to be printed.  Instead, I want to check the
return value and revoke the key if the upcall can't be run.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-12 12:51:42 -04:00
Bryan Schumaker
4b7c8dd205 NFS: Only define some function when v4.1 is enabled
Now that the nfs4_cb_match_client() function is static, gcc notices that
it is only used when CONFIG_NFS_V4_1 is enabled.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-12 12:51:40 -04:00
Trond Myklebust
17280175c5 NFS: Fix a number of sparse warnings
Fix a number of "warning: symbol 'foo' was not declared. Should it be
static?" conditions.

Fix 2 cases of "warning: Using plain integer as NULL pointer"

fs/nfs/delegation.c:263:31: warning: restricted fmode_t degrades to integer
  - We want to allow upgrades to a WRITE delegation, but should otherwise
    consider servers that hand out duplicate delegations to be borken.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-11 15:14:16 -04:00
Stanislav Kinsbursky
5ffaf85541 NFS: replace global bl_wq with per-net one
This queue is used for sleeping in kernel and it have to be per-net since we
don't want to wake any other waiters except in out network nemespace.
BTW, move wq to per-net data is easy. But some way to handle upcall timeouts
have to be provided. On message destroy in case of timeout, tasks, waiting for
message to be delivered, should be awakened. Thus, some data required to
located the right wait queue. Chosen solution replaces rpc_pipe_msg object with
new introduced bl_pipe_msg object, containing rpc_pipe_msg and proper wq.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-11 10:57:35 -04:00
Stanislav Kinsbursky
cb9c1c4a88 NFS: replace global bl_mount_reply with per-net one
This global variable is used for blocklayout downcall and thus can be corrupted
if case of existence of multiple networks namespaces.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-11 10:57:34 -04:00
Fred Isaman
d6d6dc7cdf NFS: remove nfs_inode radix tree
The radix tree is only being used to compile lists of reqs needing commit.
It is simpler to just put the reqs directly into a list.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-10 17:14:10 -05:00
Fred Isaman
9994b62b56 NFS: remove NFS_PAGE_TAG_LOCKED
The last real use of this tag was removed by
commit 7f2f12d963 NFS: Simplify nfs_wb_page()

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-10 17:13:58 -05:00
Trond Myklebust
ad1e396829 NFSv4.0: Re-establish the callback channel on NFS4ERR_CB_PATHDOWN
When the NFSv4.0 server tells us that it can no-longer talk to us
on the callback channel, we should attempt a new SETCLIENTID in
order to re-transmit the callback channel information.

Note that as long as we do not change the boot verifier, this is
a safe procedure; the server is required to keep our state.

Also move the function nfs_handle_cb_pathdown to fs/nfs/nfs4state.c,
and change the name in order to mark it as being specific to NFSv4.0.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-10 11:54:36 -05:00
Trond Myklebust
4fc8796d23 NFSv4: Clean up nfs4_select_rw_stateid()
Ensure that we select delegation stateids first, then
lock stateids and then open stateids.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-08 22:38:55 -05:00
Trond Myklebust
0032a7a749 NFS: Don't copy read delegation stateids in setattr
The server will just return an NFS4ERR_OPENMODE anyway.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-08 22:37:12 -05:00
Andy Adamson
2dc317565b NFSv4.1 cleanup DS stateid error handling
The error handler nfs4_state parameter is never NULL in the pNFS case as
the open_context must carry an nfs_state.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-08 11:05:48 -05:00
Trond Myklebust
3114ea7a24 NFSv4: Return the delegation if the server returns NFS4ERR_OPENMODE
If a setattr() fails because of an NFS4ERR_OPENMODE error, it is
probably due to us holding a read delegation. Ensure that the
recovery routines return that delegation in this case.

Reported-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2012-03-07 17:11:19 -05:00
Trond Myklebust
cf470c3e00 NFSv4: Don't free the nfs4_lock_state until after the release_lockowner
Otherwise we can end up with sequence id problems if the client reuses
the owner_id before the server has processed the release_lockowner

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-07 13:49:12 -05:00
Andy Adamson
9cb8196839 NFSv4.1 handle DS stateid errors
Handle DS READ and WRITE stateid errors by recovering the stateid on the MDS.

NFS4ERR_OLD_STATEID is ignored as the client always sends a
state sequenceid of zero for DS READ and WRITE stateids.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-07 10:53:55 -05:00
Weston Andros Adamson
4f1abd226d NFS: add fh_crc to debug output
Print the filehandle crc in two debug messages

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-07 10:34:27 -05:00
Weston Andros Adamson
d8e0539ebd NFS: add filehandle crc for debug display
Match wireshark's CRC-32 hash for easier debugging

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-07 10:34:07 -05:00
Trond Myklebust
ff2eb6818d NFSv4: Add a helper encode_uint64
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-06 10:32:50 -05:00
Trond Myklebust
475d4ba02c NFSv4: More xdr cleanups
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-06 10:32:49 -05:00
Trond Myklebust
7001951421 NFSv4: Cleanup - convert more functions to use encode_op_hdr
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-06 10:32:49 -05:00
Chuck Lever
cd93710e8d NFS: Fix nfs4_verifier memory alignment
Clean up due to code review.

The nfs4_verifier's data field is not guaranteed to be u32-aligned.
Casting an array of chars to a u32 * is considered generally
hazardous.

Fix this by using a __be32 array to generate a verifier's contents,
and then byte-copy the contents into the verifier field.  The contents
of a verifier, for all intents and purposes, are opaque bytes.  Only
local code that generates a verifier need know the actual content and
format.  Everyone else compares the full byte array for exact
equality.

Also, sizeof(nfs4_verifer) is the size of the in-core verifier data
structure, but NFS4_VERIFIER_SIZE is the number of octets in an XDR'd
verifier.  The two are not interchangeable, even if they happen to
have the same value.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-06 10:32:48 -05:00
Trond Myklebust
ab19b4813f NFSv4: Add a encode op helper
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-06 10:32:48 -05:00
Trond Myklebust
4ade982160 NFSv4: Add a helper for encoding NFSv4 sequence ids
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-06 10:32:48 -05:00
Trond Myklebust
6fdfb0bc2a NFSv4: Minor clean ups for encode_string()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-06 10:32:47 -05:00
Trond Myklebust
2d2f24add1 NFSv4: Simplify the struct nfs4_stateid
Replace the union with the common struct stateid4 as defined in both
RFC3530 and RFC5661. This makes it easier to access the sequence id,
which will again make implementing support for parallel OPEN calls
easier.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-06 10:32:47 -05:00
Trond Myklebust
f597c53790 NFSv4: Add helpers for basic copying of stateids
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-06 10:32:46 -05:00
Trond Myklebust
1e3987c305 NFSv4: Rename nfs4_copy_stateid()
It is really a function for selecting the correct stateid to use in a
read or write situation.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-06 10:32:46 -05:00
Trond Myklebust
ea9d23f510 NFSv4: Add a helper for encoding stateids
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-06 10:32:46 -05:00
Trond Myklebust
cb17e556f6 NFSv4: Add a helper for encoding opaque data
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-06 10:32:45 -05:00
Trond Myklebust
d0b496d2fc NFSv4: Rename encode_stateid() to encode_open_stateid()
The current version of encode_stateid really only applies to open stateids.
You can't use it for locks, delegations or layouts.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-06 10:32:45 -05:00
Trond Myklebust
36281caa83 NFSv4: Further clean-ups of delegation stateid validation
Change the name to reflect what we're really doing: testing two
stateids for whether or not they match according the the rules in
RFC3530 and RFC5661.
Move the code from callback_proc.c to nfs4proc.c

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-06 10:32:44 -05:00
Trond Myklebust
8e663f0e5f NFSv4.1: Fix matching of the stateids when returning a delegation
nfs41_validate_delegation_stateid is broken if we supply a stateid with
a non-zero sequence id. Instead of trying to match the sequence id,
the function assumes that we always want to error. While this is
true for a delegation callback, it is not true in general.

Also fix a typo in nfs4_callback_recall.

Reported-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-06 10:32:44 -05:00
Trond Myklebust
a1d0b5eebc NFS: Properly handle the case where the delegation is revoked
If we know that the delegation stateid is bad or revoked, we need to
remove that delegation as soon as possible, and then mark all the
stateids that relied on that delegation for recovery. We cannot use
the delegation as part of the recovery process.

Also note that NFSv4.1 uses a different error code (NFS4ERR_DELEG_REVOKED)
to indicate that the delegation was revoked.

Finally, ensure that setlk() and setattr() can both recover safely from
a revoked delegation.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2012-03-06 10:32:43 -05:00
Trond Myklebust
fa68a1ba1d NFS: Fix a typo in _nfs_display_fhandle
The check for 'fh == NULL' needs to come _before_ we dereference
fh.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-06 10:32:43 -05:00
Trond Myklebust
7e03b7cc07 NFS: Fix a compile issue when !CONFIG_NFS_V4_1
The attempt to display the implementation ID needs to be conditional on
whether or not CONFIG_NFS_V4_1 is defined

Reported-by: Bryan Schumaker <Bryan.Schumaker@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-05 15:27:01 -05:00
Trond Myklebust
8aa0a410af Merge commit 'nfs-for-3.3-4' into nfs-for-next
Conflicts:
	fs/nfs/nfs4proc.c

Back-merge of the upstream kernel in order to fix a conflict with the
slotid type conversion and implementation id patches...
2012-03-03 15:05:56 -05:00
Chuck Lever
54b50af089 NFS: Reduce debugging noise from encode_compound_hdr
Get rid of

  encode_compound: tag=

when XDR debugging is enabled.  The current Linux client never sets
compound tags.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-02 17:18:10 -05:00
Chuck Lever
264e6351c5 NFS: Request fh_expire_type attribute in "server caps" operation
The fh_expire_type file attribute is a filesystem wide attribute that
consists of flags that indicate what characteristics file handles
on this FSID have.

Our client doesn't support volatile file handles.  It should find
out early (say, at mount time) whether the server is going to play
shenanighans with file handles during a migration.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-02 17:18:10 -05:00
Chuck Lever
81934ddb8e NFS: Introduce NFS_ATTR_FATTR_V4_LOCATIONS
The Linux NFS client must distinguish between referral events (which
it currently supports) and migration events (which it does not yet
support).

In both types of events, an fs_locations array is returned.  But upper
layers, not the XDR layer, should make the distinction between a
referral and a migration.  There really isn't a way for an XDR decoder
function to distinguish the two, in general.

Slightly adjust the FATTR flags returned by decode_fs_locations()
to set NFS_ATTR_FATTR_V4_LOCATIONS only if a non-empty locations
array was returned from the server.  Then have logic in nfs4proc.c
distinguish whether the locations array is for a referral or
something else.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-02 17:18:09 -05:00
Chuck Lever
bb4dae5e5b NFS: Simplify arguments of encode_renew()
Clean up: pass just the clientid4 to encode_renew().  This enables it
to be used by callers who might not have an full nfs_client.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-02 17:18:09 -05:00
Chuck Lever
20d27e929f NFS: Add a client-side function to display NFS file handles
For debugging, introduce a simplistic function to print NFS file
handles on the system console.  The main function is hooked into the
dprintk debugging facility, but you can directly call the helper,
_nfs_display_fhandle(), if you want to print a handle unconditionally.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-02 17:18:08 -05:00
Chuck Lever
31b8e2aec0 NFS: Make clientaddr= optional
For NFSv4 mounts, the clientaddr= mount option has always been
required.  Now we have rpc_localaddr() in the kernel, which was
modeled after the same logic in the mount.nfs command that constructs
the clientaddr= mount option.  If user space doesn't provide a
clientaddr= mount option, the kernel can now construct its own.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-02 17:18:04 -05:00
Trond Myklebust
4e0038b6b2 SUNRPC: Move clnt->cl_server into struct rpc_xprt
When the cl_xprt field is updated, the cl_server field will also have
to change.  Since the contents of cl_server follow the remote endpoint
of cl_xprt, just move that field to the rpc_xprt.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[ cel: simplify check_gss_callback_principal(), whitespace changes ]
[ cel: forward ported to 3.4 ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-02 15:36:41 -05:00
Trond Myklebust
2446ab6070 SUNRPC: Use RCU to dereference the rpc_clnt.cl_xprt field
A migration event will replace the rpc_xprt used by an rpc_clnt.  To
ensure this can be done safely, all references to cl_xprt must now use
a form of rcu_dereference().

Special care is taken with rpc_peeraddr2str(), which returns a pointer
to memory whose lifetime is the same as the rpc_xprt.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[ cel: fix lockdep splats and layering violations ]
[ cel: forward ported to 3.4 ]
[ cel: remove rpc_max_reqs(), add rpc_net_ns() ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-02 15:36:38 -05:00
Chuck Lever
a3ca5651cb NFS: Add debugging messages to NFSv4's CLOSE procedure
CLOSE is new with NFSv4.  Sometimes it's important to know the timing
of this operation compared to things like lease renewal.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-02 15:36:34 -05:00
Chuck Lever
02a2976c91 NFS: Clean up debugging in decode_pathname()
I noticed recently that decode_attr_fs_locations() is not generating
very pretty debugging output.  The pathname components each appear on
a separate line of output, though that does not appear to be the
intended display behavior.  The preferred way to generate continued
lines of output on the console is to use pr_cont().

Note that incoming pathname4 components contain a string that is not
necessarily NUL-terminated.  I did actually see some trailing garbage
on the console.  In addition to correcting the line continuation
problem, add a string precision format specifier to ensure that each
component string is displayed properly, and that vsnprintf() does
not Oops.

Someone pointed out that allowing incoming network data to possibly
generate a console line of unbounded length may not be such a good
idea.  Since this output will rarely be enabled, and there is a hard
upper bound (NFS4_PATHNAME_MAXCOMPONENTS) in our implementation, this
is probably not a major concern.

It might be useful to additionally sanity-check the length of each
incoming component, however.  RFC 3530bis15 does not suggest a maximum
number of UTF-8 characters per component for either the pathname4 or
component4 types.  However, we could invent one that is appropriate
for our implementation.

Another possibility is to scrap all of this and print these pathnames
in upper layers after a reasonable amount of sanity checking in the
XDR layer.  This would give us an opportunity to allocate a full
buffer so that the whole pathname would be output via a single
dprintk.

Introduced by commit 7aaa0b3b: "NFSv4: convert fs-locations-components
to conform to RFC3530," (June 9, 2006).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-02 15:36:26 -05:00
Chuck Lever
88b8e133c4 NFS: Make nfs_cache_array.size a signed integer
Eliminate a number of implicit type casts in comparisons, and these
compiler warnings:

fs/nfs/dir.c: In function ‘nfs_readdir_clear_array’:
fs/nfs/dir.c:264:16: warning: comparison between signed and unsigned
		integer expressions [-Wsign-compare]
fs/nfs/dir.c: In function ‘nfs_readdir_search_for_cookie’:
fs/nfs/dir.c:352:16: warning: comparison between signed and unsigned
		integer expressions [-Wsign-compare]
fs/nfs/dir.c: In function ‘nfs_do_filldir’:
fs/nfs/dir.c:769:38: warning: comparison between signed and unsigned
		integer expressions [-Wsign-compare]
fs/nfs/dir.c:780:9: warning: comparison between signed and unsigned
		integer expressions [-Wsign-compare]

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-02 15:36:13 -05:00
Trond Myklebust
3862279a5f NFS: Consolidate the parsing of the '-ov4.x' and '-overs=4.x' mount options
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-02 14:06:39 -05:00
Trond Myklebust
7bbceb6f2b NFS: Ensure we display the minor version correctly in /proc/mounts etc.
The 'minorversion' mount option is now deprecated, so we need to display
the minor version number in the 'vers=' format.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-02 14:00:20 -05:00
Trond Myklebust
0d71b05809 NFS: Extend the -overs= mount option to allow 4.x minorversions
Allow the user to mount an NFSv4.0 or NFSv4.1 partition using a
standard syntax of '-overs=4.0', or '-overs=4.1' rather than the
more cumbersome '-overs=4,minorversion=1'.

See also the earlier patch by Dros Adamson, which added the
Linux-specific syntax '-ov4.0', '-ov4.1'.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-02 13:59:49 -05:00
Weston Andros Adamson
7d2ed9ac22 NFSv4: parse and display server implementation ids
Shows the implementation ids in /proc/self/mountstats.  This doesn't break
the nfs-utils mountstats tool.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-01 17:10:22 -05:00
Weston Andros Adamson
9edbd953f8 NFSv4: fix server_scope memory leak
server_scope would never be freed if nfs4_check_cl_exchange_flags() returned
non-zero

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-01 17:10:21 -05:00
Weston Andros Adamson
db8ac8ba87 NFSv4: Send implementation id with exchange_id
Send the nfs implementation id in EXCHANGE_ID requests unless the module
parameter nfs.send_implementation_id is 0.

This adds a CONFIG variable for the nii_domain that defaults to "kernel.org".

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-01 17:10:21 -05:00
Bryan Schumaker
57e62324e4 NFS: Store the legacy idmapper result in the keyring
This patch removes the old hashmap-based caching and instead uses a
"request key actor" to place an upcall to the legacy idmapper rather
than going through /sbin/request-key.  This will only be used as a
fallback if /etc/request-key.conf isn't configured to use nfsidmap.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-01 17:10:16 -05:00
Trond Myklebust
0cb3284b53 NFSv4.1: Get rid of NFS4CLNT_LAYOUTRECALL
The NFS4CLNT_LAYOUTRECALL bit is a long-term impediment to scalability. It
basically stops all other recalls by a given server once any layout recall
is requested.

If the recall is for a different file, then we don't care.
If the recall applies to the same file, then we're in one of two situations:
Either we are in the case of a replay of an existing request, in which case
the session is supposed to deal with matters, or we are dealing with a
completely different request, in which case we should just try to process
it.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-01 11:17:50 -05:00
Trond Myklebust
a59c30acfb NFSv4.1: Get rid of redundant NFS4CLNT_LAYOUTRECALL tests
The NFS4CLNT_LAYOUTRECALL tests in pnfs_layout_process and
pnfs_update_layout are redundant.

In the case of a bulk layout recall, we're always testing for
the NFS_LAYOUT_BULK_RECALL flay anyway.
In the case of a file or segment recall, the call to
pnfs_set_layout_stateid() updates the layout_header 'barrier'
sequence id, which triggers the test in pnfs_layoutgets_blocked()
and is less race-prone than NFS4CLNT_LAYOUTRECALL anyway.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-01 11:17:47 -05:00
Stanislav Kinsbursky
e9dbca8d73 NFS: release per-net clients lock before calling PipeFS dentries creation
v3:
1) Lookup for client is performed from the beginning of the list on each PipeFS
event handling operation.

Lockdep is sad otherwise, because inode mutex is taken on PipeFS dentry
creation, which can be called on mount notification, where this per-net client
lock is taken on clients list walk.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-27 13:36:35 -05:00
Trond Myklebust
7df529af5f NFSv4.1: Don't call nfs4_deviceid_purge_client() unless we're NFSv4.1
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-26 17:34:22 -05:00
Trond Myklebust
abd9669861 NFS: Ensure struct nfs_client holds a reference to the net namespace
Otherwise we have no guarantee that the net namespace won't just
disappear from underneath us once the task that created it
is destroyed.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
2012-02-19 08:46:49 +01:00
Trond Myklebust
9937347a1e NFS: Ensure that the nfs_client 'net' field is always set
Currently, the nfs_parsed_mount_data->net field is initialised in
the nfs_parse_mount_options() function, which means that it only
gets set if we're using text based mounts. The legacy binary
mount interface is therefore broken.

Fix is to initialise the ->net field in nfs_alloc_parsed_mount_data.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
2012-02-19 08:44:07 +01:00
Weston Andros Adamson
abe9a6d57b NFSv4: fix server_scope memory leak
server_scope would never be freed if nfs4_check_cl_exchange_flags() returned
non-zero

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-17 17:34:03 -05:00
Trond Myklebust
f86f36a6ae NFSv4.1: Fix a NFSv4.1 session initialisation regression
Commit aacd553 (NFSv4.1: cleanup init and reset of session slot tables)
introduces a regression in the session initialisation code. New tables
now find their sequence ids initialised to 0, rather than the mandated
value of 1 (see RFC5661).

Fix the problem by merging nfs4_reset_slot_table() and nfs4_init_slot_table().
Since the tbl->max_slots is initialised to 0, the test in
nfs4_reset_slot_table for max_reqs != tbl->max_slots will automatically
pass for an empty table.

Reported-by: Vitaliy Gusev <gusev.vitaliy@nexenta.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-17 17:33:39 -05:00
Weston Andros Adamson
0a70219523 NFS: include filelayout DS rpc stats in mountstats
Include RPC statistics from all data servers in /proc/self/mountstats for pNFS
filelayout mounts.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-17 13:39:47 -05:00
Andy Adamson
b6bf6e7d6f NFSv4.1 set highest_used_slotid to NFS4_NO_SLOT
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-17 13:38:03 -05:00
Chuck Lever
d7c3267502 nfs: Clean up debugging in nfs_follow_mountpoint()
Clean up: Fix a debugging message which had an obsolete function name
in it (nfs_follow_mountpoint).

Introduced by commit 36d43a43 "NFS: Use d_automount() rather than
abusing follow_link()" (January 14, 2011)

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-16 15:05:16 -05:00
Vitaliy Gusev
b4b9a0c1c8 nfs41: Verify channel's attributes accordingly to RFC v2
ca_maxoperations:

      For the backchannel, the server MUST
      NOT change the value the client offers.  For the fore channel,
      the server MAY change the requested value.

  ca_maxrequests:

       For the backchannel, the server MUST NOT change the
       value the client offers.  For the fore channel, the server MAY
       change the requested value.

Signed-off-by: Vitaliy Gusev <gusev.vitaliy@nexenta.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-15 11:16:11 -05:00
Weston Andros Adamson
571b755401 NFS: dont allow minorversion= opt when vers != 4
Don't allow invalid 'vers' and 'minorversion' combinations in mount options,
such as "vers=3,minorversion=1".

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-15 00:19:52 -05:00
Trond Myklebust
685f50f918 NFSv4: Further reduce the footprint of the idmapper
Don't allocate the legacy idmapper tables until we actually need
them.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
2012-02-15 00:19:51 -05:00
Trond Myklebust
e3da87066f NFSv4: The idmapper now depends on keyring functionality
Add the appropriate 'select KEYS' to the NFSv4 Kconfig entry.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-15 00:19:51 -05:00
Trond Myklebust
d073e9b541 NFSv4: Reduce the footprint of the idmapper
Instead of pre-allocating the storage for all the strings, we can
significantly reduce the size of that table by doing the allocation
when we do the downcall.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
2012-02-15 00:19:50 -05:00
Weston Andros Adamson
7ced286e0a NFS: add mount options 'v4.0' and 'v4.1'
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-15 00:19:50 -05:00
Stanislav Kinsbursky
b6d1e83b4e NFS: fix nfs4_find_client_sessionid() arguments list
It's not compilable in case of CONFIG_NFS_V4_1 is not set.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-15 00:19:49 -05:00
Trond Myklebust
4c03ae4a89 NFS: Initialise the nfs_net->nfs_client_lock
Ensure that we initialise the nfs_net->nfs_client_lock spinlock.
Also ensure that nfs_server_remove_lists() doesn't try to
dereference server->nfs_client before that is initialised.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
2012-02-15 00:19:49 -05:00
Stanislav Kinsbursky
66697bfd6a LockD: make nlm hosts network namespace aware
This object depends on RPC client, and thus on network namespace.
So let's make it's allocation and lookup in network namespace context.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-15 00:19:48 -05:00
Trond Myklebust
ef159e9177 NFSv4.1: Add a module parameter to set the number of session slots
Add the module parameter 'max_session_slots' to set the initial number
of slots that the NFSv4.1 client will attempt to negotiate with the
server.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-15 00:19:44 -05:00
Trond Myklebust
45d43c291e NFSv4.1: Convert slotid from u8 to u32
It is perfectly legal to negotiate up to 2^32-1 slots in the protocol,
and with 10GigE, we are already seeing that 255 slots is far too limiting.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-15 00:19:43 -05:00
Al Viro
4040153087 security: trim security.h
Trim security.h

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: James Morris <jmorris@namei.org>
2012-02-14 10:45:42 +11:00
Trond Myklebust
b9f9a03150 NFSv4: Ensure we throw out bad delegation stateids on NFS4ERR_BAD_STATEID
To ensure that we don't just reuse the bad delegation when we attempt to
recover the nfs4_state that received the bad stateid error.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2012-02-09 15:59:21 -05:00
James Morris
9e3ff38647 Merge branch 'next-queue' into next 2012-02-09 17:02:34 +11:00
Stanislav Kinsbursky
17347d03c0 NFS: build fixed in case of NFS_USE_NEW_IDMAPPER is undefined
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-06 18:48:05 -05:00
Stanislav Kinsbursky
33faaa380e NFS: pass transport net to rpc_pton() while parse server name
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-06 18:48:05 -05:00
Stanislav Kinsbursky
b48e127884 NFS: pass current net to rpc_pton() while parsing mount options
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-06 18:48:04 -05:00
Stanislav Kinsbursky
c7add9a972 NFS: search for client session id in proper network namespace
Network namespace is taken from request transport and passed as a part of
cb_process_state structure.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-06 18:48:04 -05:00
Stanislav Kinsbursky
bc224f539d NFS: pass proper net rpc_pton() in nfs_dns_resolve_name()
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-06 18:48:04 -05:00
Stanislav Kinsbursky
dc03085834 NFS: make nfs_client_lock per net ns
This patch makes nfs_clients_lock allocated per network namespace. All items it
protects are already network namespace aware.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-06 18:48:03 -05:00
Stanislav Kinsbursky
28cd1b3f26 NFS: make cb_ident_idr per net ns
This patch makes ID's infrastructure network namespace aware. This was done
mainly because of nfs_client_lock, which is desired to be per network
namespace, but protects NFS clients ID's.

NOTE: NFS client's net pointer have to be set prior to ID initialization,
proper assignment was moved.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-06 18:48:03 -05:00
Stanislav Kinsbursky
c25d32b263 NFS: make nfs_volume_list per net ns
This patch splits global list of NFS servers into per-net-ns array of lists.
This looks more strict and clearer.
BTW, this patch also makes "/proc/fs/nfsfs/volumes" content depends on /proc
mount owner pid namespace. See below for details.

NOTE: few words about how was /proc/fs/nfsfs/ entries content show per network
namespace done. This is a little bit tricky and not the best is could be. But
it's cheap (proper fix for /proc conteinerization is a hard nut to crack).
The idea is simple: take proper network namespace from pid namespace
child reaper nsproxy of /proc/ mount creator.
This actually means, that if there are 2 containers with different net
namespace sharing pid namespace, then read of /proc/fs/nfsfs/ entries will
always return content, taken from net namespace of pid namespace creator task
(and thus second namespace set wil be unvisible).

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-06 18:48:02 -05:00
Stanislav Kinsbursky
6b13168b36 NFS: make nfs_client_list per net ns
This patch splits global list of NFS clients into per-net-ns array of lists.
This looks more strict and clearer.
BTW, this patch also makes "/proc/fs/nfsfs/servers" entry content depends on
/proc mount owner pid namespace. See below for details.

NOTE: few words about how was /proc/fs/nfsfs/ entries content show per network
namespace done. This is a little bit tricky and not the best is could be. But
it's cheap (proper fix for /proc conteinerization is a hard nut to crack).
The idea is simple: take proper network namespace from pid namespace
child reaper nsproxy of /proc/ mount creator.
This actually means, that if there are 2 containers with different net
namespace sharing pid namespace, then read of /proc/fs/nfsfs/ entries will
always return content, taken from net namespace of pid namespace creator task
(and thus second namespace set wil be unvisible).

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-06 18:48:02 -05:00
Bryan Schumaker
3cd0f37a2c NFS: Keep idmapper include files in one place
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-06 18:48:01 -05:00
Bryan Schumaker
e6499c6f4b NFS: Fall back on old idmapper if request_key() fails
This patch removes the CONFIG_NFS_USE_NEW_IDMAPPER compile option.
First, the idmapper will attempt to map the id using /sbin/request-key
and nfsidmap.  If this fails (if /etc/request-key.conf is not configured
properly) then the idmapper will call the legacy code to perform the
mapping.  I left a comment stating where the legacy code begins to make
it easier for somebody to remove in the future.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-06 18:48:01 -05:00
Weston Andros Adamson
2d3fe01c36 NFS: Fix comparison between DS address lists
data_server_cache entries should only be treated as the same if the address
list hasn't changed.

A new entry will be made when an MDS changes an address list (as seen by
GETDEVINFO). The old entry will be freed once all references are gone.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-06 18:48:00 -05:00
Weston Andros Adamson
a030889a01 NFS: start printks w/ NFS: even if __func__ shown
This patch addresses printks that have some context to show that they are
from fs/nfs/, but for the sake of consistency now start with NFS:

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-06 18:48:00 -05:00
Weston Andros Adamson
f9fd2d9c1f NFS: printks in fs/nfs/ should start with NFS:
Messages like "Got error -10052 from the server on DESTROY_SESSION. Session
has been destroyed regardless" can be confusing to users who aren't very
familiar with NFS.

NOTE: This patch ignores any printks() that start by printing __func__ - that
will be in a separate patch.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-06 18:47:59 -05:00
Bryan Schumaker
b01dd1d8fa NFS: Call test_stateid() and free_stateid() with correct stateids
I noticed that test_stateid() was always using the same stateid for open
and lock recovery.  After poking around a bit, I discovered that it was
always testing with a delegation stateid (even if there was no
delegation present).  I figured this wasn't correct, so now delegation
and open stateids are tested during open_expired() and lock stateids are
tested during lock_expired().

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-06 18:47:58 -05:00
Bryan Schumaker
1cab0652ba NFS: Pass a stateid to test_stateid() and free_stateid()
This takes the guesswork out of what stateid to use.  The caller is
expected to figure this out and pass in the correct one.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-06 18:47:58 -05:00
Trond Myklebust
331818f1c4 NFSv4: Fix an Oops in the NFSv4 getacl code
Commit bf118a342f (NFSv4: include bitmap
in nfsv4 get acl data) introduces the 'acl_scratch' page for the case
where we may need to decode multi-page data. However it fails to take
into account the fact that the variable may be NULL (for the case where
we're not doing multi-page decode), and it also attaches it to the
encoding xdr_stream rather than the decoding one.

The immediate result is an Oops in nfs4_xdr_enc_getacl due to the
call to page_address() with a NULL page pointer.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Andy Adamson <andros@netapp.com>
Cc: stable@vger.kernel.org
2012-02-03 18:50:34 -05:00
Trond Myklebust
a4980e7840 NFSv4: ACCESS validation doesn't require a full attribute refresh
We only really need to check the change attribute, so let's just use the
server->cache_consistency_bitmask.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 19:28:24 -05:00
Trond Myklebust
8b7e3f49dd NFSv4: Don't decode fs_locations if we didn't ask for them...
Currently, the server can potentially cause us to Oops by returning an
fs_locations request that we didn't actually request.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 19:28:23 -05:00
Jeff Layton
c15c928f36 nfs: remove unneeded NULL pointer check in nfs4_remote_mount
"data" is never NULL here.

Reported-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 19:28:23 -05:00
Benny Halevy
d36b7cf7c6 pnfs: clean up initiate_file_draining layout lookup
Fixes the following compiler warning:

fs/nfs/callback_proc.c: In function 'do_callback_layoutrecall':
fs/nfs/callback_proc.c:115:26: warning: 'lo' may be used uninitialized in this function

Reported-by: Jim Rees <rees@umich.edu>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 19:28:22 -05:00
Trond Myklebust
7d9dea915f NFS: Use kcalloc() when allocating arrays
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 19:28:22 -05:00
Trond Myklebust
4601df20fb NFSv4: Avoid thundering herd issues with nfs_release_seqid
Store a pointer to the rpc_task in struct nfs_seqid so that we can wake up
only that request that is able to grab the lock after we've released it.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 19:28:21 -05:00
Trond Myklebust
a613fa168a SUNRPC: constify the rpc_program
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 19:28:20 -05:00
Stanislav Kinsbursky
babea479b7 NFS: remove unused nfs4_find_client_no_ident function
Looks like this function survived after some cleanup patch without a reason.
Now it's not called or referenced and I believe, that it can be simply removed.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 19:28:19 -05:00
Stanislav Kinsbursky
ec7652aaf2 SUNRPC: register RPC stats /proc entries in passed network namespace context
This patch makes it possible to create NFS program entry ("/proc/net/rpc/nfs")
in passed network namespace context instead of hard-coded "init_net".

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 19:28:17 -05:00
Stanislav Kinsbursky
170942726b NFS: decode destination address in proper network namespace context
This patch replaces "init_net" with NFS client's owner net in rpc_pton() call
in decode_ds_addr().

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 19:28:17 -05:00
Stanislav Kinsbursky
599ec129c2 NFS: parse DNS cache in proper network namespace context
This patch replaces "init_net" with cache's owner net in rpc_pton() call.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 19:28:17 -05:00
Stanislav Kinsbursky
90100b1766 SUNRPC: parametrize rpc_pton() by network context
Parametrize rpc_pton() by network context and thus force it's callers to pass
in network context instead of using hard-coded "init_net".

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 19:28:12 -05:00
Trond Myklebust
961a828df6 SUNRPC: Fix potential races in xprt_lock_write_next()
We have to ensure that the wake up from the waitqueue and the assignment
of xprt->snd_task are atomic. We can do this by assigning the snd_task
while under the waitqueue spinlock.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 19:28:08 -05:00
Trond Myklebust
2aeb98f498 NFS: Ensure that mmapped pages remain stable during writeback
Ensure that nfs_vm_page_mkwrite() waits for the page writeback to
complete before the application is allowed to modify page
contents.
The main reason for wanting to do this in NFS is to ensure that the
server doesn't get confused if we have to resend the RPC request
due to a dropped/missed reply.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 19:28:08 -05:00
Trond Myklebust
536e43d12b NFS: Optimise away unnecessary setattrs for open(O_TRUNC);
Currently, we will correctly optimise away a truncate that doesn't
change the file size. However, in the case of open(O_TRUNC), we
also want to optimise away the time changes.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 19:28:07 -05:00
Trond Myklebust
48c22eb210 NFS: Move struct nfs_unique_id into struct nfs_seqid_counter
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 19:28:07 -05:00
Trond Myklebust
7ba127ab9f NFSv4: Move contents of struct rpc_sequence into struct nfs_seqid_counter
Clean up.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 19:28:06 -05:00
Trond Myklebust
9d12b216aa NFSv41: Add a new helper nfs4_init_sequence()
Clean up

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 19:28:06 -05:00
Trond Myklebust
d2d7ce28a2 NFSv4: Replace lock_owner->ld_id with an ida based allocator
Again, We're unlikely to ever need more than 2^31 simultaneous lock
owners, so let's replace the custom allocator.

Now that there are no more users, we can also get rid of the custom
allocator code.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 18:20:28 -05:00
Trond Myklebust
9157c31dd6 NFSv4: Replace state_owner->so_owner_id with an ida based allocator
We're unlikely to ever need more than 2^31 simultaneous open owners,
so let's replace the custom allocator with the generic ida allocator.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 18:20:28 -05:00
Trond Myklebust
d1e284d50a NFSv4: Clean up nfs4_get_state_owner
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 18:20:28 -05:00
Trond Myklebust
1313e6034a NFS: Remove unnecessary includes from linux/nfs_fs_i.h
Also from linux/nfs_xdr.h.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 18:20:28 -05:00
Stanislav Kinsbursky
2561d618ff NFS: remove RPC PipeFS mount point reference from blocklayout routines
This is a cleanup patch. We don't need this reference anymore, because
blocklayout pipes dentries now creates and destroys in per-net operations and
on PipeFS mount/umount notification.
Note that nfs4blocklayout_register_net() now returns 0 instead of -ENOENT in
case of PipeFS superblock absence. This is ok, because blocklayout pipe dentry
will be created on PipeFS mount event.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 18:20:27 -05:00
Stanislav Kinsbursky
627f30668f NFS: blocklayout PipeFS notifier introduced
This patch subscribes blocklayout pipes to RPC pipefs notifications. Notifier
is registering on blocklayout module load. This notifier callback is
responsible for creation/destruction of PipeFS blocklayout pipe dentry.
Note that no locking required in notifier callback because PipeFS superblock
pointer is passed as an argument from it's creation or destruction routine and
thus we can be sure about it's validity.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 18:20:27 -05:00
Stanislav Kinsbursky
9e2e74dba6 NFS: blocklayout pipe creation per network namespace context introduced
This patch implements blocklayout pipe creation and registration per each
existent network namespace.
This was achived by registering NFS per-net operations, responsible for
blocklayout pipe allocation/register and unregister/destruction instead of
initialization and destruction of static "bl_device_pipe" pipe (this one was
removed).
Note, than pointer to network blocklayout pipe is stored in per-net "nfs_net"
structure, because allocating of one more per-net structure for blocklayout
module looks redundant.
This patch also changes dev_remove() function prototype (and all it's callers,
where it' requied) by adding network namespace pointer parameter, which is used
to discover proper blocklayout pipe for rpc_queue_upcall() call.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 18:20:27 -05:00
Stanislav Kinsbursky
332dfab6f4 NFS: handle blocklayout pipe PipeFS dentry by network namespace aware routines
This patch makes blocklayout pipe dentry allocated and destroyed in network
namespace context by PipeFS network namespace aware routines.
Network namespace context is obtained from nfs_client structure.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 18:20:27 -05:00
Stanislav Kinsbursky
eee17325f1 NFS: idmap PipeFS notifier introduced
v2:
1) Added "nfs_idmap_init" and "nfs_idmap_quit" definitions for kernels built
without CONFIG_NFS_V4 option set.

This patch subscribes NFS clients to RPC pipefs notifications. Idmap notifier
is registering on NFS module load. This notifier callback is responsible for
creation/destruction of PipeFS idmap pipe dentry for NFS4 clients.

Since ipdmap pipe is created in rpc client pipefs directory, we have make sure,
that this directory has been created already. IOW RPC client notifier callback
has been called already. To achive this, PipeFS notifier priorities has been
introduced (RPC clients notifier priority is greater than NFS idmap one).
But this approach gives another problem: unlink for RPC client directory will
be called before NFS idmap pipe unlink on UMOUNT event and will fail, because
directory is not empty.
The solution, introduced in this patch, is to try to remove client directory
once again after idmap pipe was unlinked. This looks like ugly hack, so
probably it should be replaced in some more elegant way.

Note that no locking required in notifier callback because PipeFS superblock
pointer is passed as an argument from it's creation or destruction routine and
thus we can be sure about it's validity.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 18:20:27 -05:00
Stanislav Kinsbursky
4929d1d33f NFS: handle NFS idmap pipe PipeFS dentries by network namespace aware routines
This patch makes NFS idmap pipes dentries allocated and destroyed in network
namespace context by PipeFS network namespace aware routines.
Network namespace context is obtained from nfs_client structure.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 18:20:27 -05:00
Stanislav Kinsbursky
7bb782c6ac NFS: create callback transports in parent transport network namespace
This patch replaces static "init_net" references with parent transport xprt_net
reference. Thus callback transports will be created in the same network
namespace as respective NFS mount point was created.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 18:20:27 -05:00
Stanislav Kinsbursky
6d59b8d599 NFS: pass NFS client owner network namespace to RPC client creation routine
This patch replaces static "init_net" with nfs_client->net pointer in RPC
client creation calls.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 18:20:27 -05:00
Stanislav Kinsbursky
e50a7a1a42 NFS: make NFS client allocated per network namespace context
This patch adds new net variable to nfs_client structure. This variable is set
on NFS client creation and cheched during matching NFS client search.
Initially current->nsproxy->net_ns is used as network namespace owner for new
NFS client to create. This network namespace pointer is set during mount
options parsing and thus can be passed from user-spave utils in future if will
be necessary.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 18:20:26 -05:00
Stanislav Kinsbursky
39cb67b9a0 NFS: remove RPC PipeFS mount point references from NFS cache routines
This is a cleanup patch. We don't need this reference anymore, because DNS
resolver cache now creates it's dentries in per-net operations and on PipeFS
mount/umount notification.
Note that nfs_cache_register_net() now returns 0 instead of -ENOENT in case of
PiepFS superblock absence. This is ok, Dns resolver cache will be regestered on
PipeFS mount event.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 18:20:26 -05:00
Stanislav Kinsbursky
9df69c81b4 NFS: DNS resolver PipeFS notifier introduced
This patch subscribes DNS resolver caches to RPC pipefs notifications. Notifier
is registering on NFS module load. This notifier callback is responsible for
creation/destruction of PipeFS DNS resolver cache directory.
Note that no locking required in notifier callback because PipeFS superblock
pointer is passed as an argument from it's creation or destruction routine and
thus we can be sure about it's validity.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 18:20:26 -05:00
Stanislav Kinsbursky
1b340d0118 NFS: DNS resolver cache per network namespace context introduced
This patch implements DNS resolver cache creation and registration for each
alive network namespace context.
This was done by registering NFS per-net operations, responsible for DNS cache
allocation/register and unregister/destructioning instead of initialization and
destruction of static "nfs_dns_resolve" cache detail (this one was removed).
Pointer to network dns resolver cache is stored in new per-net "nfs_net"
structure.
This patch also changes nfs_dns_resolve_name() function prototype (and it's
calls) by adding network pointer parameter, which is used to get proper DNS
resolver cache pointer for do_cache_lookup_wait() call.

Note: empty nfs_dns_resolver_init() and nfs_dns_resolver_destroy() functions
will be used in next patch in the series.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 18:20:26 -05:00
Stanislav Kinsbursky
5c1cacb175 NFS: handle NFS caches dentries by network namespace aware routines
This patch makes NFS caches PipeFS dentries allocated and destroyed in network
namespace context by PipeFS network namespace aware routines.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 18:20:26 -05:00
Stanislav Kinsbursky
9222b95506 NFS: split cache creation and PipeFS registration
This precursor patch splits NFS cache creation and PipeFS registartion.
It's required for latter split of NFS DNS resolver cache creation per network
namespace context and PipeFS registration/unregistration on MOUNT/UMOUNT
events.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 18:20:26 -05:00
Stanislav Kinsbursky
820f9442e7 SUNRPC: split cache creation and PipeFS registration
This precursor patch splits SUNRPC cache creation and PipeFS registartion.
It's required for latter split of NFS DNS resolver cache creation per network
namespace context and PipeFS registration/unregistration on MOUNT/UMOUNT
events.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 18:20:26 -05:00
Stanislav Kinsbursky
30507f58ce SUNRPC: remove RPC PipeFS mount point reference from RPC client
This is a cleanup patch. We don't need this reference anymore.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 18:20:26 -05:00
Stanislav Kinsbursky
c239d83b99 SUNRPC: split SUNPRC PipeFS dentry and private pipe data creation
This patch is a final step towards to removing PipeFS inode references from
kernel code other than PipeFS itself. It makes all kernel SUNRPC PipeFS users
depends on pipe private data, which state depend on their specific operations,
etc.
This patch completes SUNRPC PipeFS preparations and allows to create pipe
private data and PipeFS dentries independently.
Next step will be making SUNPRC PipeFS dentries allocated by SUNRPC PipeFS
network namespace aware routines.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 18:20:25 -05:00
Stanislav Kinsbursky
d706ed1f50 SUNPRC: cleanup RPC PipeFS pipes upcall interface
RPC pipe upcall doesn't requires only private pipe data. Thus RPC inode
references in this code can be removed.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-31 18:20:25 -05:00
David Howells
700920eb5b KEYS: Allow special keyrings to be cleared
The kernel contains some special internal keyrings, for instance the DNS
resolver keyring :

2a93faf1 I-----     1 perm 1f030000     0     0 keyring   .dns_resolver: empty

It would occasionally be useful to allow the contents of such keyrings to be
flushed by root (cache invalidation).

Allow a flag to be set on a keyring to mark that someone possessing the
sysadmin capability can clear the keyring, even without normal write access to
the keyring.

Set this flag on the special keyrings created by the DNS resolver, the NFS
identity mapper and the CIFS identity mapper.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
2012-01-19 14:38:51 +11:00
Linus Torvalds
a12587b003 NFS client bugfixes and cleanups for Linux 3.3 (pull 2)
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPFJ45AAoJEGcL54qWCgDyy9QP/2LmBpYyu3L6vJKRDPxHO9/A
 geLuJsc+m+NU7V7k2H2PWj5TfLPyY33+FEdR0+iqmYVdCK+cb/fvXjeRmhwaFE78
 uoOs9x4LYtMUN+OCOdPJaNc8vosBxHfD9GE3Qb6KFKT4/2pjRu7kjbZvYEyufDhO
 FrPkSiJv5VUgV+xWsgp2C0FSbBFUQ4iGZ3nPeqJCLAT+1QEaJd3PIaJdUepGCi/X
 wakUGAWGKA/sWDcJgBbovVddB8j6FINmTrFme3ZSYqkOOF9mdmzxa/tvhFMjr/17
 sE2L3BHqQioRNXfjY4Ex3kGdVbuIVIIOSNAYeUysuunTSGsub0Ou+UJpj/CtNhur
 auZhjeZQrQUEd1j4Zb4knE3PyjF/GT1RV1CstmdDqty9+BPrzZdx8Rfj61IlLWEz
 b4DYT2HYxXifH5ZbDwA/uyApNKg95kjkp/OLGxwF640gIKXoeKo3xfJO+syQ5wKH
 AdBuTV0vPvkMwdY0H64IHghs47y7A3M/ZbK1jyAkaTzkq15GPqp49ojwFLc3UESb
 JiV/66hmmIJCAqEMu8OncHNtzsrZ2ZO5fqPSjhcrc4CEVCeqNy8WwP8she6BeR2U
 EwaWIqBs1zt5A6dZE8yxdu2Mvai3oRo/ioa6bPwxCAuy7o6/p2x/39TLe27fL18z
 1Qss51gQfFWe8lM8Rhj+
 =beF6
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.3-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

NFS client bugfixes and cleanups for Linux 3.3 (pull 2)

* tag 'nfs-for-3.3-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  pnfsblock: alloc short extent before submit bio
  pnfsblock: remove rpc_call_ops from struct parallel_io
  pnfsblock: move find lock page logic out of bl_write_pagelist
  pnfsblock: cleanup bl_mark_sectors_init
  pnfsblock: limit bio page count
  pnfsblock: don't spinlock when freeing block_dev
  pnfsblock: clean up _add_entry
  pnfsblock: set read/write tk_status to pnfs_error
  pnfsblock: acquire im_lock in _preload_range
  NFS4: fix compile warnings in nfs4proc.c
  nfs: check for integer overflow in decode_devicenotify_args()
  NFS: cleanup endian type in decode_ds_addr()
  NFS: add an endian notation
2012-01-16 15:08:13 -08:00
Linus Torvalds
0a80939b3e Autogenerated GPG tag for Rusty D1ADB8F1: 15EE 8D6C AB0E 7F0C F999 BFCB D920 0E6C D1AD B8F1
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPD2aFAAoJENkgDmzRrbjxNzsQAIeYbbrXYLjr6kQzUSngj/eC
 FzjaTEfYTQIeuQCFJHcHthyc5lXV4sQbo3jOezW+Bp5yuDJL2aWIHesSfWZe7imu
 zQdM4VshOYdAmUR9Q0AW5zhB8Smbs7/AyABiF2jm4p0ZPOuyMDSlei9sjvE9Vjvt
 B7g5ht7L6kz0JbDnwwy0u5gs+tEitwpXYId9Y4ysZIBzIbL0qkPX8veOddGTMy0N
 8xhWXaKtufpjvxFD2ORLDsw3AkoF1xXSNuFd/5nzCNpbeE7TW931jfkPoqJumuAO
 7GLxcU9kKYl+IICobC6wBtsj/RrB7w+cBXMvPGwdBliam1qaRhUcJZi5FLM/Ha5d
 2A9QDYNUpoXiO8JbPXrV9Z+Y0+Co8RilsQj7R/rjZh6AbbYCWt9nxzx2Svl/RfTr
 xfiimHuB2P3rHjOvpCXULwOOuE5c8MzPuWncpdjiD3uGXOY/aY+X1m+if/quJw9D
 grPlKL0+YiRakEYUeGG4M77KCqyKFZaF7L7UQPbqfZcj8V/9AW3/7U5I/B9RlAjs
 idsr4fcf5s0N+oKUyTCW1ncpUDQNiwbU2NyJQqeu1ZxaRGj72AgyvsaNeyIPDyK+
 f6x95Bi7i8KLjXc9Z1KvJwh2Nxt25gNUiTYVha/9H2NpJGd1cfI15kTOGXrgddVv
 1pvuGcJDZwYiwfiXr3FL
 =HHrh
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://github.com/rustyrussell/linux

Autogenerated GPG tag for Rusty D1ADB8F1: 15EE 8D6C AB0E 7F0C F999  BFCB D920 0E6C D1AD B8F1

* tag 'for-linus' of git://github.com/rustyrussell/linux:
  module_param: check that bool parameters really are bool.
  intelfbdrv.c: bailearly is an int module_param
  paride/pcd: fix bool verbose module parameter.
  module_param: make bool parameters really bool (drivers & misc)
  module_param: make bool parameters really bool (arch)
  module_param: make bool parameters really bool (core code)
  kernel/async: remove redundant declaration.
  printk: fix unnecessary module_param_name.
  lirc_parallel: fix module parameter description.
  module_param: avoid bool abuse, add bint for special cases.
  module_param: check type correctness for module_param_array
  modpost: use linker section to generate table.
  modpost: use a table rather than a giant if/else statement.
  modules: sysfs - export: taint, coresize, initsize
  kernel/params: replace DEBUGP with pr_debug
  module: replace DEBUGP with pr_debug
  module: struct module_ref should contains long fields
  module: Fix performance regression on modules with large symbol tables
  module: Add comments describing how the "strmap" logic works

Fix up conflicts in scripts/mod/file2alias.c due to the new linker-
generated table approach to adding __mod_*_device_table entries.  The
ARM sa11x0 mcp bus needed to be converted to that too.
2012-01-14 12:32:16 -08:00
Mel Gorman
a6bc32b899 mm: compaction: introduce sync-light migration for use by compaction
This patch adds a lightweight sync migrate operation MIGRATE_SYNC_LIGHT
mode that avoids writing back pages to backing storage.  Async compaction
maps to MIGRATE_ASYNC while sync compaction maps to MIGRATE_SYNC_LIGHT.
For other migrate_pages users such as memory hotplug, MIGRATE_SYNC is
used.

This avoids sync compaction stalling for an excessive length of time,
particularly when copying files to a USB stick where there might be a
large number of dirty pages backed by a filesystem that does not support
->writepages.

[aarcange@redhat.com: This patch is heavily based on Andrea's work]
[akpm@linux-foundation.org: fix fs/nfs/write.c build]
[akpm@linux-foundation.org: fix fs/btrfs/disk-io.c build]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andy Isaacson <adi@hexapodia.org>
Cc: Nai Xia <nai.xia@gmail.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12 20:13:09 -08:00
Mel Gorman
b969c4ab9f mm: compaction: determine if dirty pages can be migrated without blocking within ->migratepage
Asynchronous compaction is used when allocating transparent hugepages to
avoid blocking for long periods of time.  Due to reports of stalling,
there was a debate on disabling synchronous compaction but this severely
impacted allocation success rates.  Part of the reason was that many dirty
pages are skipped in asynchronous compaction by the following check;

	if (PageDirty(page) && !sync &&
		mapping->a_ops->migratepage != migrate_page)
			rc = -EBUSY;

This skips over all mapping aops using buffer_migrate_page() even though
it is possible to migrate some of these pages without blocking.  This
patch updates the ->migratepage callback with a "sync" parameter.  It is
the responsibility of the callback to fail gracefully if migration would
block.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andy Isaacson <adi@hexapodia.org>
Cc: Nai Xia <nai.xia@gmail.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-12 20:13:09 -08:00
Rusty Russell
90ab5ee941 module_param: make bool parameters really bool (drivers & misc)
module_param(bool) used to counter-intuitively take an int.  In
fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
trick.

It's time to remove the int/unsigned int option.  For this version
it'll simply give a warning, but it'll break next kernel version.

Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-13 09:32:20 +10:30
Peng Tao
7c5465d6cc pnfsblock: alloc short extent before submit bio
As discussed earlier, it is better for block client to allocate memory for
tracking extents state before submitting bio. So the patch does it by allocating
a short_extent for every INVALID extent touched by write pagelist and for
every zeroing page we created, saving them in layout header. Then in end_io we
can just use them to create commit list items and avoid memory allocation there.

Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-12 16:52:10 -05:00
Peng Tao
c0411a94a8 pnfsblock: remove rpc_call_ops from struct parallel_io
block layout can just make use of generic read/write_done.

Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-12 16:52:10 -05:00
Peng Tao
72c5088799 pnfsblock: move find lock page logic out of bl_write_pagelist
Also avoid unnecessary lock_page if page is handled by others.

Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-12 16:52:10 -05:00
Peng Tao
60c52e3a72 pnfsblock: cleanup bl_mark_sectors_init
It does not need to manipulate on partial initialized blocks.
Writeback code takes care of it.

Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-12 16:52:09 -05:00
Peng Tao
74a6eeb44c pnfsblock: limit bio page count
One bio can have at most BIO_MAX_PAGES pages. We should limit it bec otherwise
bio_alloc will fail when there are many pages in one read/write_pagelist.

Cc: <stable@vger.kernel.org> #3.1+
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-12 16:39:05 -05:00
Peng Tao
93a3844ee0 pnfsblock: don't spinlock when freeing block_dev
bl_free_block_dev() may sleep. We can not call it with spinlock held.
Besides, there is no need to take bm_lock as we are last user freeing bm_devlist.

Cc: <stable@vger.kernel.org> #3.1+
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-12 16:39:04 -05:00
Peng Tao
57582b372f pnfsblock: clean up _add_entry
It is wrong to kmalloc in _add_entry() as it is inside
spinlock. memory should be already allocated _add_entry() is called.

Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-12 16:38:55 -05:00
Peng Tao
82b906d655 pnfsblock: set read/write tk_status to pnfs_error
To pass the IO status to upper layer.

Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-12 16:38:51 -05:00
Peng Tao
39e567ae36 pnfsblock: acquire im_lock in _preload_range
When calling _add_entry, we should take the im_lock to protect
agains other modifiers.

Cc: <stable@vger.kernel.org> #3.1+
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-12 16:38:49 -05:00
Peng Tao
de040beccd NFS4: fix compile warnings in nfs4proc.c
compile in nfs-for-3.3 branch shows following warnings. Fix it here.

fs/nfs/nfs4proc.c: In function ‘__nfs4_get_acl_uncached’:
fs/nfs/nfs4proc.c:3589: warning: format ‘%ld’ expects type ‘long int’, but argument 4 has type ‘size_t’
fs/nfs/nfs4proc.c:3589: warning: format ‘%ld’ expects type ‘long int’, but argument 6 has type ‘size_t’

Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-12 16:31:51 -05:00
Dan Carpenter
363e0df057 nfs: check for integer overflow in decode_devicenotify_args()
On 32 bit, if n is too large then "n * sizeof(*args->devs)" could
overflow and args->devs would be smaller than expected.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-12 16:30:07 -05:00
Dan Carpenter
13fff2f35f NFS: cleanup endian type in decode_ds_addr()
port is supposed to be a __be16 here.  The existing code should work
fine, but this is a cleanup.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-12 16:30:03 -05:00
Dan Carpenter
0e0243dc35 NFS: add an endian notation
This function returns a big endian value.  The implementation in
fs/nfs/callback_proc.c is declared with "__be32" but the .h file uses
"unsigned" instead.  It makes sparse complain:

fs/nfs/callback_proc.c:232:8: error:
	symbol 'nfs4_callback_layoutrecall' redeclared with different
	type (originally declared at fs/nfs/callback.h:165) - different
	base types

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-12 16:29:51 -05:00
Linus Torvalds
57eccf1c2a Merge branch 'nfs-for-3.3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
* 'nfs-for-3.3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4: Change the default setting of the nfs4_disable_idmapping parameter
  NFSv4: Save the owner/group name string when doing open
  NFS: Remove pNFS bloat from the generic write path
  pnfs-obj: Must return layout on IO error
  pnfs-obj: pNFS errors are communicated on iodata->pnfs_error
  NFS: Cache state owners after files are closed
  NFS: Clean up nfs4_find_state_owners_locked()
  NFSv4: include bitmap in nfsv4 get acl data
  nfs: fix a minor do_div portability issue
  NFSv4.1: cleanup comment and debug printk
  NFSv4.1: change nfs4_free_slot parameters for dynamic slots
  NFSv4.1: cleanup init and reset of session slot tables
  NFSv4.1: fix backchannel slotid off-by-one bug
  nfs: fix regression in handling of context= option in NFSv4
  NFS - fix recent breakage to NFS error handling.
  NFS: Retry mounting NFSROOT
  SUNRPC: Clean up the RPCSEC_GSS service ticket requests
2012-01-10 14:57:40 -08:00
Trond Myklebust
074b1d12fe NFSv4: Change the default setting of the nfs4_disable_idmapping parameter
Now that the use of numeric uids/gids is officially sanctioned in
RFC3530bis, it is time to change the default here to 'enabled'.

By doing so, we ensure that NFSv4 copies the behaviour of NFSv3 when we're
using the default AUTH_SYS authentication (i.e. when the client uses the
numeric uids/gids as authentication tokens), so that when new files are
created, they will appear to have the correct user/group.
It also fixes a number of backward compatibility issues when migrating
from NFSv3 to NFSv4 on a platform where the server uses different uid/gid
mappings than the client.

Note also that this setting has been successfully tested against servers
that do not support numeric uids/gids at several Connectathon/Bakeathon
events at this point, and the fall back to using string names/groups has
been shown to work well in all those test cases.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-09 14:22:27 -05:00
Linus Torvalds
eb59c505f8 Merge branch 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
* 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (76 commits)
  PM / Hibernate: Implement compat_ioctl for /dev/snapshot
  PM / Freezer: fix return value of freezable_schedule_timeout_killable()
  PM / shmobile: Allow the A4R domain to be turned off at run time
  PM / input / touchscreen: Make st1232 use device PM QoS constraints
  PM / QoS: Introduce dev_pm_qos_add_ancestor_request()
  PM / shmobile: Remove the stay_on flag from SH7372's PM domains
  PM / shmobile: Don't include SH7372's INTCS in syscore suspend/resume
  PM / shmobile: Add support for the sh7372 A4S power domain / sleep mode
  PM: Drop generic_subsys_pm_ops
  PM / Sleep: Remove forward-only callbacks from AMBA bus type
  PM / Sleep: Remove forward-only callbacks from platform bus type
  PM: Run the driver callback directly if the subsystem one is not there
  PM / Sleep: Make pm_op() and pm_noirq_op() return callback pointers
  PM/Devfreq: Add Exynos4-bus device DVFS driver for Exynos4210/4212/4412.
  PM / Sleep: Merge internal functions in generic_ops.c
  PM / Sleep: Simplify generic system suspend callbacks
  PM / Hibernate: Remove deprecated hibernation snapshot ioctls
  PM / Sleep: Fix freezer failures due to racy usermodehelper_is_disabled()
  ARM: S3C64XX: Implement basic power domain support
  PM / shmobile: Use common always on power domain governor
  ...

Fix up trivial conflict in fs/xfs/xfs_buf.c due to removal of unused
XBT_FORCE_SLEEP bit
2012-01-08 13:10:57 -08:00
Trond Myklebust
6926afd192 NFSv4: Save the owner/group name string when doing open
...so that we can do the uid/gid mapping outside the asynchronous RPC
context.
This fixes a bug in the current NFSv4 atomic open code where the client
isn't able to determine what the true uid/gid fields of the file are,
(because the asynchronous nature of the OPEN call denies it the ability
to do an upcall) and so fills them with default values, marking the
inode as needing revalidation.
Unfortunately, in some cases, the VFS will do some additional sanity
checks on the file, and may override the server's decision to allow
the open because it sees the wrong owner/group fields.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-07 13:22:46 -05:00
Al Viro
34c80b1d93 vfs: switch ->show_options() to struct dentry *
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-06 23:19:54 -05:00
Al Viro
a6322de67b vfs: switch ->show_path() to struct dentry *
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-06 23:16:55 -05:00
Al Viro
d861c630e9 vfs: switch ->show_devname() to struct dentry *
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-06 23:16:54 -05:00
Al Viro
64132379d5 vfs: switch ->show_stats to struct dentry *
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-06 23:16:54 -05:00
Trond Myklebust
e2fecb215b NFS: Remove pNFS bloat from the generic write path
We have no business doing any this in the standard write release path.
Get rid of it, and put it in the pNFS layer.

Also, while we're at it, get rid of the completely bogus unlock/relock
semantics that were present in nfs_writeback_release_full(). It is
not only unnecessary, but actually dangerous to release the write lock
just in order to take it again in nfs_page_async_flush(). Better just
to open code the pgio operations in a pnfs helper.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-06 08:57:46 -05:00
Boaz Harrosh
fe0fe83585 pnfs-obj: Must return layout on IO error
As mandated by the standard. In case of an IO error, a pNFS
objects layout driver must return it's layout. This is because
all device errors are reported to the server as part of the
layout return buffer.

This is implemented the same way PNFS_LAYOUTRET_ON_SETATTR
is done, through a bit flag on the pnfs_layoutdriver_type->flags
member. The flag is set by the layout driver that wants a
layout_return preformed at pnfs_ld_{write,read}_done in case
of an error.
(Though I have not defined a wrapper like pnfs_ld_layoutret_on_setattr
 because this code is never called outside of pnfs.c and pnfs IO
 paths)

Without this patch 3.[0-2] Kernels leak memory and have an annoying
WARN_ON after every IO error utilizing the pnfs-obj driver.

[This patch is for 3.2 Kernel. 3.1/0 Kernels need a different patch]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-06 08:55:33 -05:00
Boaz Harrosh
5c0b4129c0 pnfs-obj: pNFS errors are communicated on iodata->pnfs_error
Some time along the way pNFS IO errors were switched to
communicate with a special iodata->pnfs_error member instead
of the regular RPC members. But objlayout was not switched
over.

Fix that!
Without this fix any IO error is hanged, because IO is not
switched to MDS and pages are never cleared or read.

[Applies to 3.2.0. Same bug different patch for 3.1/0 Kernels]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-06 08:55:23 -05:00
Chuck Lever
0aaaf5c424 NFS: Cache state owners after files are closed
Servers have a finite amount of memory to store NFSv4 open and lock
owners.  Moreover, servers may have a difficult time determining when
they can reap their state owner table, thanks to gray areas in the
NFSv4 protocol specification.  Thus clients should be careful to reuse
state owners when possible.

Currently Linux is not too careful.  When a user has closed all her
files on one mount point, the state owner's reference count goes to
zero, and it is released.  The next OPEN allocates a new one.  A
workload that serially opens and closes files can run through a large
number of open owners this way.

When a state owner's reference count goes to zero, slap it onto a free
list for that nfs_server, with an expiry time.  Garbage collect before
looking for a state owner.  This makes state owners for active users
available for re-use.

Now that there can be unused state owners remaining at umount time,
purge the state owner free list when a server is destroyed.  Also be
sure not to reclaim unused state owners during state recovery.

This change has benefits for the client as well.  For some workloads,
this approach drops the number of OPEN_CONFIRM calls from the same as
the number of OPEN calls, down to just one.  This reduces wire traffic
and thus open(2) latency.  Before this patch, untarring a kernel
source tarball shows the OPEN_CONFIRM call counter steadily increasing
through the test.  With the patch, the OPEN_CONFIRM count remains at 1
throughout the entire untar.

As long as the expiry time is kept short, I don't think garbage
collection should be terribly expensive, although it does bounce the
clp->cl_lock around a bit.

[ At some point we should rationalize the use of the nfs_server
->destroy method. ]

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
[Trond: Fixed a garbage collection race and a few efficiency issues]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-05 11:59:18 -05:00
Chuck Lever
414adf14cd NFS: Clean up nfs4_find_state_owners_locked()
There's no longer a need to check the so_server field in the state
owner, because nowadays the RB tree we search for state owners
contains owners for that only server.

Make nfs4_find_state_owners_locked() use the same tree searching logic
as nfs4_insert_state_owner_locked().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-05 10:42:42 -05:00
Andy Adamson
bf118a342f NFSv4: include bitmap in nfsv4 get acl data
The NFSv4 bitmap size is unbounded: a server can return an arbitrary
sized bitmap in an FATTR4_WORD0_ACL request.  Replace using the
nfs4_fattr_bitmap_maxsz as a guess to the maximum bitmask returned by a server
with the inclusion of the bitmap (xdr length plus bitmasks) and the acl data
xdr length to the (cached) acl page data.

This is a general solution to commit e5012d1f "NFSv4.1: update
nfs4_fattr_bitmap_maxsz" and fixes hitting a BUG_ON in xdr_shrink_bufhead
when getting ACLs.

Fix a bug in decode_getacl that returned -EINVAL on ACLs > page when getxattr
was called with a NULL buffer, preventing ACL > PAGE_SIZE from being retrieved.

Cc: stable@kernel.org
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-05 10:42:42 -05:00
Chris Metcalf
3476f114ad nfs: fix a minor do_div portability issue
This change modifies filelayout_get_dense_offset() to use the functions
in math64.h and thus avoid a 32-bit platform compile error trying to
use do_div() on an s64 type.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Reviewed-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-05 10:42:42 -05:00
Andy Adamson
0b1c8fc43c NFSv4.1: cleanup comment and debug printk
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-05 10:42:41 -05:00
Andy Adamson
aabd0b40b3 NFSv4.1: change nfs4_free_slot parameters for dynamic slots
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-05 10:42:41 -05:00
Andy Adamson
aacd553727 NFSv4.1: cleanup init and reset of session slot tables
We are either initializing or resetting a session. Initialize or reset
the session slot tables accordingly.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-05 10:42:40 -05:00
Andy Adamson
61f2e51065 NFSv4.1: fix backchannel slotid off-by-one bug
Cc:stable@kernel.org
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-05 10:42:40 -05:00
Jeff Layton
8a0d551a59 nfs: fix regression in handling of context= option in NFSv4
Setting the security context of a NFSv4 mount via the context= mount
option is currently broken. The NFSv4 codepath allocates a parsed
options struct, and then parses the mount options to fill it. It
eventually calls nfs4_remote_mount which calls security_init_mnt_opts.
That clobbers the lsm_opts struct that was populated earlier. This bug
also looks like it causes a small memory leak on each v4 mount where
context= is used.

Fix this by moving the initialization of the lsm_opts into
nfs_alloc_parsed_mount_data. Also, add a destructor for
nfs_parsed_mount_data to make it easier to free all of the allocations
hanging off of it, and to ensure that the security_free_mnt_opts is
called whenever security_init_mnt_opts is.

I believe this regression was introduced quite some time ago, probably
by commit c02d7adf.

Cc: stable@vger.kernel.org
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-05 10:42:40 -05:00
NeilBrown
2edb6bc385 NFS - fix recent breakage to NFS error handling.
From c6d615d2b97fe305cbf123a8751ced859dca1d5e Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Wed, 16 Nov 2011 09:39:05 +1100
Subject: [PATCH] NFS - fix recent breakage to NFS error handling.

commit 02c24a8218 made a small and
presumably unintended change to write error handling in NFS.

Previously an error from filemap_write_and_wait_range would only be of
interest if nfs_file_fsync did not return an error.  After this commit,
an error from filemap_write_and_wait_range would mean that (the rest of)
nfs_file_fsync would not even be called.

This means that:
 1/ you are more likely to see EIO than e.g. EDQUOT or ENOSPC.
 2/ NFS_CONTEXT_ERROR_WRITE remains set for longer so more writes are
    synchronous.

This patch restores previous behaviour.

Cc: stable@kernel.org
Cc: Josef Bacik <josef@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-05 10:42:39 -05:00
Trond Myklebust
68c97153fb SUNRPC: Clean up the RPCSEC_GSS service ticket requests
Instead of hacking specific service names into gss_encode_v1_msg, we should
just allow the caller to specify the service name explicitly.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: J. Bruce Fields <bfields@redhat.com>
2012-01-05 10:42:38 -05:00
Al Viro
1a67aafb5f switch ->mknod() to umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:54 -05:00
Al Viro
4acdaf27eb switch ->create() to umode_t
vfs_create() ignores everything outside of 16bit subset of its
mode argument; switching it to umode_t is obviously equivalent
and it's the only caller of the method

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:53 -05:00
Al Viro
18bb1db3e7 switch vfs_mkdir() and ->mkdir() to umode_t
vfs_mkdir() gets int, but immediately drops everything that might not
fit into umode_t and that's the only caller of ->mkdir()...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:53 -05:00
Al Viro
6b520e0565 vfs: fix the stupidity with i_dentry in inode destructors
Seeing that just about every destructor got that INIT_LIST_HEAD() copied into
it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once();
the cost of taking it into inode_init_always() will be negligible for pipes
and sockets and negative for everything else.  Not to mention the removal of
boilerplate code from ->destroy_inode() instances...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:52:40 -05:00
Al Viro
5352d3b65a make nfs_follow_remote_path() handle ERR_PTR() passed as root_mnt
... rather than duplicating that in callers

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:52:39 -05:00
Al Viro
e407699ef5 btrfs, nfs, apparmor: don't pull mnt_namespace.h for no reason...
it's not needed anymore; we used to, back when we had to do
mount_subtree() by hand, complete with put_mnt_ns() in it.
No more...  Apparmor didn't need it since the __d_path() fix.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:52:38 -05:00
Al Viro
5ede7b1cfa pull manipulations of rpc_cred inside alloc_nfs_open_context()
No need to duplicate them in both callers; make it return
ERR_PTR(-ENOMEM) on allocation failure instead of NULL and
it'll be able to report rpc_lookup_cred() failures just
fine.  Callers are much happier that way...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:52:34 -05:00
Rafael J. Wysocki
b00f4dc5ff Merge branch 'master' into pm-sleep
* master: (848 commits)
  SELinux: Fix RCU deref check warning in sel_netport_insert()
  binary_sysctl(): fix memory leak
  mm/vmalloc.c: remove static declaration of va from __get_vm_area_node
  ipmi_watchdog: restore settings when BMC reset
  oom: fix integer overflow of points in oom_badness
  memcg: keep root group unchanged if creation fails
  nilfs2: potential integer overflow in nilfs_ioctl_clean_segments()
  nilfs2: unbreak compat ioctl
  cpusets: stall when updating mems_allowed for mempolicy or disjoint nodemask
  evm: prevent racing during tfm allocation
  evm: key must be set once during initialization
  mmc: vub300: fix type of firmware_rom_wait_states module parameter
  Revert "mmc: enable runtime PM by default"
  mmc: sdhci: remove "state" argument from sdhci_suspend_host
  x86, dumpstack: Fix code bytes breakage due to missing KERN_CONT
  IB/qib: Correct sense on freectxts increment and decrement
  RDMA/cma: Verify private data length
  cgroups: fix a css_set not found bug in cgroup_attach_proc
  oprofile: Fix uninitialized memory access when writing to writing to oprofilefs
  Revert "xen/pv-on-hvm kexec: add xs_reset_watches to shutdown watches from old kernel"
  ...

Conflicts:
	kernel/cgroup_freezer.c
2011-12-21 21:59:45 +01:00
Trond Myklebust
6c52961743 NFS: Fix a regression in nfs_file_llseek()
After commit 06222e491e (fs: handle
SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek)
the behaviour of llseek() was changed so that it always revalidates
the file size. The bug appears to be due to a logic error in the
afore-mentioned commit, which always evaluates to 'true'.

Reported-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>=3.1]
2011-12-15 18:44:36 -05:00
Trond Myklebust
652f89f64f NFSv4: Do not accept delegated opens when a delegation recall is in effect
...and report the servers that try to return a delegation when the client
is using the CLAIM_DELEG_CUR open mode. That behaviour is explicitly
forbidden in RFC3530.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-12-09 19:05:58 -05:00
Trond Myklebust
4b44b40e04 NFSv4: Ensure correct locking when accessing the 'lock_states' list
There are currently 2 places in the state recovery code, where we do not
take sufficient precautions before accessing the state->lock_states. In
both cases, we should be holding the state->state_lock.

Reported-by: Pascal Bouchareine <pascal@gandi.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-12-09 16:31:52 -05:00
Jeff Layton
d310310cbf Freezer / sunrpc / NFS: don't allow TASK_KILLABLE sleeps to block the freezer
Allow the freezer to skip wait_on_bit_killable sleeps in the sunrpc
layer. This should allow suspend and hibernate events to proceed, even
when there are RPC's pending on the wire.

Also, wrap the TASK_KILLABLE sleeps in NFS layer in freezer_do_not_count
and freezer_count calls. This allows the freezer to skip tasks that are
sleeping while looping on EJUKEBOX or NFS4ERR_DELAY sorts of errors.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-12-06 22:12:27 +01:00
Trond Myklebust
111d489f0f NFSv4.1: Ensure that we handle _all_ SEQUENCE status bits.
Currently, the code assumes that the SEQUENCE status bits are mutually
exclusive. They are not...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>= 2.6.34]
2011-12-01 16:37:42 -05:00
Trond Myklebust
4f38e4aadc NFSv4: Don't error if we handled it in nfs4_recovery_handle_error
If we handled an error condition, then nfs4_recovery_handle_error should
return '0' so that the state recovery thread can continue.
Also ensure that nfs4_check_lease() continues to abort if we haven't got
any credentials by having it return ENOKEY (which is not handled).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-12-01 16:31:34 -05:00
Linus Torvalds
e25ba0ce03 Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: Revert pnfs ugliness from the generic NFS read code path
  SUNRPC: destroy freshly allocated transport in case of sockaddr init error
  NFS: Fix a regression in the referral code
  nfs: move nfs_file_operations declaration to bottom of file.c (try #2)
  nfs: when attempting to open a directory, fall back on normal lookup (try #5)
2011-11-22 08:54:15 -08:00
Al Viro
ea441d1104 new helper: mount_subtree()
takes vfsmount and relative path, does lookup within that vfsmount
(possibly triggering automounts) and returns the result as root
of subtree suitable for return by ->mount() (i.e. a reference to
dentry and an active reference to its superblock grabbed, superblock
locked exclusive).

btrfs and nfs switched to it instead of open-coding the sucker.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-11-16 22:00:34 -05:00
Al Viro
c133449587 switch create_mnt_ns() to saner calling conventions, fix double mntput() in nfs
Life is much saner if create_mnt_ns(mnt) drops mnt in case of error...
Switch it to such calling conventions, switch callers, fix double mntput() in
fs/nfs/super.c one.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-11-16 16:12:14 -05:00
Trond Myklebust
62e4a76987 NFS: Revert pnfs ugliness from the generic NFS read code path
pNFS-specific code belongs in the pnfs layer. It should not be
hijacking generic NFS read or write code paths.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-11-10 14:50:26 -05:00
Trond Myklebust
a6f498a891 NFS: Fix a regression in the referral code
Fix a regression that was introduced by commit
0c2e53f11a (NFS: Remove the unused
"lookupfh()" version of nfs4_proc_lookup()).

In the case where the lookup gets an NFS4ERR_MOVED, we want to return
the result of nfs4_get_referral(). Instead, that value is getting
clobbered by the call to nfs4_handle_exception()...

Reported-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Tested-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-11-07 16:20:00 -05:00
Linus Torvalds
32aaeffbd4 Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
  Revert "tracing: Include module.h in define_trace.h"
  irq: don't put module.h into irq.h for tracking irqgen modules.
  bluetooth: macroize two small inlines to avoid module.h
  ip_vs.h: fix implicit use of module_get/module_put from module.h
  nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
  include: replace linux/module.h with "struct module" wherever possible
  include: convert various register fcns to macros to avoid include chaining
  crypto.h: remove unused crypto_tfm_alg_modname() inline
  uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
  pm_runtime.h: explicitly requires notifier.h
  linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
  miscdevice.h: fix up implicit use of lists and types
  stop_machine.h: fix implicit use of smp.h for smp_processor_id
  of: fix implicit use of errno.h in include/linux/of.h
  of_platform.h: delete needless include <linux/module.h>
  acpi: remove module.h include from platform/aclinux.h
  miscdevice.h: delete unnecessary inclusion of module.h
  device_cgroup.h: delete needless include <linux/module.h>
  net: sch_generic remove redundant use of <linux/module.h>
  net: inet_timewait_sock doesnt need <linux/module.h>
  ...

Fix up trivial conflicts (other header files, and  removal of the ab3550 mfd driver) in
 - drivers/media/dvb/frontends/dibx000_common.c
 - drivers/media/video/{mt9m111.c,ov6650.c}
 - drivers/mfd/ab3550-core.c
 - include/linux/dmaengine.h
2011-11-06 19:44:47 -08:00
Jeff Layton
0486958f57 nfs: move nfs_file_operations declaration to bottom of file.c (try #2)
...a remove a set of forward declarations.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-11-04 16:39:11 -04:00
Jeff Layton
1788ea6e3b nfs: when attempting to open a directory, fall back on normal lookup (try #5)
commit d953126 changed how nfs_atomic_lookup handles an -EISDIR return
from an OPEN call. Prior to that patch, that caused the client to fall
back to doing a normal lookup. When that patch went in, the code began
returning that error to userspace. The d_revalidate codepath however
never had the corresponding change, so it was still possible to end up
with a NULL ctx->state pointer after that.

That patch caused a regression. When we attempt to open a directory that
does not have a cached dentry, that open now errors out with EISDIR. If
you attempt the same open with a cached dentry, it will succeed.

Fix this by reverting the change in nfs_atomic_lookup and allowing
attempts to open directories to fall back to a normal lookup

Also, add a NFSv4-specific f_ops->open routine that just returns
-ENOTDIR. This should never be called if things are working properly,
but if it ever is, then the dprintk may help in debugging.

To facilitate this, a new file_operations field is also added to the
nfs_rpc_ops struct.

Cc: stable@kernel.org
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-11-04 16:39:04 -04:00
Linus Torvalds
6736c04799 Merge branch 'nfs-for-3.2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
* 'nfs-for-3.2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (25 commits)
  nfs: set vs_hidden on nfs4_callback_version4 (try #2)
  pnfs-obj: Support for RAID5 read-4-write interface.
  pnfs-obj: move to ore 03: Remove old raid engine
  pnfs-obj: move to ore 02: move to ORE
  pnfs-obj: move to ore 01: ore_layout & ore_components
  pnfs-obj: Rename objlayout_io_state => objlayout_io_res
  pnfs-obj: Get rid of objlayout_{alloc,free}_io_state
  pnfs-obj: Return PNFS_NOT_ATTEMPTED in case of read/write_pagelist
  pnfs-obj: Remove redundant EOF from objlayout_io_state
  nfs: Remove unused variable from write.c
  nfs: Fix unused variable warning from file.c
  NFS: Remove no-op less-than-zero checks on unsigned variables.
  NFS: Clean up nfs4_xdr_dec_secinfo()
  NFS: Fix documenting comment for nfs_create_request()
  NFS4: fix cb_recallany decode error
  nfs4: serialize layoutcommit
  SUNRPC: remove rpcbind clients destruction on module cleanup
  SUNRPC: remove rpcbind clients creation during service registering
  NFSd: call svc rpcbind cleanup explicitly
  SUNRPC: cleanup service destruction
  ...
2011-11-04 12:27:43 -07:00
Jeff Layton
6070295efc nfs: set vs_hidden on nfs4_callback_version4 (try #2)
This service should not be registered with or unregistered from rpcbind.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-11-04 09:00:09 -04:00
Trond Myklebust
31cbecb4ab Merge branch 'osd-devel' into nfs-for-next 2011-11-02 23:56:40 -04:00
Boaz Harrosh
278c023a99 pnfs-obj: Support for RAID5 read-4-write interface.
The ore need suplied a r4w_get_page/r4w_put_page API
from Filesystem so it can get cache pages to read-into when
writing parial stripes.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-11-02 23:56:09 -04:00
Boaz Harrosh
04291b628c pnfs-obj: move to ore 03: Remove old raid engine
Finally remove all the old raid engine, which is by now
dead code.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-11-02 23:56:08 -04:00
Boaz Harrosh
eecfc6312a pnfs-obj: move to ore 02: move to ORE
In this patch we are actually moving to the ORE.
(Object Raid Engine).

objio_state holds a pointer to an ore_io_state. Once
we have an ore_io_state at hand we can call the ore
for reading/writing. We register on the done path
to kick off the nfs io_done mechanism.

Again for Ease of reviewing the old code is "#if 0"
but is not removed so the diff command works better.
The old code will be removed in the next patch.

fs/exofs/Kconfig::ORE is modified to also be auto-included
if PNFS_OBJLAYOUT is set. Since we now depend on ORE.
(See comments in fs/exofs/Kconfig)

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-11-02 23:56:08 -04:00
Boaz Harrosh
af4f5b54bc pnfs-obj: move to ore 01: ore_layout & ore_components
For Ease of reviewing I split the move to ore into 3 parts
	move to ore 01: ore_layout & ore_components
	move to ore 02: move to ORE
	move to ore 03: Remove old raid engine

This patch modifies the objio_lseg, layout-segment level
and devices and components arrays to use the ORE types.

Though it will be removed soon, also the raid engine
is modified to actually compile, possibly run, with
the new types. So it is the same old raid engine but
with some new ORE types.

For Ease of reviewing, some of the old code is
"#if 0" but is not removed so the diff command works
better. The old code will be removed in the 3rd patch.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-11-02 23:56:07 -04:00
Boaz Harrosh
e2e04355d9 pnfs-obj: Rename objlayout_io_state => objlayout_io_res
* All instances of objlayout_io_state => objlayout_io_res
* All instances of state => oir;
* All instances of ol_state => oir;

Big but nothing to it

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-11-02 23:56:06 -04:00
Boaz Harrosh
96218556b0 pnfs-obj: Get rid of objlayout_{alloc,free}_io_state
This is part of moving objio_osd to use the ORE.

objlayout_io_state had two functions:
1. It was used in the error reporting mechanism at layout_return.
   This function is kept intact.
   (Later patch will rename objlayout_io_state => objlayout_io_res)
2. Carrier of rw io members into the objio_read/write_paglist API.
   This is removed in this patch.

The {r,w}data received from NFS are passed directly to the
objio_{read,write}_paglist API. The io_engine is now allocating
it's own IO state as part of the read/write. The minimal
functionality that was part of the generic allocation is passed
to the io_engine.

So part of this patch is rename of:
	ios->ol_state.foo => ios->foo

At objlayout_{read,write}_done an objlayout_io_state is passed that
denotes the result of the IO. (Hence the later name change).
If the IO is successful objlayout calls an objio_free_result() API
immediately (Which for objio_osd causes the release of the io_state).
If the IO ended in an error it is hanged onto until reported in
layout_return and is released later through the objio_free_result()
API. (All this is not new just renamed and cleaned)

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-11-02 23:56:05 -04:00
Boaz Harrosh
e6c40fe3f4 pnfs-obj: Return PNFS_NOT_ATTEMPTED in case of read/write_pagelist
objlayout driver was always returning PNFS_ATTEMPTED from it's
read/write_pagelist operations. Even on error. Fix that.

Start by establishing an error return API from io-engine, by
not returning ssize_t (length-or-error) but returning "int"
0=OK, 0>Error. And clean up all return types in io-engine.

Then if io-engine returned error return PNFS_NOT_ATTEMPTED
to generic layer. (With a dprint)

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-11-02 23:56:03 -04:00
Boaz Harrosh
4cdc685c7d pnfs-obj: Remove redundant EOF from objlayout_io_state
The EOF calculation was done on .read_pagelist(), cached
in objlayout_io_state->eof, and set in objlayout_read_done()
into nfs_read_data->res.eof.

So set it directly into nfs_read_data->res.eof and avoid
the extra member.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-11-02 23:56:00 -04:00
Rakib Mullick
2b72c9ccd2 nfs: Remove unused variable from write.c
When CONFIG_NFS=y and CONFIG_NFS_V3_{,V4}=n we get the following warning.

	fs/nfs/write.c: In function ‘nfs_writeback_done’:
	fs/nfs/write.c:1246:21: warning: unused variable ‘server’

 Remove the variable 'server' to fix the above warning.

Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-11-02 23:50:24 -04:00
Rakib Mullick
6f276e49fd nfs: Fix unused variable warning from file.c
Fix the following unused variable warning.

fs/nfs/file.c: In function ‘nfs_file_release’:
fs/nfs/file.c:140:17: warning: unused variable ‘dentry’
fs/nfs/file.c: In function ‘nfs_file_read’:
fs/nfs/file.c:237:9: warning: unused variable ‘count’

Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-11-02 23:49:09 -04:00
Miklos Szeredi
bfe8684869 filesystems: add set_nlink()
Replace remaining direct i_nlink updates with a new set_nlink()
updater function.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-11-02 12:53:43 +01:00
Miklos Szeredi
6d6b77f163 filesystems: add missing nlink wrappers
Replace direct i_nlink updates with the respective updater function
(inc_nlink, drop_nlink, clear_nlink, inode_dec_link_count).

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2011-11-02 12:53:43 +01:00
Paul Gortmaker
143cb494cb fs: add module.h to files that were implicitly using it
Some files were using the complete module.h infrastructure without
actually including the header at all.  Fix them up in advance so
once the implicit presence is removed, we won't get failures like this:

  CC [M]  fs/nfsd/nfssvc.o
fs/nfsd/nfssvc.c: In function 'nfsd_create_serv':
fs/nfsd/nfssvc.c:335: error: 'THIS_MODULE' undeclared (first use in this function)
fs/nfsd/nfssvc.c:335: error: (Each undeclared identifier is reported only once
fs/nfsd/nfssvc.c:335: error: for each function it appears in.)
fs/nfsd/nfssvc.c: In function 'nfsd':
fs/nfsd/nfssvc.c:555: error: implicit declaration of function 'module_put_and_exit'
make[3]: *** [fs/nfsd/nfssvc.o] Error 1

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:30:31 -04:00
Paul Gortmaker
afeacc8c1f fs: add export.h to files using EXPORT_SYMBOL/THIS_MODULE macros
These files were getting <linux/module.h> via an implicit include
path, but we want to crush those out of existence since they cost
time during compiles of processing thousands of lines of headers
for no reason.  Give them the lightweight header that just contains
the EXPORT_SYMBOL infrastructure.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:30:31 -04:00
Chuck Lever
e414966b81 NFS: Remove no-op less-than-zero checks on unsigned variables.
Introduced by commit 16b374ca "NFSv4.1: pnfs: filelayout: add driver's
LAYOUTGET and GETDEVICEINFO infrastructure" (October 20, 2010).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-31 11:52:47 -04:00
Chuck Lever
c6e6966602 NFS: Clean up nfs4_xdr_dec_secinfo()
Clean up: Remove superfluous logic at the tail of
nfs4_xdr_dec_secinfo() .

Introduced by commit 5a5ea0d4 "NFS: Add secinfo procedure" (March 24,
2011).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-31 11:52:47 -04:00
Chuck Lever
c02f557dd0 NFS: Fix documenting comment for nfs_create_request()
Clean up: the first parameter of nfs_create_request() has been
incorrectly documented since time immemorial (OK, since before
2.6.12).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-31 11:52:47 -04:00
Peng Tao
d743c3c9c2 NFS4: fix cb_recallany decode error
craa_type_mask is bitmap4 per RFC5661. We need to expect a length before
extracting bitmap value.

Cc: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-31 11:51:28 -04:00
Peng Tao
92407e75ce nfs4: serialize layoutcommit
Current pnfs_layoutcommit_inode can not handle parallel layoutcommit.
And as Trond suggested , there is no need for client to optimize for
parallel layoutcommit. So add NFS_INO_LAYOUTCOMMITTING flag to
mark inflight layoutcommit and serialize lalyoutcommit with it.
Also mark_inode_dirty_sync if pnfs_layoutcommit_inode fails to issue
layoutcommit.

Reported-by: Vitaliy Gusev <gusev.vitaliy@nexenta.com>
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-31 11:51:28 -04:00
Linus Torvalds
f362f98e7c Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue: (21 commits)
  leases: fix write-open/read-lease race
  nfs: drop unnecessary locking in llseek
  ext4: replace cut'n'pasted llseek code with generic_file_llseek_size
  vfs: add generic_file_llseek_size
  vfs: do (nearly) lockless generic_file_llseek
  direct-io: merge direct_io_walker into __blockdev_direct_IO
  direct-io: inline the complete submission path
  direct-io: separate map_bh from dio
  direct-io: use a slab cache for struct dio
  direct-io: rearrange fields in dio/dio_submit to avoid holes
  direct-io: fix a wrong comment
  direct-io: separate fields only used in the submission path from struct dio
  vfs: fix spinning prevention in prune_icache_sb
  vfs: add a comment to inode_permission()
  vfs: pass all mask flags check_acl and posix_acl_permission
  vfs: add hex format for MAY_* flag values
  vfs: indicate that the permission functions take all the MAY_* flags
  compat: sync compat_stats with statfs.
  vfs: add "device" tag to /proc/self/mountstats
  cleanup: vfs: small comment fix for block_invalidatepage
  ...

Fix up trivial conflict in fs/gfs2/file.c (llseek changes)
2011-10-28 10:49:34 -07:00
Andi Kleen
79835a710d nfs: drop unnecessary locking in llseek
This makes NFS follow the standard generic_file_llseek locking scheme.

Cc: Trond.Myklebust@netapp.com
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-10-28 14:59:00 +02:00
Andi Kleen
ef3d0fd27e vfs: do (nearly) lockless generic_file_llseek
The i_mutex lock use of generic _file_llseek hurts.  Independent processes
accessing the same file synchronize over a single lock, even though
they have no need for synchronization at all.

Under high utilization this can cause llseek to scale very poorly on larger
systems.

This patch does some rethinking of the llseek locking model:

First the 64bit f_pos is not necessarily atomic without locks
on 32bit systems. This can already cause races with read() today.
This was discussed on linux-kernel in the past and deemed acceptable.
The patch does not change that.

Let's look at the different seek variants:

SEEK_SET: Doesn't really need any locking.
If there's a race one writer wins, the other loses.

For 32bit the non atomic update races against read()
stay the same. Without a lock they can also happen
against write() now.  The read() race was deemed
acceptable in past discussions, and I think if it's
ok for read it's ok for write too.

=> Don't need a lock.

SEEK_END: This behaves like SEEK_SET plus it reads
the maximum size too. Reading the maximum size would have the
32bit atomic problem. But luckily we already have a way to read
the maximum size without locking (i_size_read), so we
can just use that instead.

Without i_mutex there is no synchronization with write() anymore,
however since the write() update is atomic on 64bit it just behaves
like another racy SEEK_SET.  On non atomic 32bit it's the same
as SEEK_SET.

=> Don't need a lock, but need to use i_size_read()

SEEK_CUR: This has a read-modify-write race window
on the same file. One could argue that any application
doing unsynchronized seeks on the same file is already broken.
But for the sake of not adding a regression here I'm
using the file->f_lock to synchronize this. Using this
lock is much better than the inode mutex because it doesn't
synchronize between processes.

=> So still need a lock, but can use a f_lock.

This patch implements this new scheme in generic_file_llseek.
I dropped generic_file_llseek_unlocked and changed all callers.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-10-28 14:58:58 +02:00
Linus Torvalds
ef78cc75f1 Merge branch 'nfs-for-3.2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
* 'nfs-for-3.2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (26 commits)
  Check validity of cl_rpcclient in nfs_server_list_show
  NFS: Get rid of the nfs_rdata_mempool
  NFS: Don't rely on PageError in nfs_readpage_release_partial
  NFS: Get rid of unnecessary calls to ClearPageError() in read code
  NFS: Get rid of nfs_restart_rpc()
  NFS: Get rid of the unused nfs_write_data->flags field
  NFS: Get rid of the unused nfs_read_data->flags field
  NFSv4: Translate NFS4ERR_BADNAME into ENOENT when applied to a lookup
  NFS: Remove the unused "lookupfh()" version of nfs4_proc_lookup()
  NFS: Use the inode->i_version to cache NFSv4 change attribute information
  SUNRPC: Remove unnecessary export of rpc_sockaddr2uaddr
  SUNRPC: Fix rpc_sockaddr2uaddr
  nfs/super.c: local functions should be static
  pnfsblock: fix writeback deadlock
  pnfsblock: fix NULL pointer dereference
  pnfs: recoalesce when ld read pagelist fails
  pnfs: recoalesce when ld write pagelist fails
  pnfs: make _set_lo_fail generic
  pnfsblock: add missing rpc_put_mount and path_put
  SUNRPC/NFS: make rpc pipe upcall generic
  ...
2011-10-25 15:44:06 +02:00
Linus Torvalds
1442d1678c Merge branch 'for-3.2' of git://linux-nfs.org/~bfields/linux
* 'for-3.2' of git://linux-nfs.org/~bfields/linux: (103 commits)
  nfs41: implement DESTROY_CLIENTID operation
  nfsd4: typo logical vs bitwise negate for want_mask
  nfsd4: allow NFS4_SHARE_SIGNAL_DELEG_WHEN_RESRC_AVAIL | NFS4_SHARE_PUSH_DELEG_WHEN_UNCONTENDED
  nfsd4: seq->status_flags may be used unitialized
  nfsd41: use SEQ4_STATUS_BACKCHANNEL_FAULT when cb_sequence is invalid
  nfsd4: implement new 4.1 open reclaim types
  nfsd4: remove unneeded CLAIM_DELEGATE_CUR workaround
  nfsd4: warn on open failure after create
  nfsd4: preallocate open stateid in process_open1()
  nfsd4: do idr preallocation with stateid allocation
  nfsd4: preallocate nfs4_file in process_open1()
  nfsd4: clean up open owners on OPEN failure
  nfsd4: simplify process_open1 logic
  nfsd4: make is_open_owner boolean
  nfsd4: centralize renew_client() calls
  nfsd4: typo logical vs bitwise negate
  nfs: fix bug about IPv6 address scope checking
  nfsd4: more robust ignoring of WANT bits in OPEN
  nfsd4: move name-length checks to xdr
  nfsd4: move access/deny validity checks to xdr code
  ...
2011-10-25 15:42:01 +02:00
Malahal Naineni
940aab4902 Check validity of cl_rpcclient in nfs_server_list_show
As soon as the nfs_client gets created, its cl_rpcclient is set to
ERR_PTR(-EINVAL). The rpc client structure is allocated later. Check
if the client is ready before using the cl_rpcclient pointer.

Signed-off-by: Malahal Naineni <malahal@us.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-20 18:44:04 -05:00
Trond Myklebust
b6ee8cd264 NFS: Get rid of the nfs_rdata_mempool
We don't need a mempool in order to guarantee reliable NFS read performance.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-19 13:58:38 -07:00
Trond Myklebust
fba730050d NFS: Don't rely on PageError in nfs_readpage_release_partial
Don't rely on the PageError flag to tell us if one of the partial reads of
the page failed. Instead, replace that with a dedicated flag in the
struct nfs_page.

Then clean out redundant uses of the PageError flag: the VM no longer
checks it for reads.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-19 13:58:38 -07:00
Trond Myklebust
fbb5a9abf0 NFS: Get rid of unnecessary calls to ClearPageError() in read code
The generic file read code does that for us anyway.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-19 13:58:37 -07:00
Trond Myklebust
d00c5d4386 NFS: Get rid of nfs_restart_rpc()
It can trivially be replaced with rpc_restart_call_prepare.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-19 13:58:30 -07:00
Trond Myklebust
08ef7bd3bc NFSv4: Translate NFS4ERR_BADNAME into ENOENT when applied to a lookup
Both LOOKUP and OPEN operations may return NFS4ERR_BADNAME if we send a
an invalid name as a filename argument. As far as the application is
concerned, it just has to know that the file doesn't exist, and so
ENOENT would be the appropriate reply. We should only return EINVAL
if the filename is being used to _create_ a new object on the
remote filesystem.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 16:13:51 -07:00
Trond Myklebust
0c2e53f11a NFS: Remove the unused "lookupfh()" version of nfs4_proc_lookup()
...and also remove the associated nfs_v4_clientops entry.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 16:13:51 -07:00
Trond Myklebust
a9a4a87a59 NFS: Use the inode->i_version to cache NFSv4 change attribute information
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 09:14:34 -07:00
H Hartley Sweeten
45402c38ee nfs/super.c: local functions should be static
commit ae50c0b5 "pnfs: client stats" added additional information to
the output of /proc/self/mountstats. The new functions introduced are
only used in this file and should be marked static.

If CONFIG_NFS_V4_1 is not defined, empty stub functions are used.  If
CONFIG_NFS_V4 is not defined these stub functions are not used at all.
Adding static for the functions results in compile warnings:

fs/nfs/super.c:743: warning: 'show_sessions' defined but not used
fs/nfs/super.c:756: warning: 'show_pnfs' defined but not used

Fix this by adding a #ifdef CONFIG_NFS_V4 guard around the two
show_ functions.

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 09:08:15 -07:00
Peng Tao
7542274519 pnfsblock: fix writeback deadlock
We should check if the sector is already initialized before
trying to grab the page from page cache. Otherwise when two
pages of the same block are written back by two threads each
calling from writepage_locked, it can cause deadlock like bellow.

 [ 1080.972099] INFO: task kswapd0:25 blocked for more than 120 seconds.
 [ 1080.972377] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 [ 1080.972812] kswapd0         D ffff88000c4926c0     0    25      2 0x00000000
 [ 1080.972816]  ffff88000df276b0 0000000000000046 ffff88000df27640 ffffffff81013ba7
 [ 1080.972821]  ffff88000c492310 ffff88000df27fd8 ffff88000df27fd8 00000000001d3440
 [ 1080.972824]  ffff88000c378000 ffff88000c492310 ffff8800175d3d40 ffff880017fc75a8
 [ 1080.972828] Call Trace:
 [ 1080.972860]  [<ffffffff81013ba7>] ? read_tsc+0x9/0x19
 [ 1080.972877]  [<ffffffff810e0b23>] ? lock_page+0x2b/0x2b
 [ 1080.972899]  [<ffffffff81475a1d>] io_schedule+0x63/0x7e
 [ 1080.972902]  [<ffffffff810e0b31>] sleep_on_page+0xe/0x12
 [ 1080.972905]  [<ffffffff81475fe8>] __wait_on_bit_lock+0x46/0x8f
 [ 1080.972916]  [<ffffffff810822d7>] ? lock_release_holdtime.part.7+0x6b/0x72
 [ 1080.972919]  [<ffffffff810e0af6>] __lock_page+0x66/0x68
 [ 1080.972928]  [<ffffffff81072705>] ? autoremove_wake_function+0x3d/0x3d
 [ 1080.972932]  [<ffffffff810e0b1f>] lock_page+0x27/0x2b
 [ 1080.972934]  [<ffffffff810e0bcf>] find_lock_page+0x34/0x57
 [ 1080.972937]  [<ffffffff810e1738>] find_or_create_page+0x34/0x8a
 [ 1080.972947]  [<ffffffffa034245b>] bl_write_pagelist+0x205/0x6da [blocklayoutdriver]
 [ 1080.972951]  [<ffffffffa034145d>] ? bl_free_lseg+0x38/0x38 [blocklayoutdriver]
 [ 1080.972995]  [<ffffffffa02e27b9>] ? nfs_write_rpcsetup+0x118/0x123 [nfs]
 [ 1080.973033]  [<ffffffffa030246b>] pnfs_generic_pg_writepages+0x10b/0x1f4 [nfs]
 [ 1080.973089]  [<ffffffffa02deaae>] nfs_pageio_doio+0x1a/0x43 [nfs]
 [ 1080.973098]  [<ffffffffa02df035>] nfs_pageio_complete+0x16/0x2d [nfs]
 [ 1080.973108]  [<ffffffffa02e2d8f>] nfs_writepage_locked+0xa0/0xbf [nfs]
 [ 1080.973119]  [<ffffffffa02e36a1>] nfs_writepage+0x16/0x2b [nfs]
 [ 1080.973122]  [<ffffffff810e8762>] ? clear_page_dirty_for_io+0x87/0x9a
 [ 1080.973133]  [<ffffffff810efc5b>] shrink_page_list+0x39b/0x6c8
 [ 1080.973139]  [<ffffffff810f03bb>] shrink_inactive_list+0x22c/0x39e
 [ 1080.973144]  [<ffffffff810822d7>] ? lock_release_holdtime.part.7+0x6b/0x72
 [ 1080.973148]  [<ffffffff810f0c33>] shrink_zone+0x445/0x588
 [ 1080.973152]  [<ffffffff810f1a11>] balance_pgdat+0x2c2/0x56b
 [ 1080.973170]  [<ffffffff81254208>] ? __bitmap_weight+0x34/0x80
 [ 1080.973175]  [<ffffffff810f1f78>] kswapd+0x2be/0x2fa
 [ 1080.973179]  [<ffffffff810726c8>] ? __init_waitqueue_head+0x4b/0x4b
 [ 1080.973183]  [<ffffffff810f1cba>] ? balance_pgdat+0x56b/0x56b
 [ 1080.973187]  [<ffffffff81071f69>] kthread+0xa8/0xb0
 [ 1080.973200]  [<ffffffff814806b4>] kernel_thread_helper+0x4/0x10
 [ 1080.973205]  [<ffffffff81071ec1>] ? __init_kthread_worker+0x5a/0x5a
 [ 1080.973210]  [<ffffffff814806b0>] ? gs_change+0x13/0x13
 [ 1080.973213] no locks held by kswapd0/25.

Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Cc: stable@kernel.org [3.0]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 09:08:15 -07:00
Peng Tao
e6d05a757c pnfsblock: fix NULL pointer dereference
bl_add_page_to_bio returns error pointer. bio should be reset to
NULL in failure cases as the out path always calls bl_submit_bio.

Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Cc: stable@kernel.org [3.0]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 09:08:14 -07:00
Peng Tao
9b7eecdcfe pnfs: recoalesce when ld read pagelist fails
For pnfs pagelist read failure, we need to pg_recoalesce and resend IO to
mds.

Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Cc: stable@kernel.org [3.0]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 09:08:14 -07:00
Peng Tao
8ce160c5ef pnfs: recoalesce when ld write pagelist fails
For pnfs pagelist write failure, we need to pg_recoalesce and resend IO to
mds.

Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Cc: stable@kernel.org [3.0]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 09:08:13 -07:00
Peng Tao
1b0ae06877 pnfs: make _set_lo_fail generic
file layout and block layout both use it to set mark layout io failure
bit. So make it generic.

Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Cc: stable@kernel.org [3.0]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 09:08:13 -07:00
Peng Tao
760383f1ee pnfsblock: add missing rpc_put_mount and path_put
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Cc: stable@kernel.org [3.0]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 09:08:12 -07:00
Peng Tao
c1225158a8 SUNRPC/NFS: make rpc pipe upcall generic
The same function is used by idmap, gss and blocklayout code. Make it
generic.

Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Cc: stable@kernel.org [3.0]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 09:08:12 -07:00
Jim Rees
fdc17abbc4 pnfsblock: fix size of upcall message
Make the status field explicitly 32 bits.  "...it's unlikely that the kernel
and userspace would differ on the size of an int here, but it might be a
good idea to go ahead and make that explicitly 32 bits in case we end up
dealing with more exotic arches at some point in the future."

Suggested-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Cc: stable@kernel.org [3.0]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 09:08:11 -07:00
Jim Rees
516f2e24fa pnfsblock: fix return code confusion
Always return PTR_ERR, not NULL, from nfs4_blk_get_deviceinfo and
nfs4_blk_decode_device.

Check for IS_ERR, not NULL, in bl_set_layoutdriver when calling
nfs4_blk_get_deviceinfo.

Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Cc: stable@kernel.org [3.0]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 09:08:11 -07:00
Jeff Layton
2da9565235 nfs: don't try to migrate pages with active requests
nfs_find_and_lock_request will take a reference to the nfs_page and
will then put it if the req is already locked. It's possible though
that the reference will be the last one. That put then can kick off
a whole series of reference puts:

nfs_page
   nfs_open_context
      dentry
          inode

If the inode ends up being deleted, then the VFS will call
truncate_inode_pages. That function will try to take the page lock, but
it was already locked when migrate_page was called. The code
deadlocks.

Fix this by simply refusing the migration request if PagePrivate is
already set, indicating that the page is already associated with an
active read or write request.

We've had a customer test a backported version of this patch and
the preliminary results seem good.

Cc: stable@kernel.org
Cc: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Harshula Jayasuriya <harshula@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 09:08:11 -07:00
Mi Jinlong
b9dd3abbbc nfs: fix bug about IPv6 address scope checking
The result from ipv6_addr_scope() always not be a single SCOPE,
so we can't use equal to compare the result with IPV6_ADDR_SCOPE_LINKLOCAL
at nfs_sockaddr_match_ipaddr6.

This patch fixs the problem, and lets checking address before scope_id.

Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 09:08:10 -07:00
Jeff Layton
3236c3e1ad nfs: don't redirty inode when ncommit == 0 in nfs_commit_unstable_pages
commit 420e3646 allowed the kernel to reduce the number of unnecessary
commit calls by skipping the commit when there are a large number of
outstanding pages.

However, the current test in nfs_commit_unstable_pages does not handle
the edge condition properly. When ncommit == 0, then that means that the
kernel doesn't need to do anything more for the inode. The current test
though in the WB_SYNC_NONE case will return true, and the inode will end
up being marked dirty. Once that happens the inode will never be clean
until there's a WB_SYNC_ALL flush.

Fix this by immediately returning from nfs_commit_unstable_pages when
ncommit == 0.

Mike noticed this problem initially in RHEL5 (2.6.18-based kernel) which
has a backported version of 420e3646. The inode cache there was growing
very large. The inode cache was unable to be shrunk since the inodes
were all marked dirty. Calling sync() would essentially "fix" the
problem -- the WB_SYNC_ALL flush would result in the inodes all being
marked clean.

What I'm not clear on is how big a problem this is in mainline kernels
as the writeback code there is very different. Either way, it seems
incorrect to re-mark the inode dirty in this case.

Reported-by: Mike McLean <mikem@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@kernel.org [2.6.34+]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 09:08:10 -07:00
Trond Myklebust
59b7c05fff Revert "NFS: Ensure that writeback_single_inode() calls write_inode() when syncing"
This reverts commit b80c3cb628.

The reverted commit was rendered obsolete by a VFS fix: commit
5547e8aac6 (writeback: Update dirty flags in
two steps). We now no longer need to worry about writeback_single_inode()
missing our marking the inode for COMMIT in 'do_writepages()' call.

Reverting this patch, fixes a performance regression in which the inode
would continuously get queued to the dirty list, causing the writeback
code to unnecessarily try to send a COMMIT.

Signed-off-by: Trond Myklebust <Trond.Myklebust>
Tested-by: Simon Kirby <sim@hostway.ca>
Cc: stable@kernel.org [2.6.35+]
2011-10-18 09:08:09 -07:00
Mi Jinlong
5703728ac1 nfs: fix bug about IPv6 address scope checking
The result from ipv6_addr_scope() is a set of flags, not a single value,
so we can't just compare the result with  IPV6_ADDR_SCOPE_LINKLOCAL.

This patch fixs the problem, and checks for unequal addresses before
scope_id.

Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-10-12 10:30:29 -04:00
Trond Myklebust
815d405cef VFS: Fix the remaining automounter semantics regressions
The concensus seems to be that system calls such as stat() etc should
not trigger an automount.  Neither should the l* versions.

This patch therefore adds a LOOKUP_AUTOMOUNT flag to tag those lookups
that _should_ trigger an automount on the last path element.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[ Edited to leave out the cases that are already covered by LOOKUP_OPEN,
  LOOKUP_DIRECTORY and LOOKUP_CREATE - all of which also fundamentally
  force automounting for their own reasons   - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-09-26 19:16:46 -07:00
Sachin Prabhu
fb2088ccc1 nfs: Do not allow multiple mounts on same mountpoint when using -o noac
Do not allow multiple mounts on same mountpoint when using -o noac

When you normally attempt to mount a share twice on the same mountpoint,
a check in do_add_mount causes it to return an error

# mount localhost:/nfsv3 /mnt
# mount localhost:/nfsv3 /mnt
mount.nfs: /mnt is already mounted or busy

However when using the option 'noac', the user is able to mount the same
share on the same mountpoint multiple times. This happens because a
share mounted with the noac option is automatically assigned the 'sync'
flag MS_SYNCHRONOUS in nfs_initialise_sb(). This flag is set after the
check for already existing superblocks is done in sget(). The check for
the mount flags in nfs_compare_mount_options() does not take into
account the 'sync' flag applied later on in the code path. This means
that when using 'noac', a new superblock structure is assigned for every
new mount of the same share and multiple shares on the same mountpoint
are allowed.

ie.
# mount -onoac localhost:/nfsv3 /mnt
can be run multiple times.

The patch checks for noac and assigns the sync flag before sget() is
called to obtain an already existing superblock structure.

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-09-13 17:10:15 -04:00
Trond Myklebust
f13c3620a4 NFS: Fix a typo in nfs_flush_multi
Fix a typo which causes an Oops in the RPC layer, when using wsize < 4k.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Sricharan R <r.sricharan@ti.com>
2011-09-13 17:06:57 -04:00
J. Bruce Fields
a9004abc34 nfsd4: cleanup and consolidate seqid_mutating_err
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-27 14:21:26 -04:00
Trond Myklebust
042b60beb4 NFSv4: renewd needs to be able to handle the NFS4ERR_CB_PATH_DOWN error
The NFSv4 spec does not specify that the server must repeat that error,
so in order to avoid having the delegations revoked, we should handle
it immediately.

Also note that NFS4ERR_CB_PATH_DOWN does in fact renew the lease...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-08-24 15:07:37 -04:00
Trond Myklebust
2f60ea6b8c NFSv4: The NFSv4.0 client must send RENEW calls if it holds a delegation
RFC3530 states that if the client holds a delegation, then it is obliged
to continue to send RENEW calls once every lease period in order to allow
the server to return NFS4ERR_CB_PATH_DOWN if the callback path is
unreachable.

This is not required for NFSv4.1, since the server can at any time set
the SEQ4_STATUS_CB_PATH_DOWN_SESSION in any SEQUENCE operation.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-08-24 15:07:37 -04:00
Trond Myklebust
8534d4ec05 NFSv4: nfs4_proc_renew should be declared static
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-08-24 15:07:37 -04:00
Trond Myklebust
b569ad3492 NFSv4: nfs4_proc_async_renew should use a GFP_NOFS allocation
We shouldn't allow the renew daemon to do direct reclaim on the NFS
partition.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-08-24 15:07:35 -04:00
Eric Dumazet
11fd165c68 sunrpc: use better NUMA affinities
Use NUMA aware allocations to reduce latencies and increase throughput.

sunrpc kthreads can use kthread_create_on_node() if pool_mode is
"percpu" or "pernode", and svc_prepare_thread()/svc_init_buffer() can
also take into account NUMA node affinity for memory allocations.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: "J. Bruce Fields" <bfields@fieldses.org>
CC: Neil Brown <neilb@suse.de>
CC: David Miller <davem@davemloft.net>
Reviewed-by: Greg Banks <gnb@fastmail.fm>
[bfields@redhat.com: fix up caller nfs41_callback_up]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-08-19 13:25:36 -04:00
Linus Torvalds
35a21b4299 Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4.1: Return NFS4ERR_BADSESSION to callbacks during session resets
  NFSv4.1: Fix the callback 'highest_used_slotid' behaviour
  pnfs-obj: Fix the comp_index != 0 case
  pnfs-obj: Bug when we are running out of bio
  nfs: add missing prefetch.h include
2011-08-18 22:47:13 -07:00
Boaz Harrosh
8cf1fb2163 pnfs: Automatically select blocks & objects layouts
Just like files-layout, blocks & objects layouts are part of the
NFS 4.1 protocol and should be automatically selected if NFS_4_1
is selected. The small problem is that these depend on other
Kernel support being present, while files only depends on NFS
itself.

This patch removes from the user choice the presence of objects
and blocks layout. But makes sure these are selected only if
the depended subsystems are present in the Kernel.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Acked-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-11 17:51:27 -07:00
Peng Tao
54a33b190a NFS41: make PNFS_BLOCK selectable
PNFS_BLOCK needs BLK_DEV_DM/MD, which is not a dependency for other
pnfs layout drivers. Seperate it out so others can still build when
BLK_DEV_DM/MD is not enabled.

Also change select to depends on to avoid build failures.

Reported-and-tested-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Peng Tao <peng_tao@emc.com>
Acked-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-11 08:58:02 -07:00
Trond Myklebust
910ac68a2b NFSv4.1: Return NFS4ERR_BADSESSION to callbacks during session resets
If the client is in the process of resetting the session when it receives
a callback, then returning NFS4ERR_DELAY may cause a deadlock with the
DESTROY_SESSION call.

Basically, if the client returns NFS4ERR_DELAY in response to the
CB_SEQUENCE call, then the server is entitled to believe that the
client is busy because it is already processing that call. In that
case, the server is perfectly entitled to respond with a
NFS4ERR_BACK_CHAN_BUSY to any DESTROY_SESSION call.

Fix this by having the client reply with a NFS4ERR_BADSESSION in
response to the callback if it is resetting the session.

Cc: stable@kernel.org [2.6.38+]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-08-04 11:55:35 -04:00
Trond Myklebust
55a673990e NFSv4.1: Fix the callback 'highest_used_slotid' behaviour
Currently, there is no guarantee that we will call nfs4_cb_take_slot() even
though nfs4_callback_compound() will consistently call
nfs4_cb_free_slot() provided the cb_process_state has set the 'clp' field.
The result is that we can trigger the BUG_ON() upon the next call to
nfs4_cb_take_slot().

This patch fixes the above problem by using the slot id that was taken in
the CB_SEQUENCE operation as a flag for whether or not we need to call
nfs4_cb_free_slot().
It also fixes an atomicity problem: we need to set tbl->highest_used_slotid
atomically with the check for NFS4_SESSION_DRAINING, otherwise we end up
racing with the various tests in nfs4_begin_drain_session().

Cc: stable@kernel.org [2.6.38+]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-08-04 11:55:35 -04:00
Boaz Harrosh
9af7db3228 pnfs-obj: Fix the comp_index != 0 case
There were bugs in the case of partial layout where olo_comp_index
is not zero. This used to work and was tested but one of the later
cleanup SQUASHMEs broke it and was not tested since.

Also add a dprint that specify those received layout parameters.
Everything else was already printed.

[Needed in v3.0]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-08-04 11:54:48 -04:00
Boaz Harrosh
20618b21da pnfs-obj: Bug when we are running out of bio
When we have a situation that the number of pages we want
to encode is bigger then the size of the bio. (Which can
currently happen only when all IO is going to a single device
.e.g group_width==1) then the IO is submitted short and we
report back only the amount of bytes we actually wrote/read
and all is fine. BUT ...

There was a bug that the current length counter was advanced
before the fail to add the extra page, and we come to a situation
that the CDB length was one-page longer then the actual bio size,
which is of course rejected by the osd-target.

While here also fix the bio size calculation, in the case
that we received more then one group of devices.

CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-08-04 11:54:38 -04:00
Heiko Carstens
88c9e42196 nfs: add missing prefetch.h include
Fix this compile error on s390:

  CC [M]  fs/nfs/blocklayout/blocklayout.o
fs/nfs/blocklayout/blocklayout.c: In function 'bl_end_io_read':
fs/nfs/blocklayout/blocklayout.c:201:4: error: implicit declaration of function 'prefetchw'

Introduced with 9549ec01 "pnfsblock: bl_read_pagelist".

Cc: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-08-04 11:54:25 -04:00
Linus Torvalds
1b8e94993c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  xfs: Fix build breakage in xfs_iops.c when CONFIG_FS_POSIX_ACL is not set
  VFS: Reorganise shrink_dcache_for_umount_subtree() after demise of dcache_lock
  VFS: Remove dentry->d_lock locking from shrink_dcache_for_umount_subtree()
  VFS: Remove detached-dentry counter from shrink_dcache_for_umount_subtree()
  switch posix_acl_chmod() to umode_t
  switch posix_acl_from_mode() to umode_t
  switch posix_acl_equiv_mode() to umode_t *
  switch posix_acl_create() to umode_t *
  block: initialise bd_super in bdget()
  vfs: avoid call to inode_lru_list_del() if possible
  vfs: avoid taking inode_hash_lock on pipes and sockets
  vfs: conditionally call inode_wb_list_del()
  VFS: Fix automount for negative autofs dentries
  Btrfs: load the key from the dir item in readdir into a fake dentry
  devtmpfs: missing initialialization in never-hit case
  hppfs: missing include
2011-08-01 13:48:31 -10:00
Al Viro
d3fb612076 switch posix_acl_create() to umode_t *
so we can pass &inode->i_mode to it

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-08-01 02:09:42 -04:00
Trond Myklebust
a00ed25cce NFS: Re-enable compilation of nfs with !CONFIG_NFS_V4 || !CONFIG_NFS_V4_1
Fix two recently introduced compile problems:

Fix a typo in fs/nfs/pnfs.h

Move the pnfs_blksize declaration outside the CONFIG_NFS_V4 section in
struct nfs_server.

Reported-by: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-31 14:27:04 -10:00
Linus Torvalds
24c3047095 Merge branch 'nfs-for-3.1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
* 'nfs-for-3.1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (28 commits)
  pnfsblock: write_pagelist handle zero invalid extents
  pnfsblock: note written INVAL areas for layoutcommit
  pnfsblock: bl_write_pagelist
  pnfsblock: bl_read_pagelist
  pnfsblock: cleanup_layoutcommit
  pnfsblock: encode_layoutcommit
  pnfsblock: merge rw extents
  pnfsblock: add extent manipulation functions
  pnfsblock: bl_find_get_extent
  pnfsblock: xdr decode pnfs_block_layout4
  pnfsblock: call and parse getdevicelist
  pnfsblock: merge extents
  pnfsblock: lseg alloc and free
  pnfsblock: remove device operations
  pnfsblock: add device operations
  pnfsblock: basic extent code
  pnfsblock: use pageio_ops api
  pnfsblock: add blocklayout Kconfig option, Makefile, and stubs
  pnfs: cleanup_layoutcommit
  pnfs: ask for layout_blksize and save it in nfs_server
  ...
2011-07-31 06:26:50 -10:00
Peng Tao
71cdd40fd4 pnfsblock: write_pagelist handle zero invalid extents
For invalid extents, find other pages in the same fsblock and write them out.

[pnfsblock: write_begin]
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:17 -04:00
Fred Isaman
31e6306a40 pnfsblock: note written INVAL areas for layoutcommit
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:17 -04:00
Fred Isaman
650e2d39bd pnfsblock: bl_write_pagelist
Note: When upper layer's read/write request cannot be fulfilled, the block
layout driver shouldn't silently mark the page as error. It should do
what can be done and  leave the rest to the upper layer. To do so, we
should set rdata/wdata->res.count properly.

When upper layer re-send the read/write request to finish the rest
part of the request, pgbase is the position where we should start at.

[pnfsblock: bl_write_pagelist support functions]
[pnfsblock: bl_write_pagelist adjust for missing PG_USE_PNFS]
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
[pnfsblock: handle errors when read or write pagelist.]
Signed-off-by: Zhang Jingwang <yyalone@gmail.com>
[pnfs-block: use new write_pagelist api]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Jim Rees <rees@umich.edu>

[SQUASHME: pnfsblock: mds_offset is set in the generic layer]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>

[pnfsblock: mark IO error with NFS_LAYOUT_{RW|RO}_FAILED]
Signed-off-by: Peng Tao <peng_tao@emc.com>
[pnfsblock: SQUASHME: adjust to API change]
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
[pnfsblock: fixup blksize alignment in bl_setup_layoutcommit]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
[pnfsblock: bl_write_pagelist adjust for missing PG_USE_PNFS]
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
[pnfsblock: handle errors when read or write pagelist.]
Signed-off-by: Zhang Jingwang <yyalone@gmail.com>
[pnfs-block: use new write_pagelist api]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:17 -04:00
Fred Isaman
9549ec01b0 pnfsblock: bl_read_pagelist
Note: When upper layer's read/write request cannot be fulfilled, the block
layout driver shouldn't silently mark the page as error. It should do
what can be done and  leave the rest to the upper layer. To do so, we
should set rdata/wdata->res.count properly.

When upper layer re-send the read/write request to finish the rest
part of the request, pgbase is the position where we should start at.

[pnfsblock: mark IO error with NFS_LAYOUT_{RW|RO}_FAILED]
Signed-off-by: Peng Tao <peng_tao@emc.com>
[pnfsblock: read path error handling]
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
[pnfsblock: handle errors when read or write pagelist.]
Signed-off-by: Zhang Jingwang <yyalone@gmail.com>
[pnfs-block: use new read_pagelist api]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:17 -04:00
Fred Isaman
b2be7811dd pnfsblock: cleanup_layoutcommit
In blocklayout driver. There are two things happening
while layoutcommit/cleanup.
1. the modified extents are encoded.
2. On cleanup the extents are put back on the layout rw
   extents list, for reads.

In the new system where actual xdr encoding is done in
encode_layoutcommit() directly into xdr buffer, these are
the new commit stages:

1. On setup_layoutcommit, the range is adjusted as before
   and a structure is allocated for communication with
   bl_encode_layoutcommit && bl_cleanup_layoutcommit
   (Generic layer provides a void-star to hang it on)

2. bl_encode_layoutcommit is called to do the actual
   encoding directly into xdr. The commit-extent-list is not
   freed and is stored on above structure.
   FIXME: The code is not yet converted to the new XDR cleanup

3. On cleanup the commit-extent-list is put back by a call
   to set_to_rw() as before, but with no need for XDR decoding
   of the list as before. And the commit-extent-list is freed.
   Finally allocated structure is freed.

[rm inode and pnfs_layout_hdr args from cleanup_layoutcommit()]
Signed-off-by: Jim Rees <rees@umich.edu>
[pnfsblock: introduce bl_committing list]
Signed-off-by: Peng Tao <peng_tao@emc.com>
[pnfsblock: SQUASHME: adjust to API change]
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
[blocklayout: encode_layoutcommit implementation]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
[pnfsblock: fix bug setting up layoutcommit.]
Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn>
[pnfsblock: cleanup_layoutcommit wants a status parameter]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:17 -04:00
Fred Isaman
90ace12ac4 pnfsblock: encode_layoutcommit
In blocklayout driver. There are two things happening
while layoutcommit/cleanup.
1. the modified extents are encoded.
2. On cleanup the extents are put back on the layout rw
   extents list, for reads.

In the new system where actual xdr encoding is done in
encode_layoutcommit() directly into xdr buffer, these are
the new commit stages:

1. On setup_layoutcommit, the range is adjusted as before
   and a structure is allocated for communication with
   bl_encode_layoutcommit && bl_cleanup_layoutcommit
   (Generic layer provides a void-star to hang it on)

2. bl_encode_layoutcommit is called to do the actual
   encoding directly into xdr. The commit-extent-list is not
   freed and is stored on above structure.
   FIXME: The code is not yet converted to the new XDR cleanup

3. On cleanup the commit-extent-list is put back by a call
   to set_to_rw() as before, but with no need for XDR decoding
   of the list as before. And the commit-extent-list is freed.
   Finally allocated structure is freed.

[rm inode and pnfs_layout_hdr args from cleanup_layoutcommit()]
[pnfsblock: get rid of deprecated xdr macros]
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
[blocklayout: encode_layoutcommit implementation]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
[pnfsblock: fix bug setting up layoutcommit.]
Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn>
[pnfsblock: prevent commit list corruption]
[pnfsblock: fix layoutcommit with an empty opaque]
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:17 -04:00
Fred Isaman
9f3770422c pnfsblock: merge rw extents
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:17 -04:00
Fred Isaman
c1c2a4cd35 pnfsblock: add extent manipulation functions
Adds working implementations of various support functions
to handle INVAL extents, needed by writes, such as
bl_mark_sectors_init and bl_is_sector_init.

[pnfsblock: fix 64-bit compiler warnings for extent manipulation]
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
[Implement release_inval_marks]
Signed-off-by: Zhang Jingwang <zhangjingwang@nrchpc.ac.cn>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:17 -04:00
Fred Isaman
6d742ba538 pnfsblock: bl_find_get_extent
Implement bl_find_get_extent(), one of the core extent manipulation
routines.

[pnfsblock: Lookup list entry of layouts and tags in reverse order]
Signed-off-by: Zhang Jingwang <zhangjingwang@nrchpc.ac.cn>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Jim Rees <rees@umich.edu>

pnfsblock: fix print format warnings for sector_t and size_t

gcc spews warnings about these on x86_64, e.g.:
fs/nfs/blocklayout/blocklayout.c:74: warning: format ‘%Lu’ expects type ‘long long unsigned int’, but argument 2 has type ‘sector_t’
fs/nfs/blocklayout/blocklayout.c:388: warning: format ‘%d’ expects type ‘int’, but argument 5 has type ‘size_t’

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:16 -04:00
Fred Isaman
e9437ccef9 pnfsblock: xdr decode pnfs_block_layout4
XDR decodes the block layout payload sent in LAYOUTGET result, storing
the result in an extent list.

[pnfsblock: get rid of deprecated xdr macros]
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
[pnfsblock: fix bug getting pnfs_layout_type in translate_devid().]
Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:16 -04:00
Fred Isaman
2f9fd18260 pnfsblock: call and parse getdevicelist
Call GETDEVICELIST during mount, then call and parse GETDEVICEINFO
for each device returned.

[pnfsblock: get rid of deprecated xdr macros]
Signed-off-by: Jim Rees <rees@umich.edu>
[pnfsblock: fix pnfs_deviceid references]
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
[pnfsblock: fix print format warnings for sector_t and size_t]
[pnfs-block: #include <linux/vmalloc.h>]
[pnfsblock: no PNFS_NFS_SERVER]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[pnfsblock: fix bug determining size of striped volume]
[pnfsblock: fix oops when using multiple devices]
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
[pnfsblock: get rid of vmap and deviceid->area structure]
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:16 -04:00
Fred Isaman
03341d2cc9 pnfsblock: merge extents
Replace a stub, so that extents underlying the layouts are properly
added, merged, or ignored as necessary.

Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
[pnfsblock: delete the new node before put it]
Signed-off-by: Mingyang Guo <guomingyang@nrchpc.ac.cn>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:16 -04:00
Fred Isaman
a60d2ebd93 pnfsblock: lseg alloc and free
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
[pnfsblock: fix bug getting pnfs_layout_type in translate_devid().]
Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Zhang Jingwang <Jingwang.Zhang@emc.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:16 -04:00
Jim Rees
025a70ed65 pnfsblock: remove device operations
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
[upcall bugfixes]
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:16 -04:00
Jim Rees
fe0a9b7408 pnfsblock: add device operations
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
[upcall bugfixes]
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:16 -04:00
Fred Isaman
9e69296999 pnfsblock: basic extent code
Adds structures and basic create/delete code for extents.

Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Zhang Jingwang <Jingwang.Zhang@emc.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:16 -04:00
Benny Halevy
e9643fe80d pnfsblock: use pageio_ops api
[pnfsblock: use pnfs_generic_pg_init_read/write]
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:15 -04:00
Fred Isaman
155e7524f2 pnfsblock: add blocklayout Kconfig option, Makefile, and stubs
Define a configuration variable to enable/disable compilation of the
block driver code.

Add the minimal structure for a pnfs block layout driver, and empty
list-heads that will hold the extent data

[pnfsblock: make NFS_V4_1 select PNFS_BLOCK]
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[pnfs-block: fix CONFIG_PNFS_BLOCK dependencies]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
[pnfsblock: SQUASHME: adjust to API change]
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
[pnfs: move pnfs_layout_type inline in nfs_inode]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[blocklayout: encode_layoutcommit implementation]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
[pnfsblock: layout alloc and free]
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
[pnfs: move pnfs_layout_type inline in nfs_inode]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
[pnfsblock: define module alias]
Signed-off-by: Peng Tao <peng_tao@emc.com>
[rm inode and pnfs_layout_hdr args from cleanup_layoutcommit()]
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:15 -04:00
Andy Adamson
db29c08909 pnfs: cleanup_layoutcommit
This gives layout driver a chance to cleanup structures they put in at
encode_layoutcommit.

Signed-off-by: Andy Adamson <andros@netapp.com>
[fixup layout header pointer for layoutcommit]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
[rm inode and pnfs_layout_hdr args from cleanup_layoutcommit()]
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:15 -04:00
Fred Isaman
dae100c2b1 pnfs: ask for layout_blksize and save it in nfs_server
Block layout needs it to determine IO size.

Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Tao Guo <glorioustao@gmail.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:15 -04:00
Benny Halevy
738fd0f360 pnfs: add set-clear layoutdriver interface
To allow layout driver to issue getdevicelist at mount time, and clean up
at umount time.

[fixup non NFS_V4_1 set_pnfs_layoutdriver definition]
[pnfs: pass mntfh down the init_pnfs path]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:15 -04:00
Andy Adamson
7f11d8d38d pnfs: GETDEVICELIST
The block driver uses GETDEVICELIST

Signed-off-by: Andy Adamson <andros@netapp.com>
[pass struct nfs_server * to getdevicelist]
[get machince creds for getdevicelist]
[fix getdevicelist decode sizing]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:15 -04:00
Peng Tao
3557c6c3be pnfs: use lwb as layoutcommit length
Using NFS4_MAX_UINT64 will break current protocol.

[Needed in v3.0]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:15 -04:00
Peng Tao
a9bae5666d pnfs: let layoutcommit handle a list of lseg
There can be multiple lseg per file, so layoutcommit should be
able to handle it.

[Needed in v3.0]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:15 -04:00
Peng Tao
9fa4075878 pnfs: save layoutcommit cred at layout header init
No need to save it for every lseg.
No need to save it at every pnfs_set_layoutcommit.

[Needed in v3.0]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:14 -04:00
Peng Tao
acff588053 pnfs: save layoutcommit lwb at layout header
No need to save it for every lseg.

[Needed in v3.0]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 12:18:14 -04:00
Bryan Schumaker
374e4e3ec3 Additional readdir cookie loop information
Print out the name of the file that triggers the cookie loop  message to
make it slightly easier to track down the cause.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-30 14:37:14 -04:00
Trond Myklebust
0c0308066c NFS: Fix spurious readdir cookie loop messages
If the directory contents change, then we have to accept that the
file->f_pos value may shrink if we do a 'search-by-cookie'. In that
case, we should turn off the loop detection and let the NFS client
try to recover.

The patch also fixes a second loop detection bug by ensuring
that after turning on the ctx->duped flag, we read at least one new
cookie into ctx->dir_cookie before attempting to match with
ctx->dup_cookie.

Reported-by: Petr Vandrovec <petr@vandrovec.name>
Cc: stable@kernel.org [2.6.39+]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-30 14:34:50 -04:00
Linus Torvalds
28890d3598 Merge branch 'nfs-for-3.1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
* 'nfs-for-3.1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (44 commits)
  NFSv4: Don't use the delegation->inode in nfs_mark_return_delegation()
  nfs: don't use d_move in nfs_async_rename_done
  RDMA: Increasing RPCRDMA_MAX_DATA_SEGS
  SUNRPC: Replace xprt->resend and xprt->sending with a priority queue
  SUNRPC: Allow caller of rpc_sleep_on() to select priority levels
  SUNRPC: Support dynamic slot allocation for TCP connections
  SUNRPC: Clean up the slot table allocation
  SUNRPC: Initalise the struct xprt upon allocation
  SUNRPC: Ensure that we grab the XPRT_LOCK before calling xprt_alloc_slot
  pnfs: simplify pnfs files module autoloading
  nfs: document nfsv4 sillyrename issues
  NFS: Convert nfs4_set_ds_client to EXPORT_SYMBOL_GPL
  SUNRPC: Convert the backchannel exports to EXPORT_SYMBOL_GPL
  SUNRPC: sunrpc should not explicitly depend on NFS config options
  NFS: Clean up - simplify the switch to read/write-through-MDS
  NFS: Move the pnfs write code into pnfs.c
  NFS: Move the pnfs read code into pnfs.c
  NFS: Allow the nfs_pageio_descriptor to signal that a re-coalesce is needed
  NFS: Use the nfs_pageio_descriptor->pg_bsize in the read/write request
  NFS: Cache rpc_ops in struct nfs_pageio_descriptor
  ...
2011-07-27 13:23:02 -07:00
Arun Sharma
60063497a9 atomic: use <linux/atomic.h>
This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>

Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-26 16:49:47 -07:00
Linus Torvalds
f01ef569cd Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/writeback
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/writeback: (27 commits)
  mm: properly reflect task dirty limits in dirty_exceeded logic
  writeback: don't busy retry writeback on new/freeing inodes
  writeback: scale IO chunk size up to half device bandwidth
  writeback: trace global_dirty_state
  writeback: introduce max-pause and pass-good dirty limits
  writeback: introduce smoothed global dirty limit
  writeback: consolidate variable names in balance_dirty_pages()
  writeback: show bdi write bandwidth in debugfs
  writeback: bdi write bandwidth estimation
  writeback: account per-bdi accumulated written pages
  writeback: make writeback_control.nr_to_write straight
  writeback: skip tmpfs early in balance_dirty_pages_ratelimited_nr()
  writeback: trace event writeback_queue_io
  writeback: trace event writeback_single_inode
  writeback: remove .nonblocking and .encountered_congestion
  writeback: remove writeback_control.more_io
  writeback: skip balance_dirty_pages() for in-memory fs
  writeback: add bdi_dirty_limit() kernel-doc
  writeback: avoid extra sync work at enqueue time
  writeback: elevate queue_io() into wb_writeback()
  ...

Fix up trivial conflicts in fs/fs-writeback.c and mm/filemap.c
2011-07-26 10:39:54 -07:00
Linus Torvalds
d3ec4844d4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits)
  fs: Merge split strings
  treewide: fix potentially dangerous trailing ';' in #defined values/expressions
  uwb: Fix misspelling of neighbourhood in comment
  net, netfilter: Remove redundant goto in ebt_ulog_packet
  trivial: don't touch files that are removed in the staging tree
  lib/vsprintf: replace link to Draft by final RFC number
  doc: Kconfig: `to be' -> `be'
  doc: Kconfig: Typo: square -> squared
  doc: Konfig: Documentation/power/{pm => apm-acpi}.txt
  drivers/net: static should be at beginning of declaration
  drivers/media: static should be at beginning of declaration
  drivers/i2c: static should be at beginning of declaration
  XTENSA: static should be at beginning of declaration
  SH: static should be at beginning of declaration
  MIPS: static should be at beginning of declaration
  ARM: static should be at beginning of declaration
  rcu: treewide: Do not use rcu_read_lock_held when calling rcu_dereference_check
  Update my e-mail address
  PCIe ASPM: forcedly -> forcibly
  gma500: push through device driver tree
  ...

Fix up trivial conflicts:
 - arch/arm/mach-ep93xx/dma-m2p.c (deleted)
 - drivers/gpio/gpio-ep93xx.c (renamed and context nearby)
 - drivers/net/r8169.c (just context changes)
2011-07-25 13:56:39 -07:00
Trond Myklebust
ed1e6211a0 NFSv4: Don't use the delegation->inode in nfs_mark_return_delegation()
nfs_mark_return_delegation() is usually called without any locking, and
so it is not safe to dereference delegation->inode. Since the inode is
only used to discover the nfs_client anyway, it makes more sense to
have the callers pass a valid pointer to the nfs_server as a parameter.

Reported-by: Ian Kent <raven@themaw.net>
Cc: stable@kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-25 15:37:29 -04:00
Jeff Layton
73ca1001ed nfs: don't use d_move in nfs_async_rename_done
If the task that initiated the sillyrename ends up being killed by a
fatal signal, then it will eventually return back to userspace and end
up releasing the i_mutex. d_move however needs to be done while holding
the i_mutex.

Instead of using d_move here, just unhash the old and new dentries to
prevent them from being found by lookups. With this change though, the
dentries are now incorrect post-rename and do not reflect the actual
name of the file on the server. I'm proceeding under the assumption
that since they are unhashed that this isn't really a problem.

In order for the sillydelete to still work though, the dname must be
copied earlier when setting up the sillydelete info, and the name must
be recopied if the sillydelete info has to be moved to a new dentry.

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-25 15:00:21 -04:00
Stephen Rothwell
5f00bcb38e Merge branch 'master' into devel and apply fixup from Stephen Rothwell:
vfs/nfs: fixup for nfs_open_context change

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-25 14:53:52 -04:00
Al Viro
826cae2f2b kill boilerplates around posix_acl_create_masq()
new helper: posix_acl_create(&acl, gfp, mode_p).  Replaces acl with
modified clone, on failure releases acl and replaces with NULL.
Returns 0 or -ve on error.  All callers of posix_acl_create_masq()
switched.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-25 14:27:32 -04:00
Josef Bacik
02c24a8218 fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers
Btrfs needs to be able to control how filemap_write_and_wait_range() is called
in fsync to make it less of a painful operation, so push down taking i_mutex and
the calling of filemap_write_and_wait() down into the ->fsync() handlers.  Some
file systems can drop taking the i_mutex altogether it seems, like ext3 and
ocfs2.  For correctness sake I just pushed everything down in all cases to make
sure that we keep the current behavior the same for everybody, and then each
individual fs maintainer can make up their mind about what to do from there.
Thanks,

Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:47:59 -04:00
Josef Bacik
06222e491e fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek
This converts everybody to handle SEEK_HOLE/SEEK_DATA properly.  In some cases
we just return -EINVAL, in others we do the normal generic thing, and in others
we're simply making sure that the properly due-dilligence is done.  For example
in NFS/CIFS we need to make sure the file size is update properly for the
SEEK_HOLE and SEEK_DATA case, but since it calls the generic llseek stuff itself
that is all we have to do.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 20:47:58 -04:00
Al Viro
e0a0124936 switch vfs_path_lookup() to struct path
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 01:44:14 -04:00
Al Viro
8aeb376ca0 nfs: LOOKUP_{OPEN,CREATE,EXCL} is set only on the last step
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 01:44:02 -04:00
Al Viro
dd7dd556e4 no need to check for LOOKUP_OPEN in ->create() instances
... it will be set in nd->flag for all cases with non-NULL nd
(i.e. when called from do_last()).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 01:43:56 -04:00
Al Viro
8a5e929dd2 don't transliterate lower bits of ->intent.open.flags to FMODE_...
->create() instances are much happier that way...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 01:43:52 -04:00
Al Viro
f7c85868fc fix mknod() on nfs4 (hopefully)
a) check the right flags in ->create() (LOOKUP_OPEN, not LOOKUP_CREATE)
b) default (!LOOKUP_OPEN) open_flags is O_CREAT|O_EXCL|FMODE_READ, not 0
c) lookup_instantiate_filp() should be done only with LOOKUP_OPEN;
otherwise we need to issue CLOSE, lest we leak stateid on server.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 01:43:46 -04:00
Al Viro
511415980a nameidata_to_nfs_open_context() doesn't need nameidata, actually...
just open flags; switched to passing just those and
renamed to create_nfs_open_context()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 01:43:45 -04:00
Al Viro
3d4ff43d89 nfs_open_context doesn't need struct path either
just dentry, please...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 01:43:44 -04:00
Al Viro
82a2c1b77a nfs4_opendata doesn't need struct path either
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 01:43:42 -04:00
Al Viro
643168c2dc nfs4_closedata doesn't need to mess with struct path
instead of path_get()/path_put(), we can just use nfs_sb_{,de}active()
to pin the superblock down.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 01:43:41 -04:00
Al Viro
10556cb21a ->permission() sanitizing: don't pass flags to ->permission()
not used by the instances anymore.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 01:43:24 -04:00
Al Viro
2830ba7f34 ->permission() sanitizing: don't pass flags to generic_permission()
redundant; all callers get it duplicated in mask & MAY_NOT_BLOCK and none of
them removes that bit.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 01:43:22 -04:00
Al Viro
178ea73521 kill check_acl callback of generic_permission()
its value depends only on inode and does not change; we might as
well store it in ->i_op->check_acl and be done with that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-07-20 01:43:16 -04:00
J. Bruce Fields
f85ef69ce0 pnfs: simplify pnfs files module autoloading
Embed the necessary alias into the module rather than waiting for
someone to add it to /etc/modprobe.conf

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-15 19:21:58 -04:00
J. Bruce Fields
674e405b8b nfs: document nfsv4 sillyrename issues
Somebody working on this code asked what the deal was with NFSv4, since
this comment notes that it's v2/v3's statelessness that requires
sillyrename.  Shouldn't hurt to document the answer.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-15 19:14:00 -04:00
Trond Myklebust
94b134ac8e NFS: Convert nfs4_set_ds_client to EXPORT_SYMBOL_GPL
This is not part of an external ABI...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-15 09:12:24 -04:00
Trond Myklebust
9e00abc3c2 SUNRPC: sunrpc should not explicitly depend on NFS config options
Change explicit references to CONFIG_NFS_V4_1 to implicit ones
Get rid of the unnecessary defines in backchannel_rqst.c and
bc_svc.c: the Makefile takes care of those dependency.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-15 09:12:23 -04:00
Trond Myklebust
1f9453578f NFS: Clean up - simplify the switch to read/write-through-MDS
Use nfs_pageio_reset_read_mds and nfs_pageio_reset_write_mds instead of
completely reinitialising the struct nfs_pageio_descriptor.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-15 09:12:22 -04:00
Trond Myklebust
dce81290ee NFS: Move the pnfs write code into pnfs.c
...and ensure that we recoalese to take into account differences in
differences in block sizes when falling back to write through the MDS.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-15 09:12:22 -04:00
Trond Myklebust
493292ddc7 NFS: Move the pnfs read code into pnfs.c
...and ensure that we recoalese to take into account differences in
block sizes when falling back to read through the MDS.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-15 09:12:21 -04:00
Trond Myklebust
d9156f9f36 NFS: Allow the nfs_pageio_descriptor to signal that a re-coalesce is needed
If an attempt to do pNFS fails, and we have to fall back to writing through
the MDS, then we may want to re-coalesce the requests that we already have
since the block size for the MDS read/writes may be different to that of
the DS read/writes.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-15 09:12:21 -04:00
Trond Myklebust
d097971d8a NFS: Use the nfs_pageio_descriptor->pg_bsize in the read/write request
Instead of looking up the rsize and wsize, the routines that generate the
RPC requests should really be using the pg_bsize, since that is what we
use when deciding whether or not to coalesce write requests...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-15 09:12:20 -04:00
Trond Myklebust
50828d7e67 NFS: Cache rpc_ops in struct nfs_pageio_descriptor
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-15 09:12:20 -04:00
Trond Myklebust
275acaafd4 NFS: Clean up: split out the RPC transmission from nfs_pagein_multi/one
...and do the same for nfs_flush_multi/one.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-15 09:12:17 -04:00
Peng Tao
3b6091846d NFS: fix return value of nfs_pagein_one/nfs_flush_one
Signed-off-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-15 09:11:28 -04:00
Linus Torvalds
201f92e2ca Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  SUNRPC: Fix use of static variable in rpcb_getport_async
  NFSv4.1: update nfs4_fattr_bitmap_maxsz
  SUNRPC: Fix a race between work-queue and rpc_killall_tasks
  pnfs: write: Set mds_offset in the generic layer - it is needed by all LDs
2011-07-13 14:34:08 -07:00
Trond Myklebust
6e4efd5685 NFS: Clean up nfs_read_rpcsetup and nfs_write_rpcsetup
Split them up into two parts: one which sets up the struct nfs_read/write_data,
the other which sets up the actual RPC call or pNFS call.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-12 13:42:02 -04:00
Trond Myklebust
87ed5eb44a NFS: Don't use DATA_SYNC writes
If we're writing back data, and the FLUSH_STABLE flag is set, then we
always want to use NFS_FILE_SYNC, since we're always in a situation where
we're doing page reclaim, and so we want to free up the page as quickly
as possible.

If we're in the FLUSH_COND_STABLE case, then we either want to use another
unstable write (if we have to do a commit anyway) or again, we want to
use NFS_FILE_SYNC because we know that we have no more pages to write
out.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-12 13:42:01 -04:00
Andy Adamson
c47abcf8ff NFSv4.1: do not use deviceids after MDS clientid invalidation
Mark all deviceids established under an expired MDS clientid as invalid.
Stop all new i/o through DS and send through the MDS.
Don't use any new LAYOUTGETs that use the invalid deviceid. Purge all layouts
established under the expired MDS clientid.
Remove the MDS clientid deviceid and data servers reference

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-12 13:40:29 -04:00
Trond Myklebust
a56aaa02b1 NFSv4.1: Clean up layoutreturn
Since we take a reference to it, we really ought to pass the a pointer to
the layout header in the arguments instead of assuming that
NFS_I(inode)->layout will forever point to the correct object.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-12 13:40:29 -04:00
Andy Adamson
7c24d9489f NFSv4.1: File layout only supports whole file layouts
Ask for whole file layouts. Until support for layout segments is fully
supported in the file layout code, discard non-whole file layouts.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-12 13:40:28 -04:00
Trond Myklebust
47cb498e93 NFSv4.1: Clean ups for the device id cache
The fact that the global device id cache holds a reference to the
nfs4_deviceid_node until it is invisible to rcu lookups implies that
we can always assume that the reference count is non-zero in
_find_get_deviceid.

Also clean up nfs4_put_deviceid_node and the removal of the device id
from the cache.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-12 13:40:28 -04:00
Trond Myklebust
e885de1a5b NFSv4.1: Fall back to ordinary i/o through the mds if we have no layout segment
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-12 13:40:28 -04:00
Trond Myklebust
d8007d4dd6 NFSv4.1: Add an initialisation callback for pNFS
Ensure that we always get a layout before setting up the i/o request.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-12 13:40:28 -04:00
Trond Myklebust
1751c3638f NFS: Cleanup of the nfs_pageio code in preparation for a pnfs bugfix
We need to ensure that the layouts are set up before we can decide to
coalesce requests. To do so, we want to further split up the struct
nfs_pageio_descriptor operations into an initialisation callback, a
coalescing test callback, and a 'do i/o' callback.

This patch cleans up the existing callback methods before adding the
'initialisation' callback.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-12 13:40:28 -04:00
Bryan Schumaker
f062eb6ced NFS: test and free stateids during recovery
When recovering open files and locks, the stateid should be tested
against the server and freed if it is invalid.  This patch adds new
recovery functions for NFS v4.1.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-12 13:40:28 -04:00
Bryan Schumaker
9aeda35fd6 NFS: added FREE_STATEID call
FREE_STATEID is used to tell the server that we want to free a stateid
that no longer has any locks associated with it.  This allows the client
to reclaim locks without encountering edge conditions documented in
section 8.4.3 of RFC 5661.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-12 13:40:28 -04:00
Bryan Schumaker
7d9747947a NFS: Added TEST_STATEID call
This patch adds in the xdr for doing a TEST_STATEID call with a single
stateid. RFC 5661 allows multiple stateids to be tested in a single
call, but only testing one keeps things simpler for now.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-12 13:40:27 -04:00
Bryan Schumaker
fca78d6d2c NFS: Add SECINFO_NO_NAME procedure
If the client is using NFS v4.1, then we can use SECINFO_NO_NAME to find
the secflavor for the initial mount.  If the server doesn't support
SECINFO_NO_NAME then I fall back on the "guess and check" method used
for v4.0 mounts.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-12 13:40:27 -04:00
Weston Andros Adamson
6382a44138 NFS: move pnfs layouts to nfs_server structure
Layouts should be tracked per nfs_server (aka superblock)
instead of per struct nfs_client, which may have multiple FSIDs associated
with it.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-12 13:40:27 -04:00
Weston Andros Adamson
35dbbc99e9 NFS: fix comment
We support IPv4 and IPv6 now.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-12 13:40:27 -04:00
Weston Andros Adamson
78fe0f41d9 NFS: use scope from exchange_id to skip reclaim
can be skipped if the "eir_server_scope" from the exchange_id proc differs from
previous calls.

Also, in the future server_scope will be useful for determining whether client
trunking is available

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-12 13:40:27 -04:00
Weston Andros Adamson
7e574f0d39 NFS: pnfs: loop over multipath addrs on connect
Don't just use the first addr in the multipath list - instead, loop
over addresses when calling nfs4_set_ds_client() (which calls connect)
until it is successful.

Although this is not real multipath support, it's a quick fix to handle when
an MDS sends a list of addresses for a DS and some of the addr families are
unsupported or misconfigured (like no routable ipv6 addr assigned).
This will attempt all paths to the DS before giving up, instead of immediately
falling back to the MDS.

As before, an error encountered after a successful connect() will cause all
i/o to fall back to the MDS.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-12 13:40:27 -04:00
Weston Andros Adamson
14f9a6076f NFS: Parse and store all multipath DS addresses
This parses and stores all addresses associated with each data server,
laying the groundwork for supporting multipath to data servers.

 - Skips over addresses that cannot be parsed (ie IPv6 addrs if v6 is not
   enabled).  Only fails if none of the addresses are recognizable
 - Currently only uses the first address that parsed cleanly
 - Tested against pynfs server (modified to support multipath)

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-12 13:40:27 -04:00
Weston Andros Adamson
c9895cb69b NFS: pnfs IPv6 support
Handle ipv6 remote addresses from GETDEVICEINFO

 - supports netid "tcp" for ipv4 and "tcp6" for ipv6 as rfc 5665 specifies
 - added ds_remotestr to avoid having to handle different AFs in every dprintk
 - tested against pynfs 4.1 server, submitting ipv6 support patch to pynfs
 - tested with IPv6 disabled, it compiles cleanly and relies on rpc_pton to
   refuse to accept IPv6 addresses

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-12 13:40:26 -04:00
Andy Adamson
e5012d1f38 NFSv4.1: update nfs4_fattr_bitmap_maxsz
Attribute IDs assigned in RFC 5661 now require three bitmaps.
Fixes hitting a BUG_ON in xdr_shrink_bufhead when getting ACLs.

Signed-off-by: Andy Adamson <andros@netapp.com>
Cc:stable@kernel.org [2.6.39]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-11 19:14:38 -04:00
Jiri Kosina
b7e9c223be Merge branch 'master' into for-next
Sync with Linus' tree to be able to apply pending patches that
are based on newer code already present upstream.
2011-07-11 14:15:55 +02:00
David Howells
c902ce1bfb FS-Cache: Add a helper to bulk uncache pages on an inode
Add an FS-Cache helper to bulk uncache pages on an inode.  This will
only work for the circumstance where the pages in the cache correspond
1:1 with the pages attached to an inode's page cache.

This is required for CIFS and NFS: When disabling inode cookie, we were
returning the cookie and setting cifsi->fscache to NULL but failed to
invalidate any previously mapped pages.  This resulted in "Bad page
state" errors and manifested in other kind of errors when running
fsstress.  Fix it by uncaching mapped pages when we disable the inode
cookie.

This patch should fix the following oops and "Bad page state" errors
seen during fsstress testing.

  ------------[ cut here ]------------
  kernel BUG at fs/cachefiles/namei.c:201!
  invalid opcode: 0000 [#1] SMP
  Pid: 5, comm: kworker/u:0 Not tainted 2.6.38.7-30.fc15.x86_64 #1 Bochs Bochs
  RIP: 0010: cachefiles_walk_to_object+0x436/0x745 [cachefiles]
  RSP: 0018:ffff88002ce6dd00  EFLAGS: 00010282
  RAX: ffff88002ef165f0 RBX: ffff88001811f500 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: 0000000000000100 RDI: 0000000000000282
  RBP: ffff88002ce6dda0 R08: 0000000000000100 R09: ffffffff81b3a300
  R10: 0000ffff00066c0a R11: 0000000000000003 R12: ffff88002ae54840
  R13: ffff88002ae54840 R14: ffff880029c29c00 R15: ffff88001811f4b0
  FS:  00007f394dd32720(0000) GS:ffff88002ef00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 00007fffcb62ddf8 CR3: 000000001825f000 CR4: 00000000000006e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process kworker/u:0 (pid: 5, threadinfo ffff88002ce6c000, task ffff88002ce55cc0)
  Stack:
   0000000000000246 ffff88002ce55cc0 ffff88002ce6dd58 ffff88001815dc00
   ffff8800185246c0 ffff88001811f618 ffff880029c29d18 ffff88001811f380
   ffff88002ce6dd50 ffffffff814757e4 ffff88002ce6dda0 ffffffff8106ac56
  Call Trace:
   cachefiles_lookup_object+0x78/0xd4 [cachefiles]
   fscache_lookup_object+0x131/0x16d [fscache]
   fscache_object_work_func+0x1bc/0x669 [fscache]
   process_one_work+0x186/0x298
   worker_thread+0xda/0x15d
   kthread+0x84/0x8c
   kernel_thread_helper+0x4/0x10
  RIP  cachefiles_walk_to_object+0x436/0x745 [cachefiles]
  ---[ end trace 1d481c9af1804caa ]---

I tested the uncaching by the following means:

 (1) Create a big file on my NFS server (104857600 bytes).

 (2) Read the file into the cache with md5sum on the NFS client.  Look in
     /proc/fs/fscache/stats:

	Pages  : mrk=25601 unc=0

 (3) Open the file for read/write ("bash 5<>/warthog/bigfile").  Look in proc
     again:

	Pages  : mrk=25601 unc=25601

Reported-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-and-Tested-by: Suresh Jayaraman <sjayaraman@suse.de>
cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-07 13:21:56 -07:00
Boaz Harrosh
2bea038c52 pnfs: write: Set mds_offset in the generic layer - it is needed by all LDs
In current pnfs tree, all the layouts set mds_offset in their
.write_pagelist member.
mds_offset is only used by generic layer and should be handled by it.

This patch is for upstream. It is needed in this -rc series to fix a
bug in objects layout_commit.

I'll send patches for objects and blocks to be
squashed into current pnfs tree.

TODO: It looks like the read path needs the same patch.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-28 14:12:11 -04:00
Bryan Schumaker
1650add235 NFS: Fix decode_secinfo_maxsz
I initially did the calculation in bytes, and not words

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-21 11:54:07 -04:00
Trond Myklebust
19982ba856 NFSv4.1: Fix an off-by-one error in pnfs_generic_pg_test
And document what is going on there...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-21 11:54:06 -04:00
Trond Myklebust
8f7d5efbef NFSv4.1: Fix some issues with pnfs_generic_pg_test
1. If the intention is to coalesce requests 'prev' and 'req' then we
   have to ensure at least that we have a layout starting at
   req_offset(prev).

2. If we're only requesting a minimal layout of length desc->pg_count,
   we need to test the length actually returned by the server before
   we allow the coalescing to occur.

3. We need to deal correctly with (pgio->lseg == NULL)

4. Fixup the test guarding the pnfs_update_layout.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-21 11:54:05 -04:00
Benny Halevy
19345cb299 NFSv4.1: file layout must consider pg_bsize for coalescing
Otherwise we end up overflowing the rpc buffer size on the receive end.

Signed-off-by: Benny Halevy <benny@tonian.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-20 16:12:26 -04:00
Vitaliy Ivanov
e44ba033c5 treewide: remove duplicate includes
Many stupid corrections of duplicated includes based on the output of
scripts/checkincludes.pl.

Signed-off-by: Vitaliy Ivanov <vitalivanov@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-06-20 16:08:19 +02:00
Boaz Harrosh
df18d127f4 pnfs-obj: No longer needed to take an extra ref at add_device
Andy's last device_cache patches, already take an extra
reference on the newly inserted device_id. So we can remove it
from obj-io.

Without this patch the device_ids are leaked.

Andy's patches are not in Linus tree yet. So I'm not sure if they are
scheduled for this Kernel or the next. This patch should be added as
part of these.

CC: Andy Adamson <andros@netapp.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-19 14:49:51 -04:00
Trond Myklebust
ee7b75fc4f NFSv4: Fix a readdir regression
Commit 7ebb9315 (NFS: use secinfo when crossing mountpoints) introduces
a regression when decoding an NFSv4 readdir entry that sets the
rdattr_error field.
By treating the resulting value as if it is a decoding error, the current
code may cause us to skip valid readdir entries.

Reported-by: Andy Adamson <andros@netapp.com>
Cc: stable@kernel.org [2.6.39]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-16 13:24:47 -04:00
Fred Isaman
9e2dfdb308 nfs4.1: mark layout as bad on error path in _pnfs_return_layout
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-15 14:36:33 -04:00
Fred Isaman
ea0ded748b nfs4.1: prevent race that allowed use of freed layout in _pnfs_return_layout
mark_matching_lsegs_invalid could put the last ref to the layout, so
the get_layout_hdr needs to be called first.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-15 12:39:23 -04:00
Benny Halevy
1ed3a8539a NFSv4.1: need to put_layout_hdr on _pnfs_return_layout error path
We always get a reference on the layout header and we rely on
nfs4_layoutreturn_release to put it.  If we hit an allocation error
before starting the rpc proc we bail out early without dereferncing
the layout header properly.

Signed-off-by: Benny Halevy <benny@tonian.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-15 11:52:15 -04:00
David Howells
c7fd06228b NFS: (d)printks should use %zd for ssize_t arguments
(d)printks should use %zd for ssize_t arguments not %ld, otherwise they might
get a warning.  I see the following with MN10300.

fs/nfs/objlayout/objlayout.c: In function 'objlayout_read_done':
fs/nfs/objlayout/objlayout.c:294: warning: format '%ld' expects type 'long int', but argument 3 has type 'ssize_t'

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Trond Myklebust <Trond.Myklebust@netapp.com>
cc: linux-nfs@vger.kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-15 11:24:31 -04:00
Benny Halevy
d771e3a43e NFSv4.1: fix break condition in pnfs_find_lseg
The break condition to skip out of the loop got broken when cmp_layout
was change.  Essentially, we want to stop looking once we know no layout
on the remainder of the list can match the first byte of the looked-up
range.

Reported-by: Peng Tao <peng_tao@emc.com>
Signed-off-by: Benny Halevy <benny@tonian.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-15 11:24:31 -04:00
Fred Isaman
a2e1d4f2e5 nfs4.1: fix several problems with _pnfs_return_layout
_pnfs_return_layout had the following problems:

- it did not call pnfs_free_lseg_list on all paths
- it unintentionally did a forgetful return when there was no outstanding io
- it raced with concurrent LAYOUTGETS

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-15 11:24:30 -04:00
Andy Adamson
cec765cf58 NFSv4.1: allow zero fh array in filelayout decode layout
Signed-off-by: Andy Adamson <andros@netapp.com>
cc:stable@kernel.org [2.6.39]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-15 11:24:30 -04:00
Andy Adamson
533eb4611c NFSv4.1: allow nfs_fhget to succeed with mounted on fileid
Commit 28331a46d8 "Ensure we request the
ordinary fileid when doing readdirplus"
changed the meaning of NFS_ATTR_FATTR_FILEID which used to be set when
FATTR4_WORD1_MOUNTED_ON_FILED was requested.

Allow nfs_fhget to succeed with only a mounted on fileid when crossing
a mountpoint or a referral.

Ask for the fileid of the absent file system if mounted_on_fileid is not
supported.

Signed-off-by: Andy Adamson <andros@netapp.com>
cc:stable@kernel.org [2.6.39]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-15 11:24:29 -04:00
Trond Myklebust
1d92a08da2 NFSv4.1: Fix a refcounting issue in the pNFS device id cache
When we add something to the global device id cache, we need to bump the
reference count, so that the cache itself holds a reference.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-15 11:24:29 -04:00
Benny Halevy
c9c30dd5f7 NFSv4.1: deprecate headerpadsz in CREATE_SESSION
We don't support header padding yet so better off ditching it

Reported-by: Sid Moore <learnmost@gmail.com>
Signed-off-by: Benny Halevy <benny@tonian.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-15 11:24:28 -04:00
Peng Tao
0f66b5984d NFS41: do not update isize if inode needs layoutcommit
nfs_update_inode will update isize if there is no queued pages. For pNFS,
layoutcommit is supposed to change file size on server, the same effect as queued
pages. nfs_update_inode may be called when dirty pages are written back (nfsi->npages==0)
but layoutcommit is not sent, and it will change client file size according to server
file size. Then client ends up losing what it just writes back in pNFS path.
So we should skip updating client file size if file needs layoutcommit.

Signed-off-by: Peng Tao <peng_tao@emc.com>
Cc: stable@kernel.org   [2.6.39]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-15 11:24:28 -04:00
Weston Andros Adamson
9e3bd4e24e NFS: fix umount of pnfs filesystems
Unmounting a pnfs filesystem hangs using filelayout and possibly others.
This fixes the use of the rcu protected node by making use of a new 'tmpnode'
for the temporary purge list. Also, the spinlock shouldn't be held when calling
synchronize_rcu().

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-15 11:23:02 -04:00
Wu Fengguang
846d5a091b writeback: remove .nonblocking and .encountered_congestion
Remove two unused struct writeback_control fields:

	.encountered_congestion	(completely unused)
	.nonblocking		(never set, checked/showed in XFS,NFS/btrfs)

The .for_background check in nfs_write_inode() is also removed btw,
as .for_background implies WB_SYNC_NONE.

Reviewed-by: Jan Kara <jack@suse.cz>
Proposed-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-06-08 08:25:23 +08:00
Linus Torvalds
cd1acdf172 Merge branch 'pnfs-submit' of git://git.open-osd.org/linux-open-osd
* 'pnfs-submit' of git://git.open-osd.org/linux-open-osd: (32 commits)
  pnfs-obj: pg_test check for max_io_size
  NFSv4.1: define nfs_generic_pg_test
  NFSv4.1: use pnfs_generic_pg_test directly by layout driver
  NFSv4.1: change pg_test return type to bool
  NFSv4.1: unify pnfs_pageio_init functions
  pnfs-obj: objlayout_encode_layoutcommit implementation
  pnfs: encode_layoutcommit
  pnfs-obj: report errors and .encode_layoutreturn Implementation.
  pnfs: encode_layoutreturn
  pnfs: layoutret_on_setattr
  pnfs: layoutreturn
  pnfs-obj: osd raid engine read/write implementation
  pnfs: support for non-rpc layout drivers
  pnfs-obj: define per-inode private structure
  pnfs: alloc and free layout_hdr layoutdriver methods
  pnfs-obj: objio_osd device information retrieval and caching
  pnfs-obj: decode layout, alloc/free lseg
  pnfs-obj: pnfs_osd XDR client implementation
  pnfs-obj: pnfs_osd XDR definitions
  pnfs-obj: objlayoutdriver module skeleton
  ...
2011-05-29 14:10:13 -07:00
Linus Torvalds
f1d1c9fa8f Merge branch 'nfs-for-2.6.40' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'nfs-for-2.6.40' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  SUNRPC: Support for RPC over AF_LOCAL transports
  SUNRPC: Remove obsolete comment
  SUNRPC: Use AF_LOCAL for rpcbind upcalls
  SUNRPC: Clean up use of curly braces in switch cases
  NFS: Revert NFSROOT default mount options
  SUNRPC: Rename xs_encode_tcp_fragment_header()
  nfs,rcu: convert call_rcu(nfs_free_delegation_callback) to kfree_rcu()
  nfs41: Correct offset for LAYOUTCOMMIT
  NFS: nfs_update_inode: print current and new inode size in debug output
  NFSv4.1: Fix the handling of NFS4ERR_SEQ_MISORDERED errors
  NFSv4: Handle expired stateids when the lease is still valid
  SUNRPC: Deal with the lack of a SYN_SENT sk->sk_state_change callback...
2011-05-29 11:20:02 -07:00
Boaz Harrosh
9342077011 pnfs-obj: pg_test check for max_io_size
Implement pg_test vector to test for max IO sizes. We calculate
a max_io_size member only once, and cache it in lseg so to not
do so on every page insert.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
[simplify logic]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29 21:03:08 +03:00
Boaz Harrosh
5b36c7dc41 NFSv4.1: define nfs_generic_pg_test
By default, unless pnfs is used coalesce pages until pg_bsize
(rsize or wsize) is reached.

pnfs layout drivers define their own pg_test methods that use
pnfs_generic_pg_test and need to define their own I/O size
limits (e.g. based on the file stripe size).

[Move a check from nfs_pageio_do_add_request to nfs_generic_pg_test]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29 21:02:42 +03:00
Benny Halevy
89a58e32d9 NFSv4.1: use pnfs_generic_pg_test directly by layout driver
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29 20:56:55 +03:00
Benny Halevy
18ad0a9f2c NFSv4.1: change pg_test return type to bool
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29 20:56:54 +03:00
Benny Halevy
dfed206b88 NFSv4.1: unify pnfs_pageio_init functions
Use common code for pnfs_pageio_init_{read,write} and use
a common generic pg_test function.

Note that this function always assumes the the layout driver's
pg_test method is implemented.

[Fix BUG]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29 20:56:43 +03:00
Boaz Harrosh
a0fe8bf427 pnfs-obj: objlayout_encode_layoutcommit implementation
* Define API for io-engines to report delta_space_used in IOs
* Encode the osd-layout specific information of the layoutcommit
  XDR buffer.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29 20:55:00 +03:00
Benny Halevy
ac7db7264a pnfs: encode_layoutcommit
Add a layout driver method to encode the layout type specific
opaque part of layout commit in-line in the xdr stream.

Currently, the pnfs-objects layout driver uses it to encode metadata hints
to the MDS and the blocks layout driver to commit provisionally allocated
extents to the file.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29 20:55:00 +03:00
Boaz Harrosh
adb58535e6 pnfs-obj: report errors and .encode_layoutreturn Implementation.
An io_state pre-allocates an error information structure for each
possible osd-device that might error during IO. When IO is done if all
was well the io_state is freed. (as today). If the I/O has ended with an
error, the io_state is queued on a per-layout err_list. When eventually
encode_layoutreturn() is called, each error is properly encoded on the
XDR buffer and only then the io_state is removed from err_list and
de-allocated.

It is up to the io_engine to fill in the segment that fault and the type
of osd_error that occurred. By calling objlayout_io_set_result() for
each failing device.

In objio_osd:
* Allocate io-error descriptors space as part of io_state
* Use generic objlayout error reporting at end of io.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29 20:54:45 +03:00
Andy Adamson
04a555498e pnfs: encode_layoutreturn
Add a layout driver method to encode the layout type specific
opaque part of layout return in-line in the xdr stream.

Currently the pnfs-objects layout driver uses it to encode i/o error
information on LAYOUTRETURN.

Signed-off-by: Andy Adamson <andros@netapp.com>
[fixup layout header pointer for encode_layoutreturn]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29 20:54:37 +03:00
Benny Halevy
8a1636c459 pnfs: layoutret_on_setattr
With the objects layout security model, we have object capabilities
that are associated with the layout and we anticipate that the server
will issue a cb_layoutrecall for any setattr that changes security
related attributes (user/group/mode/acl) or truncates the file.

Therefore, the layout is returned before issuing the setattr to avoid
the anticipated cb_layoutrecall.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29 20:54:36 +03:00
Benny Halevy
cbe8260369 pnfs: layoutreturn
NFSv4.1 LAYOUTRETURN implementation

Currently, does not support layout-type payload encoding.

Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Zhang Jingwang <zhangjingwang@nrchpc.ac.cn>
[call pnfs_return_layout right before pnfs_destroy_layout]
[remove assert_spin_locked from pnfs_clear_lseg_list]
[remove wait parameter from the layoutreturn path.]
[remove return_type field from nfs4_layoutreturn_args]
[remove range from nfs4_layoutreturn_args]
[no need to send layoutcommit from _pnfs_return_layout]
[don't wait on sync layoutreturn]
[fix layout stateid in layoutreturn args]
[fixed NULL deref in _pnfs_return_layout]
[removed recaim member of nfs4_layoutreturn_args]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29 20:54:36 +03:00
Boaz Harrosh
04f8345038 pnfs-obj: osd raid engine read/write implementation
With the use of the in-kernel osd library. Implement read/write
of data from/to osd-objects according to information specified
in the objects-layout.

Support for stripping over mirrors with a received stripe_unit.
There are however a few constrains which are not supported:
 1. Stripe Unit must be a multiple of PAGE_SIZE
 2. stripe length (stripe_unit * number_of_stripes) can not be
    bigger then 32bit.

Also support raid-groups and partial-layout. Partial-layout is
when not all the groups are received on the line, addressing
only a partial range of the file.

TODO:
  Only raid0! raid 4/5/6 support will come at later stage

A none supported layout will send IO through the MDS

[Important fallout from the last rebase]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
[gfp_flags]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29 20:54:15 +03:00
Benny Halevy
d20581aa4b pnfs: support for non-rpc layout drivers
Non-rpc layout driver such as for objects and blocks
implement their own I/O path and error handling logic.
Therefore bypass NFS-based error handling for these layout drivers.

[fix lseg ref-count bugs, and null de-refs]
[Fall out from: non-rpc layout drivers]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
[get rid of PNFS_USE_RPC_CODE]
[get rid of __nfs4_write_done_cb]
[revert useless change in nfs4_write_done_cb]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29 20:53:51 +03:00
Benny Halevy
e51b841dd0 pnfs-obj: define per-inode private structure
allocate and deallocate per-inode private pnfs_layout_hdr
in preparation for I/O implementation.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29 20:53:51 +03:00
Benny Halevy
636fb9c89d pnfs: alloc and free layout_hdr layoutdriver methods
[gfp_flags]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29 20:53:50 +03:00
Boaz Harrosh
b6c05f1693 pnfs-obj: objio_osd device information retrieval and caching
When a new layout is received in objio_alloc_lseg all device_ids
referenced are retrieved. The device information is queried for from MDS
and then the osd_device is looked-up from the osd-initiator library. The
devices are cached in a per-mount-point list, for later use. At unmount
all devices are "put" back to the library.

objlayout_get_deviceinfo(), objlayout_put_deviceinfo() middleware
API for retrieving device information given a device_id.

TODO: The device cache can get big. Cap its size. Keep an LRU and start
      to return devices which were not used, when list gets to big, or
      when new entries allocation fail.

[pnfs-obj: Bugs in new global-device-cache code]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
[gfp_flags]
[use global device cache]
[use layout driver in global device cache]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29 20:53:33 +03:00
Boaz Harrosh
09f5bf4e6d pnfs-obj: decode layout, alloc/free lseg
objlayout_alloc_lseg prepares an xdr_stream and calls the
raid engins objio_alloc_lseg() to allocate a private
pnfs_layout_segment.

objio_osd.c::objio_alloc_lseg() uses passed xdr_stream to
decode and store the layout_segment information in an
objio_segment struct, using the pnfs_osd_xdr.h API for
the actual parsing the layout xdr.

objlayout_free_lseg calls objio_free_lseg() to free the
allocated space.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
[gfp_flags]
[removed "extern" from function definitions]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29 20:53:06 +03:00
Boaz Harrosh
f1bc893a89 pnfs-obj: pnfs_osd XDR client implementation
* Add the fs/nfs/objlayout/pnfs_osd_xdr_cli.c file, which will
  include the XDR encode/decode implementations for the pNFS
  client objlayout driver.

[Wrong type in comments]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29 20:52:36 +03:00
Benny Halevy
c93407d03c pnfs-obj: objlayoutdriver module skeleton
* Define the PNFS_OBJLAYOUT Kconfig option in the nfs
  master Kconfig file.
* Add the objlayout driver to the Kernel's Kbuild system.
* Add the fs/nfs/objlayout/Kbuild file for building the
  objlayoutdriver.ko driver
* Define fs/nfs/objlayout/objio_osd.c, register the driver on module
  initialization and unregister on exit.

[pnfs-obj: remove of CONFIG_PNFS fallout]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
[added "unsure" clause]
[depend on NFS_V4_1]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29 20:52:35 +03:00
J. Bruce Fields
ae50c0b5c6 pnfs: client stats
A pNFS client auto-negotiates a lot of features (minorversion level,
pNFS layout type, etc.).  This is convenient, but makes certain kinds of
failures hard for a user to detect.

For example, if the client falls back on 4.0, or falls back to MDS IO
because the user didn't connect to the right iscsi disks before
mounting, the only symptoms may be reduced performance, which may not be
noticed till long after the actual failure, and may be difficult for a
user to diagnose.

However, such "failures" may also be perfectly normal in some cases, so
we don't want to spam the system logs with them.

One approach would be to put some more information into
/proc/self/mountstats.

Signed-off-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[pnfs: add commit client stats]
[fixup data types for "ret" variables in pnfs_try_to* inline funcs.]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[fix definition of show_pnfs for !CONFIG_PNFS]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: Fix show_sessions in the not CONFIG_NFS_V4_1 case]
    There is a build error when CONFIG_NFS_V4 is set but
    CONFIG_NFS_V4_1 is *not* set. show_sessions() prototype
    was unbalanced between the two cases.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
[pnfs: super.c remove CONFIG_PNFS]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29 20:52:34 +03:00
Benny Halevy
778b5502fd pnfs: Use byte-range for cb_layoutrecall
Use recalled range to invalidate particular layout segments in the layout cache.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29 20:52:34 +03:00
Benny Halevy
707ed5fdb5 pnfs: align layoutget requests on page boundaries
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29 20:52:33 +03:00
Benny Halevy
fb3296eb46 pnfs: Use byte-range for layoutget
Add offset and count parameters to pnfs_update_layout and use them to get
the layout in the pageio path.

Order cache layout segments in the following order:
* offset (ascending)
* length (descending)
* iomode (RW before READ)

Test byte range against the layout segment in use in pnfs_{read,write}_pg_test
so not to coalesce pages not using the same layout segment.

[fix lseg ordering]
[clean up pnfs_find_lseg lseg arg]
[remove unnecessary FIXME]
[fix ordering in pnfs_insert_layout]
[clean up pnfs_insert_layout]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29 20:52:32 +03:00
Benny Halevy
f7da7a129d SUNRPC: introduce xdr_init_decode_pages
Initialize xdr_stream and xdr_buf using an array of page pointers
and length of buffer.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29 20:52:32 +03:00
Benny Halevy
35c8bb543c NFSv4.1: use layout driver in global device cache
pnfs deviceids are unique per server, per layout type.
struct nfs_client is currently used to distinguish deviceids from
different nfs servers, yet these may clash between different layout
types on the same server.  Therefore, use the layout driver associated
with each deviceid at insertion time to look it up, unhash, or
delete it.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29 20:52:31 +03:00
Marc Eshel
1be5683b03 pnfs: CB_NOTIFY_DEVICEID
Note: This functionlaity is incomplete as all layout segments referring to
the 'to be removed device id' need to be reaped, and all in flight I/O drained.

[use be32 res in nfs4_callback_devicenotify]
[use nfs_client to qualify deviceid for cb_notify_deviceid]
[use global deviceid cache for CB_NOTIFY_DEVICEID]
[refactor device cache _lookup_deviceid]
[refactor device cache _find_get_deviceid]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[Bug in new global-device-cache code]
[layout_driver MUST set free_deviceid_node if using dev-cache]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29 20:52:31 +03:00
Benny Halevy
1775bc342c NFSv4.1: purge deviceid cache on nfs_free_client
Use the pnfs_layoutdriver_type both as a qualifier for the deviceid,
distinguishing deviceid from different layout types on the server,
and for freeing the layout-driver allocated structure containing the
nfs4_deviceid_node.

[BUG in _deviceid_purge_client]
[layout_driver MUST set free_deviceid_node if using dev-cache]
[let ver < 4.1 compile]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
[removed EXPORT_SYMBOL_GPL(nfs4_deviceid_purge_client)]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29 20:50:42 +03:00
Benny Halevy
a1eaecbc4c NFSv4.1: make deviceid cache global
Move deviceid cache from the pnfs files layout driver to the
generic layer in preparation for the objects layout driver.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29 12:09:48 +03:00
Benny Halevy
45df3c8b0f pnfs: resolve header dependency in pnfs.h
Some definitions in the header file depend on nfs_fs.h so pnfs.h can't
be included independently.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29 12:09:48 +03:00
Benny Halevy
67d51f65bd NFSv4.1: use struct nfs_client to qualify deviceid
deviceids are unique per server, per layout type.
Therefore, in the global cache in the files layout driver
deviceids from different servers may clash so we need
to qualify them with a struct nfs_client that represents
the nfs server that returned the deviceid.

Introduced in 2.6.39 commit ea8eecdd
"NFSv4.1 move deviceid cache to filelayout driver"

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29 12:09:47 +03:00
Jim Rees
3b6445a6f6 NFSv4.1: fix typo in filelayout_check_layout
Signed-off-by: Jim Rees <rees@umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29 12:09:46 +03:00
Chuck Lever
4251c94833 NFS: Revert NFSROOT default mount options
Marek Belisko <marek.belisko@gmail.com> reports that recent attempts
to fix regressions in NFSROOT have broken his configuration:

> After update from 2.6.38-rc8 to 2.6.38 is mounting rootfs over nfs not possible.
> Log:
> VFS: Mounted root (nfs filesystem) on device 0:14.
> Freeing init memory: 132K
> nfs: server 10.146.1.21 not responding, still trying
> nfs: server 10.146.1.21 not responding, still trying
>
> This is never ending. I make short bisect (not too much commits
> between versions)
> and bad commit was reported: 53d4737580
>
> NFS: NFSROOT should default to "proto=udp"
>
> I've tested on mini2440 board (DM9000, static IP).
> Is there some missing option or something else to be checked?

An examination of a network trace captured during the failure shows
that the mount is actually succeeding, but that the client is not
seeing READ replies larger than 16KB.  This could be a local packet
filtering issue on the client, but we didn't troubleshoot this
further because of the reported "git bisect" result.

Last fall we removed the ad hoc mount option parser in
fs/nfs/nfsroot.c in favor of using the main parser in fs/nfs/super.c
(see commit 56463e50 "NFS: Use super.c for NFSROOT mount option
parsing").  That commit changed the default NFSROOT mount options to
be the same as those employed by user space mounts.

As it turns out, these new default mount options are not tolerated by
many embedded systems.  So far these problems have been due to
specific behavior of certain embedded NICs.  The NFS community does
not have such hardware on hand for running tests.

Commit 53d47375 recently introduced a clean way to specify default
mount options for NFSROOT, so we can now easily restore the
traditional defaults for NFSROOT:

   vers=2,udp,rsize=4096,wsize=4096

This should revert the new default NFSROOT mount options introduced
with commit 56463e50.

Tested-by: Marek Belisto <marek.belisto@open-nandra.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-05-27 17:42:47 -04:00
Lai Jiangshan
26f04dde68 nfs,rcu: convert call_rcu(nfs_free_delegation_callback) to kfree_rcu()
The rcu callback nfs_free_delegation_callback() just calls a kfree(),
so we use kfree_rcu() instead of the call_rcu(nfs_free_delegation_callback).

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-05-27 17:42:46 -04:00
Vitaliy Gusev
4b8ee2b82e nfs41: Correct offset for LAYOUTCOMMIT
A client sends offset to MDS as it was seen by DS. As result,
file size after copy is only half of original file size in case
of 2 DS.

Signed-off-by: Vitaliy Gusev <gusev.vitaliy@nexenta.com>
Cc: stable@kernel.org [2.6.39]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-05-27 17:42:01 -04:00
Harshula Jayasuriya
60c16ea877 NFS: nfs_update_inode: print current and new inode size in debug output
Hi Trond,

In nfs_update_inode debug output, print the current and new inode
size when the file size changes on the NFS server.

Signed-off-by: Harshula Jayasuriya <harshula@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-05-27 17:42:01 -04:00
Trond Myklebust
444f72fe7e NFSv4.1: Fix the handling of NFS4ERR_SEQ_MISORDERED errors
Currently, the call to nfs4_schedule_session_recovery() will actually just
result in a test of the lease when what we really want is to force a
session reset.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2011-05-27 17:42:01 -04:00
Trond Myklebust
0ced63d1a2 NFSv4: Handle expired stateids when the lease is still valid
Currently, if the server returns NFS4ERR_EXPIRED in reply to a READ or
WRITE, but the RENEW test determines that the lease is still active, we
fail to recover and end up looping forever in a READ/WRITE + RENEW death
spiral.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2011-05-27 17:42:01 -04:00
Ying Han
1495f230fa vmscan: change shrinker API by passing shrink_control struct
Change each shrinker's API by consolidating the existing parameters into
shrink_control struct.  This will simplify any further features added w/o
touching each file of shrinker.

[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: fix warning]
[kosaki.motohiro@jp.fujitsu.com: fix up new shrinker API]
[akpm@linux-foundation.org: fix xfs warning]
[akpm@linux-foundation.org: update gfs2]
Signed-off-by: Ying Han <yinghan@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25 08:39:26 -07:00
Trond Myklebust
a75b9df9d3 NFSv4.1: Ensure that layoutget uses the correct gfp modes
Currently, writebacks may end up recursing back into the filesystem due to
GFP_KERNEL direct reclaims in the pnfs subsystem.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-05-11 22:52:13 -04:00
Andy Adamson
2887fe4552 NFSv4.1: remove pnfs_layout_hdr from pnfs_destroy_all_layouts tmp_list
Prevents an infinite loop as list was never emptied.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-05-11 14:20:13 -04:00
Andy Adamson
a8a4ae3a89 NFSv41: Resend on NFS4ERR_RETRY_UNCACHED_REP
Free the slot and resend the RPC with new session <slot#,seq#>.

For nfs4_async_handle_error, return -EAGAIN and set the task->tk_status to 0
to restart the async rpc in the rpc_restart_call_prepare state which resets
the slot.

For nfs4_handle_exception, retrying a call that uses nfs4_call_sync will
reset the slot via nfs41_call_sync_prepare.

For open/close/lock/locku/delegreturn/layoutcommit/unlink/rename/write
cachethis is true, so these operations will not trigger an
NFS4ERR_RETRY_UNCACHED_REP.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-05-11 14:01:33 -04:00
Jeff Layton
26c4c17073 nfs: don't lose MS_SYNCHRONOUS on remount of noac mount
On a remount, the VFS layer will clear the MS_SYNCHRONOUS bit on the
assumption that the flags on the mount syscall will have it set if the
remounted fs is supposed to keep it.

In the case of "noac" though, MS_SYNCHRONOUS is implied. A remount of
such a mount will lose the MS_SYNCHRONOUS flag since "sync" isn't part
of the mount options.

Reported-by: Max Matveev <makc@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-04-27 16:20:01 -04:00
Bryan Schumaker
613e901e1e NFS: Return meaningful status from decode_secinfo()
When compiling, I was getting this warning:
fs/nfs/nfs4xdr.c: In function ‘decode_secinfo’:
fs/nfs/nfs4xdr.c:4839:6: warning: variable ‘status’ set but not used
[-Wunused-but-set-variable]

We were unconditionally returning 0 as long as there wasn't an error
coming out of xdr_inline_decode().  We probably want to check the error
status coming out of decode_op_hdr() and decode_secinfo_gss(), rather
than assuming that everything is OK all the time.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-04-27 16:17:29 -04:00
Trond Myklebust
28331a46d8 NFSv4: Ensure we request the ordinary fileid when doing readdirplus
When readdir() returns a directory entry for the root of a mounted
filesystem, Linux follows the old convention of returning the inode
number of the covered directory (despite newer versions of POSIX declaring
that this is a bug).
To ensure this continues to work, the NFSv4 readdir implementation requests
the 'mounted-on-fileid' from the server.

However, readdirplus also needs to instantiate an inode for this entry, and
for that, we also need to request the real fileid as per this patch.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-04-27 15:57:16 -04:00
Trond Myklebust
1bd714f2a1 NFSv4: Ensure that clientid and session establishment can time out
The following patch ensures that we do not get permanently trapped in
the RPC layer when trying to establish a new client id or session.
This again ensures that the state manager can finish in a timely
fashion when the last filesystem to reference the nfs_client exits.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-04-24 14:29:33 -04:00
Trond Myklebust
fd954ae124 NFSv4.1: Don't loop forever in nfs4_proc_create_session
If a server for some reason keeps sending NFS4ERR_DELAY errors, we can end
up looping forever inside nfs4_proc_create_session, and so the usual
mechanisms for detecting if the nfs_client is dead don't work.

Fix this by ensuring that we loop inside the nfs4_state_manager thread
instead.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-04-24 14:28:18 -04:00
Bryan Schumaker
fb8a5ba811 NFSv4: Handle NFS4ERR_WRONGSEC outside of nfs4_handle_exception()
I only want to try other secflavors during an initial mount if
NFS4ERR_WRONGSEC is returned.  nfs4_handle_exception() could
potentially map other errors to EPERM, so we should handle this
error specially for correctness.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-04-18 17:06:00 -04:00
Bryan Schumaker
468f86134e NFSv4.1: Don't update sequence number if rpc_task is not sent
If we fail to contact the gss upcall program, then no message will
be sent to the server.  The client still updated the sequence number,
however, and this lead to NFS4ERR_SEQ_MISMATCH for the next several
RPC calls.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-04-18 17:05:48 -04:00
Trond Myklebust
47c2199b6e NFSv4.1: Ensure state manager thread dies on last umount
Currently, the state manager may continue to try recovering state forever
even after the last filesystem to reference that nfs_client has umounted.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2011-04-15 18:28:22 -04:00
Bryan Schumaker
c3dfc2808a NFS: Use correct variable for page bounds checking
While decoding a secinfo reply, I store the list of supported sec
flavors on a page accessible through res->flavors.  Before reading
each new flavor, I do some math to determine if there is enough
space left on this page, and I break out of my read look if there
isn't.  In order to perform this check correctly, I need to use the
address of res->flavors, rather than the address of res.

When this loop was broken early I lied to the caller and told them
that the entire list had been decoded.  This could lead to problems
if the caller tries to use any the garbage data claiming to be a
valid sec flavor.  I fixed this by using res->flavors->num_flavors
as a counter, incrementing it every time a sec flavor is
successfully decoded.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-04-13 15:12:23 -04:00
Bryan Schumaker
9b7160c55a NFS: don't negotiate when user specifies sec flavor
We were always attempting sec flavor negotiation, even if the user
told us a specific sec flavor to use.  If that sec flavor fails,
we should return an error rather than continuing with sec flavor
negotiation.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-04-13 15:12:23 -04:00
Bryan Schumaker
801a16dc7b NFS: Attempt mount with default sec flavor first
nfs4_lookup_root() is already configured to use either RPC_AUTH_UNIX
or a user specified flavor (through -o sec=<whatever>).  We should
use this flavor first, and only attempt negotiation if it fails
with -EPERM.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-04-13 15:12:23 -04:00
Bryan Schumaker
0fabee243a NFS: flav_array honors NFS_MAX_SECFLAVORS
NFS_MAX_SECFLAVORS should already take into account RPC_AUTH_UNIX
and RPC_AUTH_NULL, so we don't need to set aside extra slots
for them.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-04-13 15:12:22 -04:00
Bryan Schumaker
d1a8016a2d NFS: Fix infinite loop in gss_create_upcall()
There can be an infinite loop if gss_create_upcall() is called without
the userspace program running.  To prevent this, we return -EACCES if
we notice that pipe_version hasn't changed (indicating that the pipe
has not been opened).

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-04-13 15:12:22 -04:00
Weston Andros Adamson
79a48a1f5d Don't mark_inode_dirty_sync() while holding lock
mark_inode_dirty_sync() grabs the same inode lock!

race conditions between holding the lock in pnfs_set_layoutcommit() and in
mark_inode_dirty_sync() can result in a second call to pnfs_layoutcommit_inode(), but
this will be a noop as NFS_INO_LAYOUTCOMMIT won't be set in the second call

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-04-13 13:15:51 -04:00
Trond Myklebust
c0d0e96b84 NFS: Get rid of pointless test in nfs_commit_done
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-04-12 19:34:23 -04:00
Bryan Schumaker
561f0b0ad0 NFS: Remove unused argument from nfs_find_best_sec()
The inode was used in an earlier version of the code, but it isn't
used anymore.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-04-12 19:34:23 -04:00
Trond Myklebust
4b38a6db01 NFS: Eliminate duplicate call to nfs_mark_request_dirty
We only need to call nfs_mark_request_dirty() once in nfs_writepage_setup().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-04-12 19:34:22 -04:00
Jesper Juhl
160bc1604f NFS: Remove dead code from nfs_fs_mount()
In fs/nfs/super.c::nfs_fs_mount() we test for a NULL 'data':

...
 		if (data == NULL || mntfh == NULL)
 			goto out_free_fh;
...

and then further down in the function we test 'data' again:

...
 			nfs_fscache_get_super_cookie(
 				s, data ? data->fscache_uniq : NULL, NULL);
...

this second check is just dead code since there is no way 'data' could
possibly be NULL here.
We also rely on a non-NULL 'data' in more than one location between these
two tests, further proving the point that the second test is bogus.

This patch removes the dead code.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-04-12 19:34:21 -04:00
Dave Chinner
0d88f6e804 nfs: don't call __mark_inode_dirty while holding i_lock
nfs_scan_commit() is called with the inode->i_lock held, but it then
calls __mark_inode_dirty() while still holding the lock. This causes
a deadlock.

Push the inode->i_lock into nfs_scan_commit() so it can protect only
the parts of the code it needs to and can be dropped before the call
to __mark_inode_dirty() to avoid the deadlock.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Will Simoneau <simoneau@ele.uri.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-12 14:17:24 -07:00
Linus Torvalds
94c8a984ae Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: Change initial mount authflavor only when server returns NFS4ERR_WRONGSEC
  NFS: Fix a signed vs. unsigned secinfo bug
  Revert "net/sunrpc: Use static const char arrays"
2011-04-08 11:47:35 -07:00
Bryan Schumaker
37adb89fad NFS: Change initial mount authflavor only when server returns NFS4ERR_WRONGSEC
When attempting an initial mount, we should only attempt other
authflavors if AUTH_UNIX receives a NFS4ERR_WRONGSEC error.
This allows other errors to be passed back to userspace programs.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-04-07 13:19:40 -07:00
Bryan Schumaker
418875900e NFS: Fix a signed vs. unsigned secinfo bug
rpc_authflavor_t is cast from an unsigned int, but the
initial code tried to use it as a signed int.  I fix
this by passing an rpc_authflavor_t pointer around, and
returning signed integers from functions.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-04-06 13:25:04 -07:00
Lucas De Marchi
25985edced Fix common misspellings
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31 11:26:23 -03:00
Dave Chinner
0444d76ae6 fs: don't use igrab() while holding i_lock
Fix the incorrect use of igrab() inside the i_lock in NFS and Ceph‥

If we are already holding the i_lock, we have a reference to the
inode so we can safely use ihold() to gain an extra reference. This
avoids hangs due to lock recursion on the i_lock now that the
inode_lock is gone and igrab() uses the i_lock itself.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: Ryan Mallon <ryan@bluewatersys.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-29 07:50:34 -07:00
Trond Myklebust
a0e7e3cf79 NFS: Don't leak RPC clients in NFSv4 secinfo negotiation
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-27 17:48:17 +02:00
Trond Myklebust
4d65c520fb NFS: Fix a hang in the writeback path
Now that the inode scalability patches have been merged, it is no longer
safe to call igrab() under the inode->i_lock.
Now that we no longer call nfs_clear_request() until the nfs_page is
being freed, we know that we are always holding a reference to the
nfs_open_context, which again holds a reference to the path, and so
the inode cannot be freed until the last nfs_page has been removed
from the radix tree and freed.

We can therefore skip the igrab()/iput() altogether.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-27 17:48:07 +02:00
Trond Myklebust
0acd220192 Merge branch 'nfs-for-2.6.39' into nfs-for-next 2011-03-24 17:03:14 -04:00
Weston Andros Adamson
35124a0994 Cleanup XDR parsing for LAYOUTGET, GETDEVICEINFO
changes LAYOUTGET and GETDEVICEINFO XDR parsing to:
 - not use vmap, which doesn't work on incoherent archs
 - use xdr_stream parsing for all xdr

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-24 17:01:41 -04:00
Andy Adamson
ef31153786 NFSv4.1 convert layoutcommit sync to boolean
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-24 15:49:48 -04:00
Andy Adamson
de4b15c7e9 NFSv4.1 pnfs_layoutcommit_inode fixes
Test NFS_INO_LAYOUTCOMMIT before kzalloc
Mark inode dirty to retry LAYOUTCOMMIT on kzalloc failure.
Add comments.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-24 15:49:48 -04:00
Bryan Schumaker
8f70e95f9f NFS: Determine initial mount security
When sec=<something> is not presented as a mount option,
we should attempt to determine what security flavor the
server is using.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-24 13:52:42 -04:00
Bryan Schumaker
7ebb931598 NFS: use secinfo when crossing mountpoints
A submount may use different security than the parent
mount does.  We should figure out what sec flavor the
submount uses at mount time.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-24 13:52:42 -04:00
Bryan Schumaker
5a5ea0d485 NFS: Add secinfo procedure
This patch adds the nfs4 operation secinfo as a
valid nfs rpc operation.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-24 13:52:41 -04:00
Bryan Schumaker
7c5130588d NFS: lookup supports alternate client
A later patch will need to perform a lookup using an
alternate client with a different security flavor.
This patch adds support for doing that on NFS v4.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-24 13:52:41 -04:00
Bryan Schumaker
e73b83f270 NFS: convert call_sync() to a function
This patch changes nfs4_call_sync() from a macro into a
static inline function.  As a macro, the call_sync()
function will not do any type checking and depends
on the sequence arguments always having the same name.
As a function, we get to have type checking and can
rename the arguments if we so choose.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-24 13:52:41 -04:00
Fred Isaman
cccb4d063b NFSv4.1 remove temp code that prevented ds commits
Now that all the infrastructure is in place, we will do the
right thing if we remove this special casing.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-23 15:29:04 -04:00
Andy Adamson
863a3c6c68 NFSv4.1: layoutcommit
The filelayout driver sends LAYOUTCOMMIT only when COMMIT goes to
the data server (as opposed to the MDS) and the data server WRITE
is not NFS_FILE_SYNC.

Only whole file layout support means that there is only one IOMODE_RW layout
segment.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Mingyang Guo <guomingyang@nrchpc.ac.cn>
Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn>
Signed-off-by: Zhang Jingwang <zhangjingwang@nrchpc.ac.cn>
Tested-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-23 15:29:04 -04:00
Fred Isaman
e0c2b38018 NFSv4.1: filelayout driver specific code for COMMIT
Implement all the hooks created in the previous patches.
This requires exporting quite a few functions and adding a few
structure fields.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-23 15:29:04 -04:00
Fred Isaman
988b6dceb0 NFSv4.1: remove GETATTR from ds commits
Any COMMIT compound directed to a data server needs to have the
GETATTR calls suppressed.  We here, make sure the field we are testing
(data->lseg) is set and refcounted correctly.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-23 15:29:03 -04:00
Fred Isaman
a861a1e1c3 NFSv4.1: add generic layer hooks for pnfs COMMIT
We create three major hooks for the pnfs code.

pnfs_mark_request_commit() is called during writeback_done from
nfs_mark_request_commit, which gives the driver an opportunity to
claim it wants control over commiting a particular req.

pnfs_choose_commit_list() is called from nfs_scan_list
to choose which list a given req should be added to, based on
where we intend to send it for COMMIT.  It is up to the driver
to have preallocated list headers for each destination it may need.

pnfs_commit_list() is how the driver actually takes control, it is
used instead of nfs_commit_list().

In order to pass information between the above functions, we create
a union in nfs_page to hold a lseg (which is possible because the req is
not on any list while in transition), and add some flags to indicate
if we need to use the pnfs code.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-23 15:29:03 -04:00
Fred Isaman
425eb736cd NFSv4.1: alloc and free commit_buckets
Create a preallocated list header to hold nfs_pages for each
non-MDS COMMIT destination.  Note this is not necessarily each DS,
but is basically each <DS, fh> pair.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-23 15:29:03 -04:00
Fred Isaman
c879513e91 NFSv4.1: shift filelayout_free_lseg
Move it up to avoid forward declaration in later patch.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-23 15:29:03 -04:00
Fred Isaman
5917ce8440 NFSv4.1: pull out code from nfs_commit_release
Create a separate support function for later use by data server
commit code.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-23 15:29:03 -04:00
Fred Isaman
64bfeb49bd NFSv4.1: pull error handling out of nfs_commit_list
Create a separate support function for later use by data server
commit code.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-23 15:29:03 -04:00
Fred Isaman
5f452431e2 NFSv4.1: add callback to nfs4_commit_done
Add a callback that the pnfs layout driver can use to do its own
error handling of the data server's COMMIT response.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-23 15:29:03 -04:00
Fred Isaman
9ace33cdc6 NFSv4.1: rearrange nfs_commit_rpcsetup
Reorder nfs_commit_rpcsetup, preparing for a pnfs entry point.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-23 15:29:02 -04:00
Fred Isaman
465d52437d NFSv4.1: don't send COMMIT to ds for data sync writes
Based on consensus reached in Feb 2011 interim IETF meeting regarding
use of LAYOUTCOMMIT, it has been decided that a NFS_DATA_SYNC return
from a WRITE to data server should not initiate a COMMIT.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-23 15:29:02 -04:00
Bryan Schumaker
8ef2ce3e16 NFS: Detect loops in a readdir due to bad cookies
Some filesystems (such as ext4) can return the same cookie value for
multiple files.  If we try to start a readdir with one of these cookies,
the server will return the first file found with a cookie of the same
value.  This can cause the client to enter an infinite loop.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-23 15:14:27 -04:00
Bryan Schumaker
480c2006eb NFS: Create nfs_open_dir_context
nfs_opendir() created a context that held much more information than we
need for a readdir.  This patch introduces a slimmed-down
nfs_open_dir_context that contains only the cookie and the cred used for
RPC operations.  The new context will eventually be used to help detect
readdir loops.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-23 15:13:11 -04:00
Trond Myklebust
e47c085afb NFS: Ensure that we update the readdir filp->f_pos correctly
If we're doing a search by readdir cookie, we need to ensure that the
resulting f_pos is updated. To do so, we need to update the
desc->current_index, in the same way that we do in the search by
file offset case.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-23 15:12:12 -04:00
Gusev Vitaliy
4667058b77 nfs4: Fix NULL dereference at d_alloc_and_lookup()
d_alloc_and_lookup() calls i_op->lookup method due to
rootfh changes his fsid.

During mount i_op of NFS root inode is set to
nfs_mountpoint_inode_operations, if rpc_ops->getroot()
and rpc_ops->getattr() return different fsid.

After that  nfs_follow_remote_path() raised oops:

   BUG: unable to handle kernel NULL pointer dereference at (null)
   IP: [<          (null)>]           (null)

stack trace:

     d_alloc_and_lookup+0x4c/0x74
     do_lookup+0x1e3/0x280
     link_path_walk+0x12e/0xab0
     nfs4_remote_get_sb+0x56/0x2c0 [nfs]
     path_walk+0x67/0xe0
     vfs_path_lookup+0x8e/0x100
     nfs_follow_remote_path+0x16f/0x3e0 [nfs]
     nfs4_try_mount+0x6f/0xd0 [nfs]
     nfs_get_sb+0x269/0x400 [nfs]
     vfs_kern_mount+0x8a/0x1f0
     do_kern_mount+0x52/0x130
     do_mount+0x20a/0x260
     sys_mount+0x90/0xe0
     system_call_fastpath+0x16/0x1b

So just refresh fsid, as RFC3530 doesn't specify behavior
in case of rootfh changes fsid.

Signed-off-by: Vitaliy Gusev <gusev.vitaliy@nexenta.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-22 20:00:25 -04:00
Trond Myklebust
b8413f98f9 NFS: Fix a hang/infinite loop in nfs_wb_page()
When one of the two waits in nfs_commit_inode() is interrupted, it
returns a non-negative value, which causes nfs_wb_page() to think
that the operation was successful causing it to busy-loop rather
than exiting.
It also causes nfs_file_fsync() to incorrectly report the file as
being successfully committed to disk.

This patch fixes both problems by ensuring that we return an error
if the attempts to wait fail.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2011-03-21 21:09:24 -04:00
Trond Myklebust
b31268ac79 FS: Use stable writes when not doing a bulk flush
If we're only doing a single write, and there are no other unstable
writes being queued up, we might want to just flip to using a stable
write RPC call.

Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-21 21:08:17 -04:00
Dan Carpenter
1c34092adf nfs: lock() vs unlock() typo
These should be spin_unlock() instead of spin_lock().  It's a typo.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-21 00:45:50 -04:00
Linus Torvalds
179198373c Merge branch 'nfs-for-2.6.39' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'nfs-for-2.6.39' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (54 commits)
  RPC: killing RPC tasks races fixed
  xprt: remove redundant check
  SUNRPC: Convert struct rpc_xprt to use atomic_t counters
  SUNRPC: Ensure we always run the tk_callback before tk_action
  sunrpc: fix printk format warning
  xprt: remove redundant null check
  nfs: BKL is no longer needed, so remove the include
  NFS: Fix a warning in fs/nfs/idmap.c
  Cleanup: Factor out some cut-and-paste code.
  cleanup: save 60 lines/100 bytes by combining two mostly duplicate functions.
  NFS: account direct-io into task io accounting
  gss:krb5 only include enctype numbers in gm_upcall_enctypes
  RPCRDMA: Fix FRMR registration/invalidate handling.
  RPCRDMA: Fix to XDR page base interpretation in marshalling logic.
  NFSv4: Send unmapped uid/gids to the server when using auth_sys
  NFSv4: Propagate the error NFS4ERR_BADOWNER to nfs4_do_setattr
  NFSv4: cleanup idmapper functions to take an nfs_server argument
  NFSv4: Send unmapped uid/gids to the server if the idmapper fails
  NFSv4: If the server sends us a numeric uid/gid then accept it
  NFSv4.1: reject zero layout with zeroed stripe unit
  ...
2011-03-17 17:40:00 -07:00
Linus Torvalds
054cfaacf8 Merge branch 'mnt_devname' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'mnt_devname' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  vfs: bury ->get_sb()
  nfs: switch NFS from ->get_sb() to ->mount()
  nfs: stop mangling ->mnt_devname on NFS
  vfs: new superblock methods to override /proc/*/mount{s,info}
  nfs: nfs_do_{ref,sub}mount() superblock argument is redundant
  nfs: make nfs_path() work without vfsmount
  nfs: store devname at disconnected NFS roots
  nfs: propagate devname to nfs{,4}_get_root()
2011-03-16 19:09:57 -07:00
Al Viro
011949811b nfs: switch NFS from ->get_sb() to ->mount()
The last remaining instances of ->get_sb() can be converted ->mount()
now - nothing in them uses new vfsmount anymore.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-16 16:48:06 -04:00
Al Viro
fd462fb51d nfs: stop mangling ->mnt_devname on NFS
now we can do that - nobody cares about its value anymore.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-16 16:48:06 -04:00
Al Viro
c7f404b40a vfs: new superblock methods to override /proc/*/mount{s,info}
a) ->show_devname(m, mnt) - what to put into devname columns in mounts,
mountinfo and mountstats
b) ->show_path(m, mnt) - what to put into relative path column in mountinfo

Leaving those NULL gives old behaviour.  NFS switched to using those.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-16 16:48:06 -04:00
Al Viro
f8ad9c4bae nfs: nfs_do_{ref,sub}mount() superblock argument is redundant
It's always equal to dentry->d_sb

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-16 16:48:06 -04:00
Al Viro
b514f872f8 nfs: make nfs_path() work without vfsmount
part 3: now we have everything to get nfs_path() just by dentry -
just follow to (disconnected) root and pick the rest of the thing
there.

Start killing propagation of struct vfsmount * on the paths that
used to bring it to nfs_path().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-16 16:47:55 -04:00
Al Viro
b1942c5f8c nfs: store devname at disconnected NFS roots
part 2: make sure that disconnected roots have corresponding mnt_devname
values stashed into them.

Have nfs*_get_root() stuff a copy of devname into ->d_fsdata of the
found root, provided that it is disconnected.

Have ->d_release() free it when dentry goes away.

Have the places where NFS uses ->d_fsdata for sillyrename (and that
can *never* happen to a disconnected root - dentry will be attached
to its parent) free old devname copies if they find those.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-16 16:44:24 -04:00
Al Viro
0d5839ad05 nfs: propagate devname to nfs{,4}_get_root()
step 1 of ->mnt_devname fixes: make sure we have the value of devname
available in ..._get_root().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-16 16:27:04 -04:00
Linus Torvalds
bd2895eead Merge branch 'for-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
* 'for-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: fix build failure introduced by s/freezeable/freezable/
  workqueue: add system_freezeable_wq
  rds/ib: use system_wq instead of rds_ib_fmr_wq
  net/9p: replace p9_poll_task with a work
  net/9p: use system_wq instead of p9_mux_wq
  xfs: convert to alloc_workqueue()
  reiserfs: make commit_wq use the default concurrency level
  ocfs2: use system_wq instead of ocfs2_quota_wq
  ext4: convert to alloc_workqueue()
  scsi/scsi_tgt_lib: scsi_tgtd isn't used in memory reclaim path
  scsi/be2iscsi,qla2xxx: convert to alloc_workqueue()
  misc/iwmc3200top: use system_wq instead of dedicated workqueues
  i2o: use alloc_workqueue() instead of create_workqueue()
  acpi: kacpi*_wq don't need WQ_MEM_RECLAIM
  fs/aio: aio_wq isn't used in memory reclaim path
  input/tps6507x-ts: use system_wq instead of dedicated workqueue
  cpufreq: use system_wq instead of dedicated workqueues
  wireless/ipw2x00: use system_wq instead of dedicated workqueues
  arm/omap: use system_wq in mailbox
  workqueue: use WQ_MEM_RECLAIM instead of WQ_RESCUER
2011-03-16 08:20:19 -07:00
Stephen Rothwell
8f68cd42d8 nfs: BKL is no longer needed, so remove the include
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-15 08:44:35 -04:00
Rob Landley
c5cb09b6f8 Cleanup: Factor out some cut-and-paste code.
Factor out some cut-and-paste code in options parsing.
Saves about 800 bytes on x86-64.

Signed-off-by: Rob Landley <rlandley@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:39:28 -05:00
Rob Landley
c12bacec45 cleanup: save 60 lines/100 bytes by combining two mostly duplicate functions.
Eliminate two mostly duplicate functions (nfs_parse_simple_hostname()
and nfs_parse_protected_hostname()) and instead just make the calling
function (nfs_parse_devname()) do everything.

Signed-off-by: Rob Landley <rlandley@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:39:28 -05:00
Konstantin Khlebnikov
7ec10f26e1 NFS: account direct-io into task io accounting
Account NFS direct-io reads and writes into Task I/O Accounting.
Do it before complition to handle aio.

NFS have unusual direct-io implementation,
thus accounting in generic code does not work.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:39:27 -05:00
Trond Myklebust
b064eca2cf NFSv4: Send unmapped uid/gids to the server when using auth_sys
The new behaviour is enabled using the new module parameter
'nfs4_disable_idmapping'.

Note that if the server rejects an unmapped uid or gid, then
the client will automatically switch back to using the idmapper.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:39:27 -05:00
Trond Myklebust
3ddeb7c5c6 NFSv4: Propagate the error NFS4ERR_BADOWNER to nfs4_do_setattr
This will be required in order to switch uid/gid mapping back on if the
admin has tried to disable it.

Note that we also propagate NFS4ERR_BADNAME at the same time, in order to
work around a Linux server bug.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:39:27 -05:00
Trond Myklebust
e4fd72a17d NFSv4: cleanup idmapper functions to take an nfs_server argument
...instead of the nfs_client.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:39:26 -05:00
Trond Myklebust
f0b851689a NFSv4: Send unmapped uid/gids to the server if the idmapper fails
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:39:26 -05:00
Trond Myklebust
5cf36cfdc8 NFSv4: If the server sends us a numeric uid/gid then accept it
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:39:26 -05:00
Benny Halevy
75247affd7 NFSv4.1: reject zero layout with zeroed stripe unit
Allowing stripe_unit==0 causes the client to crash later on
when dividing by zero.

Reported-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:45 -05:00
Fred Isaman
36fe432d33 NFSv4.1: Clear lseg pointer in ->doio function
Now that we have access to the pointer, clear it immediately after
the put, instead of in caller.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:45 -05:00
Fred Isaman
c76069bda0 NFSv4.1: rearrange ->doio args
This will make it possible to clear the lseg pointer in the same
function as it is put, instead of in the caller nfs_pageio_doio().

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:44 -05:00
Fred Isaman
a69aef1496 NFSv4.1: pnfs filelayout driver write
Allows the pnfs filelayout driver to write to the data servers.

Note that COMMIT to data servers will be implemented in a future
patch.  To avoid improper behavior, for the moment any WRITE to a data
server that would also require a COMMIT to the data server is sent
NFS_FILE_SYNC.

Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Mingyang Guo <guomingyang@nrchpc.ac.cn>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:44 -05:00
Fred Isaman
7ffd10640d NFSv4.1: remove GETATTR from ds writes
Any WRITE compound directed to a data server needs to have the
GETATTR calls suppressed.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:44 -05:00
Andy Adamson
0382b74409 NFSv4.1: implement generic pnfs layer write switch
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Mike Sager <sager@netapp.com>
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:44 -05:00
Fred Isaman
44b83799a9 NFSv4.1: trigger LAYOUTGET for writes
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:44 -05:00
Fred Isaman
5053aa568d NFSv4.1: Send lseg down into nfs_write_rpcsetup
We grab the lseg sent in from the doio function and attach it to
each struct nfs_write_data created.  This is how the lseg will be
sent to the layout driver.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:44 -05:00
Fred Isaman
b029bc9b08 NFSv4.1: add callback to nfs4_write_done
Add callback that pnfs layout driver can use to do its own handling
of data server WRITE response.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:43 -05:00
Andy Adamson
d138d5d17b NFSv4.1: rearrange nfs_write_rpcsetup
Reorder nfs_write_rpcsetup, preparing for a pnfs entry point.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:43 -05:00
Andy Adamson
568e8c494d NFSv4.1: turn off pNFS on ds connection failure
If a data server is unavailable, go through MDS.

Mark the deviceid containing the data server as a negative cache entry.
Do not try to connect to any data server on a deviceid marked as a negative
cache entry. Mark any layout that tries to use the marked deviceid as failed.

Inodes with a layout marked as fails will not use the layout for I/O, and will
not perform any more layoutgets.
Inodes without a layout will still do layoutget, but the layout will get
marked immediately.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:43 -05:00
Christoph Hellwig
ea8eecdd11 NFSv4.1 move deviceid cache to filelayout driver
No need for generic cache with only one user.
Keep a simple hash of deviceids in the filelayout driver.

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Acked-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:43 -05:00
Andy Adamson
cbdabc7f8b NFSv4.1: filelayout async error handler
Use our own async error handler.
Mark the layout as failed and retry i/o through the MDS on specified errors.

Update the mds_offset in nfs_readpage_retry so that a failed short-read retry
to a DS gets correctly resent through the MDS.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:43 -05:00
Andy Adamson
dc70d7b318 NFSv4.1: filelayout read
Attempt a pNFS file layout read by setting up the nfs_read_data struct and
calling nfs_initiate_read with the data server rpc client and the
filelayout rpc call ops.

Error handling is implemented in a subsequent patch.

Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Mingyang Guo <guomingyang@nrchpc.ac.cn>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Tested-by: Guo Mingyang <guomingyang@nrchpc.ac.cn>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:43 -05:00
Fred Isaman
cfe7f4120f NFSv4.1: filelayout i/o helpers
Prepare for filelayout_read_pagelist with helper functions that find the correct
data server, filehandle, and offset.

Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Mike Sager <sager@netapp.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn>
Signed-off-by: Tigran Mkrtchyan <tigran@anahit.desy.de>
Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:42 -05:00
Andy Adamson
d83217c135 NFSv4.1: data server connection
Introduce a data server set_client and init session following the
nfs4_set_client and  nfs4_init_session convention.

Once a new nfs_client is on the nfs_client_list, the nfs_client cl_cons_state
serializes access to creating an nfs_client struct with matching properties.

Use the new nfs_get_client() that initializes new clients.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:42 -05:00
Andy Adamson
64419a9b20 NFSv4.1: generic read
Separate the rpc run portion of nfs_read_rpcsetup into a new function
nfs_initiate_read that is called for normal NFS I/O.

Add a pNFS read_pagelist function that is called instead of nfs_intitate_read
for pNFS reads.

Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Mike Sager <sager@netapp.com>
Signed-off-by: Mingyang Guo <guomingyang@nrchpc.ac.cn>
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:42 -05:00
Fred Isaman
bae724ef95 NFSv4.1: shift pnfs_update_layout locations
Move the pnfs_update_layout call location to nfs_pageio_do_add_request().
Grab the lseg sent in the doio function to nfs_read_rpcsetup and attach
it to each nfs_read_data so it can be sent to the layout driver.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:42 -05:00
Fred Isaman
94ad1c80e2 NFSv4.1: coelesce across layout stripes
Add a pg_test layout driver hook which is used to avoid coelescing I/O across
layout stripes.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:42 -05:00
Fred Isaman
d684d2ae10 NFSv4.1: lseg refcounting
Prepare put_lseg and get_lseg to be called from the pNFS I/O code.
Pull common code from pnfs_lseg_locked to call from pnfs_lseg.
Inline pnfs_lseg_locked into it's only caller.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:42 -05:00
Andy Adamson
94de8b27d0 NFSv4.1: add MDS mount DS only check
The DS only role cannot be used to mount.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:41 -05:00
Andy Adamson
d6fb79d433 NFSv4.1: new flag for lease time check
Data servers cannot send nfs4_proc_get_lease_time. but still need to setup
state renewal. Add the NFS_CS_CHECK_LEASE_TIME bit to indicate if the lease
time can be checked.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:41 -05:00
Andy Adamson
d3b4c9d767 NFSv4.1: new flag for state renewal check
Data servers not sharing a session with the mount MDS always have an empty
cl_superblocks list.
Replace the cl_superblocks empty list check to see if it is time to shut down
renewd with the NFS_CS_STOP_RENEW bit which is not set by such a data server.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:41 -05:00
Andy Adamson
89d1ea6579 NFSv4.1: send zero stateid seqid on v4.1 i/o
Data servers require a zero stateid seqid, and there is no advantage to not
doing the same for all NFSv4.1

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:41 -05:00
Andy Adamson
45a52a0207 NFS move nfs_client initialization into nfs_get_client
Now nfs_get_client returns an nfs_client ready to be used no matter if it was
found or created.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:41 -05:00
Andy Adamson
bf9c1387ca NFSv4.1: put_layout_hdr can remove nfsi->layout
Prevents an Oops triggered by CB_LAYOUTRECALL and LAYOUTGET race on a
pnfs_layout_hdr first pnfs_layout_segment.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:41 -05:00
Fred Isaman
136028967a NFS: change nfs_writeback_done to return void
The return values are not used by any callers.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:40 -05:00
Fred Isaman
83762c56c1 NFS: remove pointless if statement in nfs_direct_write_result
The code was doing nothing more in either branch of the if.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:40 -05:00
Fred Isaman
f49f9baac8 pnfs: fix pnfs lock inversion of i_lock and cl_lock
The pnfs code was using throughout the lock order i_lock, cl_lock.
This conflicts with the nfs delegation code.  Rework the pnfs code
to avoid taking both locks simultaneously.

Currently the code takes the double lock to add/remove the layout to a
nfs_client list, while atomically checking that the list of lsegs is
empty.  To avoid this, we rely on existing serializations.  When a
layout is initialized with lseg count equal zero, LAYOUTGET's
openstateid serialization is in effect, making it safe to assume it
stays zero unless we change it.  And once a layout's lseg count drops
to zero, it is set as DESTROYED and so will stay at zero.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:40 -05:00
Fred Isaman
9f52c2525e pnfs: do not need to clear NFS_LAYOUT_BULK_RECALL flag
We do not need to clear the NFS_LAYOUT_BULK_RECALL, as setting it
guarantees that NFS_LAYOUT_DESTROYED will be set once any outstanding
io is finished.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:40 -05:00
Fred Isaman
3851172244 pnfs: avoid incorrect use of layout stateid
The code could violate the following from RFC5661, section 12.5.3:
"Once a client has no more layouts on a file, the layout stateid is no
longer valid and MUST NOT be used."

This can occur when a layout already has a lseg, starts another
non-everlapping LAYOUTGET, and a CB_LAYOUTRECALL for the existing lseg
is processed before we hit pnfs_layout_process().

Solve by setting, each time the client has no more lsegs for a file, a
flag which blocks further use of the layout and triggers its removal.

This also fixes a second bug which occurs in the same instance as
above.  If we actually use pnfs_layout_process, we add the new lseg to
the layout, but the layout has been removed from the nfs_client list
by the intervening CB_LAYOUTRECALL and will not be added back.  Thus
the newly acquired lseg will not be properly returned in the event of
a subsequent CB_LAYOUTRECALL.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:39 -05:00
Chuck Lever
53d4737580 NFS: NFSROOT should default to "proto=udp"
There have been a number of recent reports that NFSROOT is no longer
working with default mount options, but fails only with certain NICs.

Brian Downing <bdowning@lavos.net> bisected to commit 56463e50 "NFS:
Use super.c for NFSROOT mount option parsing".  Among other things,
this commit changes the default mount options for NFSROOT to use TCP
instead of UDP as the underlying transport.

TCP seems less able to deal with NICs that are slow to initialize.
The system logs that have accompanied reports of problems all show
that NFSROOT attempts to establish a TCP connection before the NIC is
fully initialized, and thus the TCP connection attempt fails.

When a TCP connection attempt fails during a mount operation, the
NFS stack needs to fail the operation.  Usually user space knows how
and when to retry it.  The network layer does not report a distinct
error code for this particular failure mode.  Thus, there isn't a
clean way for the RPC client to see that it needs to retry in this
case, but not in others.

Because NFSROOT is used in some environments where it is not possible
to update the kernel command line to specify "udp", the proper thing
to do is change NFSROOT to use UDP by default, as it did before commit
56463e50.

To make it easier to see how to change default mount options for
NFSROOT and to distinguish default settings from mandatory settings,
I've adjusted a couple of areas to document the specifics.

root_nfs_cat() is also modified to deal with commas properly when
concatenating strings containing mount option lists.  This keeps
root_nfs_cat() call sites simpler, now that we may be concatenating
multiple mount option strings.

Tested-by: Brian Downing <bdowning@lavos.net>
Tested-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: <stable@kernel.org> # 2.6.37
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:38:07 -05:00
Huang Weiyi
57df216bd8 nfs4: remove duplicated #include
Remove duplicated #include('s) in
  fs/nfs/nfs4proc.c

Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:18:37 -05:00
Trond Myklebust
f9feab1e18 NFSv4: nfs4_state_mark_reclaim_nograce() should be static
There are no more external users of nfs4_state_mark_reclaim_nograce() or
nfs4_state_mark_reclaim_reboot(), so mark them as static.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:18:36 -05:00
Trond Myklebust
ecac799a5e NFSv4: Fix the setlk error handler
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:18:36 -05:00
Trond Myklebust
b4410c2f7f NFSv4.1: Fix the handling of the SEQUENCE status bits
We want SEQUENCE status bits to be handled by the state manager in order
to avoid threading issues.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:18:35 -05:00
Trond Myklebust
0400a6b0cb NFSv4/4.1: Fix nfs4_schedule_state_recovery abuses
nfs4_schedule_state_recovery() should only be used when we need to force
the state manager to check the lease. If we just want to start the
state manager in order to handle a state recovery situation, we should be
using nfs4_schedule_state_manager().

This patch fixes the abuses of nfs4_schedule_state_recovery() by replacing
its use with a set of helper functions that do the right thing.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-11 15:18:22 -05:00
Andy Adamson
c34c32ea97 NFSv4.1 reclaim complete must wait for completion
Signed-off-by: Andy Adamson <andros@netapp.com>
[Trond: fix whitespace errors]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-10 15:05:01 -05:00
Andy Adamson
114f64b5f2 NFSv4: remove duplicate clientid in struct nfs_client
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-10 15:05:00 -05:00
Ricardo Labiaga
7d6d63d642 NFSv4.1: Retry CREATE_SESSION on NFS4ERR_DELAY
Fix bug where we currently retry the EXCHANGEID call again, eventhough
we already have a valid clientid.  Instead, delay and retry the CREATE_SESSION
call.

Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-10 15:04:59 -05:00
Frank Filz
3fa0b4e201 (try3-resend) Fix nfs_compat_user_ino64 so it doesn't cause problems if bit 31 or 63 are set in fileid
The problem was use of an int32, which when converted to a uint64
is sign extended resulting in a fileid that doesn't fit in 32 bits
even though the intent of the function is to fit the fileid into
32 bits.

Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
[Trond: Added an include for compat.h]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-10 15:04:58 -05:00
Jovi Zhang
43b7c3f051 nfs: fix compilation warning
this commit fix compilation warning as following:
linux-2.6/fs/nfs/nfs4proc.c:3265: warning: comparison of distinct pointer types lacks a cast

Signed-off-by: Jovi Zhang <bookjovi@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-10 15:04:56 -05:00
Stanislav Fomichev
b9f810570d nfs: add kmalloc return value check in decode_and_add_ds
add kmalloc return value check in decode_and_add_ds

Signed-off-by: Stanislav Fomichev <kernel@fomichev.me>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-10 15:04:55 -05:00
Jeff Layton
d2224e7afb nfs: close NFSv4 COMMIT vs. CLOSE race
I've been adding in more artificial delays in the NFSv4 commit and close
codepaths to uncover races. The kernel I'm testing has the patch to
close the race in __rpc_wait_for_completion_task that's in Trond's
cthon2011 branch. The reproducer I've been using does this in a loop:

	mkdir("DIR");
	fd = open("DIR/FILE", O_WRONLY|O_CREAT|O_EXCL, 0644);
	write(fd, "abcdefg", 7);
	close(fd);
	unlink("DIR/FILE");
	rmdir("DIR");

The above reproducer shouldn't result in any silly-renaming. However,
when I add a "msleep(100)" just after the nfs_commit_clear_lock call in
nfs_commit_release, I can almost always force one to occur. If I can
force it to occur with that, then it can happen without that delay
given the right timing.

nfs_commit_inode waits for the NFS_INO_COMMIT bit to clear when called
with FLUSH_SYNC set. nfs_commit_rpcsetup on the other hand does not wait
for the task to complete before putting its reference to it, so the last
reference get put in rpc_release task and gets queued to a workqueue.

In this situation, the last open context reference may be put by the
COMMIT release instead of the close() syscall. The close() syscall
returns too quickly and the unlink runs while the d_count is still
high since the COMMIT release hasn't put its dentry reference yet.

Fix this by having rpc_commit_rpcsetup wait for the RPC call to complete
before putting the task reference when FLUSH_SYNC is set. With this, the
last reference is put by the process that's initiating the FLUSH_SYNC
commit and the race is closed.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-10 15:04:53 -05:00
Trond Myklebust
bf294b41ce SUNRPC: Close a race in __rpc_wait_for_completion_task()
Although they run as rpciod background tasks, under normal operation
(i.e. no SIGKILL), functions like nfs_sillyrename(), nfs4_proc_unlck()
and nfs4_do_close() want to be fully synchronous. This means that when we
exit, we want all references to the rpc_task to be gone, and we want
any dentry references etc. held by that task to be released.

For this reason these functions call __rpc_wait_for_completion_task(),
followed by rpc_put_task() in the expectation that the latter will be
releasing the last reference to the rpc_task, and thus ensuring that the
callback_ops->rpc_release() has been called synchronously.

This patch fixes a race which exists due to the fact that
rpciod calls rpc_complete_task() (in order to wake up the callers of
__rpc_wait_for_completion_task()) and then subsequently calls
rpc_put_task() without ensuring that these two steps are done atomically.

In order to avoid adding new spin locks, the patch uses the existing
waitqueue spin lock to order the rpc_task reference count releases between
the waiting process and rpciod.
The common case where nobody is waiting for completion is optimised for by
checking if the RPC_TASK_ASYNC flag is cleared and/or if the rpc_task
reference count is 1: in those cases we drop trying to grab the spin lock,
and immediately free up the rpc_task.

Those few processes that need to put the rpc_task from inside an
asynchronous context and that do not care about ordering are given a new
helper: rpc_put_task_async().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-10 15:04:52 -05:00
Neil Horman
e9e3d724e2 nfs4: Ensure that ACL pages sent over NFS were not allocated from the slab (v3)
The "bad_page()" page allocator sanity check was reported recently (call
chain as follows):

  bad_page+0x69/0x91
  free_hot_cold_page+0x81/0x144
  skb_release_data+0x5f/0x98
  __kfree_skb+0x11/0x1a
  tcp_ack+0x6a3/0x1868
  tcp_rcv_established+0x7a6/0x8b9
  tcp_v4_do_rcv+0x2a/0x2fa
  tcp_v4_rcv+0x9a2/0x9f6
  do_timer+0x2df/0x52c
  ip_local_deliver+0x19d/0x263
  ip_rcv+0x539/0x57c
  netif_receive_skb+0x470/0x49f
  :virtio_net:virtnet_poll+0x46b/0x5c5
  net_rx_action+0xac/0x1b3
  __do_softirq+0x89/0x133
  call_softirq+0x1c/0x28
  do_softirq+0x2c/0x7d
  do_IRQ+0xec/0xf5
  default_idle+0x0/0x50
  ret_from_intr+0x0/0xa
  default_idle+0x29/0x50
  cpu_idle+0x95/0xb8
  start_kernel+0x220/0x225
  _sinittext+0x22f/0x236

It occurs because an skb with a fraglist was freed from the tcp
retransmit queue when it was acked, but a page on that fraglist had
PG_Slab set (indicating it was allocated from the Slab allocator (which
means the free path above can't safely free it via put_page.

We tracked this back to an nfsv4 setacl operation, in which the nfs code
attempted to fill convert the passed in buffer to an array of pages in
__nfs4_proc_set_acl, which gets used by the skb->frags list in
xs_sendpages.  __nfs4_proc_set_acl just converts each page in the buffer
to a page struct via virt_to_page, but the vfs allocates the buffer via
kmalloc, meaning the PG_slab bit is set.  We can't create a buffer with
kmalloc and free it later in the tcp ack path with put_page, so we need
to either:

1) ensure that when we create the list of pages, no page struct has
   PG_Slab set

 or

2) not use a page list to send this data

Given that these buffers can be multiple pages and arbitrarily sized, I
think (1) is the right way to go.  I've written the below patch to
allocate a page from the buddy allocator directly and copy the data over
to it.  This ensures that we have a put_page free-able page for every
entry that winds up on an skb frag list, so it can be safely freed when
the frame is acked.  We do a put page on each entry after the
rpc_call_sync call so as to drop our own reference count to the page,
leaving only the ref count taken by tcp_sendpages.  This way the data
will be properly freed when the ack comes in

Successfully tested by myself to solve the above oops.

Note, as this is the result of a setacl operation that exceeded a page
of data, I think this amounts to a local DOS triggerable by an
uprivlidged user, so I'm CCing security on this as well.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
CC: security@kernel.org
CC: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-04 17:28:52 -08:00
Tejun Heo
43d133c18b Merge branch 'master' into for-2.6.39 2011-02-21 09:43:56 +01:00
Chuck Lever
d1205f87bb NFS: NFSv4 readdir loses entries
On recent 2.6.38-rc kernels, connectathon basic test 6 fails on
NFSv4 mounts of OpenSolaris with something like:

> ./test6: readdir
> 	./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.12' dir entry, pass 0
> 	./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.82' dir entry, pass 0
> 	./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.164' dir entry, pass 0
> 	./test6: (/mnt/klimt/matisse.test) Test failed with 3 errors
> basic tests failed
> Tests failed, leaving /mnt/klimt mounted
> [cel@matisse cthon04]$

I narrowed the problem down to nfs4_decode_dirent() reporting that the
decode buffer had overflowed while decoding the entries for those
missing files.

verify_attr_len() assumes both it's pointer arguments reside on the
same page.  When these arguments point to locations on two different
pages, verify_attr_len() can report false errors.  This can happen now
that a large NFSv4 readdir result can span pages.

We have reasonably good checking in nfs4_decode_dirent() anyway, so
it should be safe to simply remove the extra checking.

At a guess, this was introduced by commit 6650239a, "NFS: Don't use
vm_map_ram() in readdir".

Cc: stable@kernel.org [2.6.37]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-28 13:41:35 -05:00
Chuck Lever
c08e76d0cd NFS: Micro-optimize nfs4_decode_dirent()
Make the decoding of NFSv4 directory entries slightly more efficient
by:

  1.  Avoiding unnecessary byte swapping when checking XDR booleans,
      and

  2.  Not bumping "p" when its value will be immediately replaced by
      xdr_inline_decode()

This commit makes nfs4_decode_dirent() consistent with similar logic
in the other two decode_dirent() functions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-28 13:37:35 -05:00
Trond Myklebust
e00b8a2404 NFS: Fix an NFS client lockdep issue
There is no reason to be freeing the delegation cred in the rcu callback,
and doing so is resulting in a lockdep complaint that rpc_credcache_lock
is being called from both softirq and non-softirq contexts.

Reported-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2011-01-28 13:37:09 -05:00
Andy Adamson
c7a360b05b NFS construct consistent co_ownerid for v4.1
As stated in section 2.4 of RFC 5661, subsequent instances of the client need
to present the same co_ownerid. Concatinate the client's IP dot address,
host name, and the rpc_auth pseudoflavor to form the co_ownerid.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-25 22:49:14 -05:00
Trond Myklebust
27dc1cd3ad NFS: nfs_wcc_update_inode() should set nfsi->attr_gencount
If the call to nfs_wcc_update_inode() results in an attribute update, we
need to ensure that the inode's attr_gencount gets bumped too, otherwise
we are not protected against races with other GETATTR calls.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-25 15:28:21 -05:00
Andy Adamson
b2a2897dc4 NFS improve pnfs_put_deviceid_cache debug print
What we really want to know is the ref count.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-25 15:26:51 -05:00
Andy Adamson
2c4cdf8f6d NFS fix cb_sequence error processing
Always assign the cb_process_state nfs_client pointer so a processing error
in cb_sequence after the nfs_client is found and referenced returns
a non-NULL cb_process_state nfs_client and the matching nfs_put_client in
nfs4_callback_compound dereferences the client.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-25 15:26:51 -05:00
Andy Adamson
778be232a2 NFS do not find client in NFSv4 pg_authenticate
The information required to find the nfs_client cooresponding to the incoming
back channel request is contained in the NFS layer. Perform minimal checking
in the RPC layer pg_authenticate method, and push more detailed checking into
the NFS layer where the nfs_client can be found.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-25 15:26:51 -05:00
Chuck Lever
f61f6da0d5 NFS: Prevent memory allocation failure in nfsacl_encode()
nfsacl_encode() allocates memory in certain cases.  This of course
is not guaranteed to work.

Since commit 9f06c719 "SUNRPC: New xdr_streams XDR encoder API", the
kernel's XDR encoders can't return a result indicating possibly a
failure, so a memory allocation failure in nfsacl_encode() has become
fatal (ie, the XDR code Oopses) in some cases.

However, the allocated memory is a tiny fixed amount, on the order
of 40-50 bytes.  We can easily use a stack-allocated buffer for
this, with only a wee bit of nose-holding.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-25 15:24:47 -05:00
Chuck Lever
ee5dc7732b NFS: Fix "kernel BUG at fs/nfs/nfs3xdr.c:1338!"
Milan Broz <mbroz@redhat.com> reports:

> on today Linus' tree I get OOps if using nfs.
>
> server (2.6.36) exports dir:
> /dir   172.16.1.0/24(rw,async,all_squash,no_subtree_check,anonuid=500,anongid=500)
>
> on client it is mounted  in fstab
> server:/dir  /mnt/tst  nfs  rw,soft 0 0
>
> and these commands OOpses it (simplified from a configure script):
>
> cd /dir
> touch x
> install x y
>
> [  105.327701] ------------[ cut here ]------------
> [  105.327979] kernel BUG at fs/nfs/nfs3xdr.c:1338!
> [  105.328075] invalid opcode: 0000 [#1] PREEMPT SMP
> [  105.328223] last sysfs file: /sys/devices/virtual/bdi/0:16/uevent
> [  105.328349] Modules linked in: usbcore dm_mod
> [  105.328553]
> [  105.328678] Pid: 3710, comm: install Not tainted 2.6.37+ #423 440BX Desktop Reference Platform/VMware Virtual Platform
> [  105.328853] EIP: 0060:[<c116c06c>] EFLAGS: 00010282 CPU: 0
> [  105.329152] EIP is at nfs3_xdr_enc_setacl3args+0x61/0x98
> [  105.329249] EAX: ffffffea EBX: ce941d98 ECX: 00000000 EDX: 00000004
> [  105.329340] ESI: ce941cd0 EDI: 000000a4 EBP: ce941cc0 ESP: ce941cb4
> [  105.329431]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> [  105.329525] Process install (pid: 3710, ti=ce940000 task=ced36f20 task.ti=ce940000)
> [  105.336600] Stack:
> [  105.336693]  ce941cd0 ce9dc000 00000000 ce941cf8 c12ecd02 c12f43e0 c116c00b cf754158
> [  105.336982]  ce9dc004 cf754284 ce9dc004 cf7ffee8 ceff9978 ce9dc000 cf7ffee8 ce9dc000
> [  105.337182]  ce9dc000 ce941d14 c12e698d cf75412c ce941d98 cf7ffee8 cf7fff20 00000000
> [  105.337405] Call Trace:
> [  105.337695]  [<c12ecd02>] rpcauth_wrap_req+0x75/0x7f
> [  105.337806]  [<c12f43e0>] ? xdr_encode_opaque+0x12/0x15
> [  105.337898]  [<c116c00b>] ? nfs3_xdr_enc_setacl3args+0x0/0x98
> [  105.337988]  [<c12e698d>] call_transmit+0x17e/0x1e8
> [  105.338072]  [<c12ec307>] __rpc_execute+0x6d/0x1a6
> [  105.338155]  [<c12ec474>] rpc_execute+0x34/0x37
> [  105.338235]  [<c12e738d>] rpc_run_task+0xb5/0xbd
> [  105.338316]  [<c12e7474>] rpc_call_sync+0x3d/0x58
> [  105.338402]  [<c116d0c6>] nfs3_proc_setacls+0x18e/0x24f
> [  105.338493]  [<c10b3f76>] ? __kmalloc+0x148/0x1c4
> [  105.338579]  [<c10ecd01>] ? posix_acl_alloc+0x12/0x22
> [  105.338665]  [<c116d5c8>] nfs3_proc_setacl+0xa0/0xca
> [  105.338748]  [<c116d69c>] nfs3_setxattr+0x62/0x88
> [  105.338834]  [<c1317042>] ? sub_preempt_count+0x7c/0x89
> [  105.338926]  [<c116d63a>] ? nfs3_setxattr+0x0/0x88
> [  105.339026]  [<c10cfa79>] __vfs_setxattr_noperm+0x26/0x95
> [  105.339114]  [<c10cfb43>] vfs_setxattr+0x5b/0x76
> [  105.339211]  [<c10cfbfb>] setxattr+0x9d/0xc3
> [  105.339298]  [<c10a2ea8>] ? handle_pte_fault+0x258/0x5cb
> [  105.339428]  [<c1091ff6>] ? __free_pages+0x1a/0x23
> [  105.339517]  [<c10498ea>] ? up_read+0x16/0x2c
> [  105.339599]  [<c10b8365>] ? fget+0x0/0xa3
> [  105.339677]  [<c10b8365>] ? fget+0x0/0xa3
> [  105.339760]  [<c1025d23>] ? get_parent_ip+0xb/0x31
> [  105.339843]  [<c1317042>] ? sub_preempt_count+0x7c/0x89
> [  105.339931]  [<c10cfc72>] sys_fsetxattr+0x51/0x79
> [  105.340014]  [<c1002853>] sysenter_do_call+0x12/0x32
> [  105.340133] Code: 2e 76 18 00 58 31 d2 8b 7f 28 f6 43 04 01 74 03 8b 53 08 6a 00 8b 46 04 6a 01 8b 0b 52 89 fa e8 85 10 f8 ff 83 c4 0c 85 c0 79 04 <0f> 0b eb fe 31 c9 f6 43 04 04 74 03 8b 4b 0c 68 00 10 00 00 8d
> [  105.350321] EIP: [<c116c06c>] nfs3_xdr_enc_setacl3args+0x61/0x98 SS:ESP 0068:ce941cb4
> [  105.364385] ---[ end trace 01fcfe7f0f7f6e4a ]---

nfs3_xdr_enc_setacl3args() is not properly setting up the target
buffer before nfsacl_encode() attempts to encode the ACL.

Introduced by commit d9c407b1 "NFS: Introduce new-style XDR encoding
functions for NFSv3."

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-25 15:24:47 -05:00
Chuck Lever
839f7ad693 NFS: Fix "kernel BUG at fs/aio.c:554!"
Nick Piggin reports:

> I'm getting use after frees in aio code in NFS
>
> [ 2703.396766] Call Trace:
> [ 2703.396858]  [<ffffffff8100b057>] ? native_sched_clock+0x27/0x80
> [ 2703.396959]  [<ffffffff8108509e>] ? put_lock_stats+0xe/0x40
> [ 2703.397058]  [<ffffffff81088348>] ? lock_release_holdtime+0xa8/0x140
> [ 2703.397159]  [<ffffffff8108a2a5>] lock_acquire+0x95/0x1b0
> [ 2703.397260]  [<ffffffff811627db>] ? aio_put_req+0x2b/0x60
> [ 2703.397361]  [<ffffffff81039701>] ? get_parent_ip+0x11/0x50
> [ 2703.397464]  [<ffffffff81612a31>] _raw_spin_lock_irq+0x41/0x80
> [ 2703.397564]  [<ffffffff811627db>] ? aio_put_req+0x2b/0x60
> [ 2703.397662]  [<ffffffff811627db>] aio_put_req+0x2b/0x60
> [ 2703.397761]  [<ffffffff811647fe>] do_io_submit+0x2be/0x7c0
> [ 2703.397895]  [<ffffffff81164d0b>] sys_io_submit+0xb/0x10
> [ 2703.397995]  [<ffffffff8100307b>] system_call_fastpath+0x16/0x1b
>
> Adding some tracing, it is due to nfs completing the request then
> returning something other than -EIOCBQUEUED, so aio.c
> also completes the request.

To address this, prevent the NFS direct I/O engine from completing
async iocbs when the forward path returns an error without starting
any I/O.

This fix appears to survive ^C during both "xfstest no. 208" and "fsx
-Z."

It's likely this bug has existed for a very long while, as we are seeing
very similar symptoms in OEL 5.  Copying stable.

Cc: Stable <stable@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-25 15:24:47 -05:00
Jesper Juhl
ad3d2eedf0 NFS4: Avoid potential NULL pointer dereference in decode_and_add_ds().
On Mon, 17 Jan 2011, Mi Jinlong wrote:

>
>
> Jesper Juhl:
> > strrchr() can return NULL if nothing is found. If this happens we'll
> > dereference a NULL pointer in
> > fs/nfs/nfs4filelayoutdev.c::decode_and_add_ds().
> >
> > I tried to find some other code that guarantees that this can never
> > happen but I was unsuccessful. So, unless someone else can point to some
> > code that ensures this can never be a problem, I believe this patch is
> > needed.
> >
> > While I was changing this code I also noticed that all the dprintk()
> > statements, except one, start with "%s:". The one missing the ":" I added
> > it to.
>
>   Maybe another one also should be changed at decode_and_add_ds() at line 243:
>
>    243  printk("%s Decoded address and port %s\n", __func__, buf);
>
Missed that one. Thanks.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-25 15:24:46 -05:00
Tejun Heo
ada609ee2a workqueue: use WQ_MEM_RECLAIM instead of WQ_RESCUER
WQ_RESCUER is now an internal flag and should only be used in the
workqueue implementation proper.  Use WQ_MEM_RECLAIM instead.

This doesn't introduce any functional difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: dm-devel@redhat.com
Cc: Neil Brown <neilb@suse.de>
2011-01-25 14:35:54 +01:00
Fred Isaman
0da2a4ac33 NFS: fix handling of malloc failure during nfs_flush_multi()
Cleanup of the allocated list entries should not call
put_nfs_open_context() on each entry, as the context will
always be NULL, causing an oops.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-19 15:37:49 -05:00
David Howells
ea5b778a8b Unexport do_add_mount() and add in follow_automount(), not ->d_automount()
Unexport do_add_mount() and make ->d_automount() return the vfsmount to be
added rather than calling do_add_mount() itself.  follow_automount() will then
do the addition.

This slightly complicates things as ->d_automount() normally wants to add the
new vfsmount to an expiration list and start an expiration timer.  The problem
with that is that the vfsmount will be deleted if it has a refcount of 1 and
the timer will not repeat if the expiration list is empty.

To this end, we require the vfsmount to be returned from d_automount() with a
refcount of (at least) 2.  One of these refs will be dropped unconditionally.
In addition, follow_automount() must get a 3rd ref around the call to
do_add_mount() lest it eat a ref and return an error, leaving the mount we
have open to being expired as we would otherwise have only 1 ref on it.

d_automount() should also add the the vfsmount to the expiration list (by
calling mnt_set_expiry()) and start the expiration timer before returning, if
this mechanism is to be used.  The vfsmount will be unlinked from the
expiration list by follow_automount() if do_add_mount() fails.

This patch also fixes the call to do_add_mount() for AFS to propagate the mount
flags from the parent vfsmount.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-15 20:07:48 -05:00
David Howells
36d43a4376 NFS: Use d_automount() rather than abusing follow_link()
Make NFS use the new d_automount() dentry operation rather than abusing
follow_link() on directories.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-15 20:07:34 -05:00
David Howells
cc53ce53c8 Add a dentry op to allow processes to be held during pathwalk transit
Add a dentry op (d_manage) to permit a filesystem to hold a process and make it
sleep when it tries to transit away from one of that filesystem's directories
during a pathwalk.  The operation is keyed off a new dentry flag
(DCACHE_MANAGE_TRANSIT).

The filesystem is allowed to be selective about which processes it holds and
which it permits to continue on or prohibits from transiting from each flagged
directory.  This will allow autofs to hold up client processes whilst letting
its userspace daemon through to maintain the directory or the stuff behind it
or mounted upon it.

The ->d_manage() dentry operation:

	int (*d_manage)(struct path *path, bool mounting_here);

takes a pointer to the directory about to be transited away from and a flag
indicating whether the transit is undertaken by do_add_mount() or
do_move_mount() skipping through a pile of filesystems mounted on a mountpoint.

It should return 0 if successful and to let the process continue on its way;
-EISDIR to prohibit the caller from skipping to overmounted filesystems or
automounting, and to use this directory; or some other error code to return to
the user.

->d_manage() is called with namespace_sem writelocked if mounting_here is true
and no other locks held, so it may sleep.  However, if mounting_here is true,
it may not initiate or wait for a mount or unmount upon the parameter
directory, even if the act is actually performed by userspace.

Within fs/namei.c, follow_managed() is extended to check with d_manage() first
on each managed directory, before transiting away from it or attempting to
automount upon it.

follow_down() is renamed follow_down_one() and should only be used where the
filesystem deliberately intends to avoid management steps (e.g. autofs).

A new follow_down() is added that incorporates the loop done by all other
callers of follow_down() (do_add/move_mount(), autofs and NFSD; whilst AFS, NFS
and CIFS do use it, their use is removed by converting them to use
d_automount()).  The new follow_down() calls d_manage() as appropriate.  It
also takes an extra parameter to indicate if it is being called from mount code
(with namespace_sem writelocked) which it passes to d_manage().  follow_down()
ignores automount points so that it can be used to mount on them.

__follow_mount_rcu() is made to abort rcu-walk mode if it hits a directory with
DCACHE_MANAGE_TRANSIT set on the basis that we're probably going to have to
sleep.  It would be possible to enter d_manage() in rcu-walk mode too, and have
that determine whether to abort or not itself.  That would allow the autofs
daemon to continue on in rcu-walk mode.

Note that DCACHE_MANAGE_TRANSIT on a directory should be cleared when it isn't
required as every tranist from that directory will cause d_manage() to be
invoked.  It can always be set again when necessary.

==========================
WHAT THIS MEANS FOR AUTOFS
==========================

Autofs currently uses the lookup() inode op and the d_revalidate() dentry op to
trigger the automounting of indirect mounts, and both of these can be called
with i_mutex held.

autofs knows that the i_mutex will be held by the caller in lookup(), and so
can drop it before invoking the daemon - but this isn't so for d_revalidate(),
since the lock is only held on _some_ of the code paths that call it.  This
means that autofs can't risk dropping i_mutex from its d_revalidate() function
before it calls the daemon.

The bug could manifest itself as, for example, a process that's trying to
validate an automount dentry that gets made to wait because that dentry is
expired and needs cleaning up:

	mkdir         S ffffffff8014e05a     0 32580  24956
	Call Trace:
	 [<ffffffff885371fd>] :autofs4:autofs4_wait+0x674/0x897
	 [<ffffffff80127f7d>] avc_has_perm+0x46/0x58
	 [<ffffffff8009fdcf>] autoremove_wake_function+0x0/0x2e
	 [<ffffffff88537be6>] :autofs4:autofs4_expire_wait+0x41/0x6b
	 [<ffffffff88535cfc>] :autofs4:autofs4_revalidate+0x91/0x149
	 [<ffffffff80036d96>] __lookup_hash+0xa0/0x12f
	 [<ffffffff80057a2f>] lookup_create+0x46/0x80
	 [<ffffffff800e6e31>] sys_mkdirat+0x56/0xe4

versus the automount daemon which wants to remove that dentry, but can't
because the normal process is holding the i_mutex lock:

	automount     D ffffffff8014e05a     0 32581      1              32561
	Call Trace:
	 [<ffffffff80063c3f>] __mutex_lock_slowpath+0x60/0x9b
	 [<ffffffff8000ccf1>] do_path_lookup+0x2ca/0x2f1
	 [<ffffffff80063c89>] .text.lock.mutex+0xf/0x14
	 [<ffffffff800e6d55>] do_rmdir+0x77/0xde
	 [<ffffffff8005d229>] tracesys+0x71/0xe0
	 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

which means that the system is deadlocked.

This patch allows autofs to hold up normal processes whilst the daemon goes
ahead and does things to the dentry tree behind the automouter point without
risking a deadlock as almost no locks are held in d_manage() and none in
d_automount().

Signed-off-by: David Howells <dhowells@redhat.com>
Was-Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-15 20:07:31 -05:00
Linus Torvalds
db9effe99a Merge branch 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin
* 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin:
  fs: fix do_last error case when need_reval_dot
  nfs: add missing rcu-walk check
  fs: hlist UP debug fixup
  fs: fix dropping of rcu-walk from force_reval_path
  fs: force_reval_path drop rcu-walk before d_invalidate
  fs: small rcu-walk documentation fixes

Fixed up trivial conflicts in Documentation/filesystems/porting
2011-01-13 20:14:13 -08:00
Nick Piggin
657e94b673 nfs: add missing rcu-walk check
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-14 02:48:39 +00:00
Trond Myklebust
8a0eebf66e NFS: Fix NFSv3 exclusive open semantics
Commit c0204fd2b8 (NFS: Clean up
nfs4_proc_create()) broke NFSv3 exclusive open by removing the code
that passes the O_EXCL flag down to nfs3_proc_create(). This patch
reverts that offending hunk from the original commit.

Reported-by: Nick Bowler <nbowler@elliptictech.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org    [2.6.37]
Tested-by: Nick Bowler <nbowler@elliptictech.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 12:06:29 -08:00
Al Viro
8b244ff2fa switch nfs to ->s_d_op
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-01-12 20:02:45 -05:00
Linus Torvalds
b9d919a4ac Merge branch 'nfs-for-2.6.38' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'nfs-for-2.6.38' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (89 commits)
  NFS fix the setting of exchange id flag
  NFS: Don't use vm_map_ram() in readdir
  NFSv4: Ensure continued open and lockowner name uniqueness
  NFS: Move cl_delegations to the nfs_server struct
  NFS: Introduce nfs_detach_delegations()
  NFS: Move cl_state_owners and related fields to the nfs_server struct
  NFS: Allow walking nfs_client.cl_superblocks list outside client.c
  pnfs: layout roc code
  pnfs: update nfs4_callback_recallany to handle layouts
  pnfs: add CB_LAYOUTRECALL handling
  pnfs: CB_LAYOUTRECALL xdr code
  pnfs: change lo refcounting to atomic_t
  pnfs: check that partial LAYOUTGET return is ignored
  pnfs: add layout to client list before sending rpc
  pnfs: serialize LAYOUTGET(openstateid)
  pnfs: layoutget rpc code cleanup
  pnfs: change how lsegs are removed from layout list
  pnfs: change layout state seqlock to a spinlock
  pnfs: add prefix to struct pnfs_layout_hdr fields
  pnfs: add prefix to struct pnfs_layout_segment fields
  ...
2011-01-11 15:11:56 -08:00
Andy Adamson
357f54d6b3 NFS fix the setting of exchange id flag
Indicate support for referrals. Do not set any PNFS roles. Check the flags
returned by the server for validity. Do not use exchange flags from an old
client ID instance when recovering a client ID.

Update the EXCHID4_FLAG_XXX set to RFC 5661.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-11 14:17:09 -05:00
Trond Myklebust
68c404b18f Merge branch 'bugfixes' into nfs-for-2.6.38
Conflicts:
	fs/nfs/nfs2xdr.c
	fs/nfs/nfs3xdr.c
	fs/nfs/nfs4xdr.c
2011-01-10 14:48:02 -05:00
Trond Myklebust
6650239a4b NFS: Don't use vm_map_ram() in readdir
vm_map_ram() is not available on NOMMU platforms, and causes trouble
on incoherrent architectures such as ARM when we access the page data
through both the direct and the virtual mapping.

The alternative is to use the direct mapping to access page data
for the case when we are not crossing a page boundary, but to copy
the data into a linear scratch buffer when we are accessing data
that spans page boundaries.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: stable@kernel.org  [2.6.37]
2011-01-10 14:45:01 -05:00
Nick Piggin
873feea09e fs: dcache per-inode inode alias locking
dcache_inode_lock can be replaced with per-inode locking. Use existing
inode->i_lock for this. This is slightly non-trivial because we sometimes
need to find the inode from the dentry, which requires d_inode to be
stabilised (either with refcount or d_lock).

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:31 +11:00
Nick Piggin
b74c79e993 fs: provide rcu-walk aware permission i_ops
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:29 +11:00
Nick Piggin
34286d6662 fs: rcu-walk aware d_revalidate method
Require filesystems be aware of .d_revalidate being called in rcu-walk
mode (nd->flags & LOOKUP_RCU). For now do a simple push down, returning
-ECHILD from all implementations.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:29 +11:00
Nick Piggin
fb045adb99 fs: dcache reduce branches in lookup path
Reduce some branches and memory accesses in dcache lookup by adding dentry
flags to indicate common d_ops are set, rather than having to check them.
This saves a pointer memory access (dentry->d_op) in common path lookup
situations, and saves another pointer load and branch in cases where we
have d_op but not the particular operation.

Patched with:

git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:28 +11:00
Nick Piggin
fa0d7e3de6 fs: icache RCU free inodes
RCU free the struct inode. This will allow:

- Subsequent store-free path walking patch. The inode must be consulted for
  permissions when walking, so an RCU inode reference is a must.
- sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
  to take i_lock no longer need to take sb_inode_list_lock to walk the list in
  the first place. This will simplify and optimize locking.
- Could remove some nested trylock loops in dcache code
- Could potentially simplify things a bit in VM land. Do not need to take the
  page lock to follow page->mapping.

The downsides of this is the performance cost of using RCU. In a simple
creat/unlink microbenchmark, performance drops by about 10% due to inability to
reuse cache-hot slab objects. As iterations increase and RCU freeing starts
kicking over, this increases to about 20%.

In cases where inode lifetimes are longer (ie. many inodes may be allocated
during the average life span of a single inode), a lot of this cache reuse is
not applicable, so the regression caused by this patch is smaller.

The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
however this adds some complexity to list walking and store-free path walking,
so I prefer to implement this at a later date, if it is shown to be a win in
real situations. I haven't found a regression in any non-micro benchmark so I
doubt it will be a problem.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:26 +11:00
Nick Piggin
b5c84bf6f6 fs: dcache remove dcache_lock
dcache_lock no longer protects anything. remove it.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:23 +11:00
Nick Piggin
949854d024 fs: Use rename lock and RCU for multi-step operations
The remaining usages for dcache_lock is to allow atomic, multi-step read-side
operations over the directory tree by excluding modifications to the tree.
Also, to walk in the leaf->root direction in the tree where we don't have
a natural d_lock ordering.

This could be accomplished by taking every d_lock, but this would mean a
huge number of locks and actually gets very tricky.

Solve this instead by using the rename seqlock for multi-step read-side
operations, retry in case of a rename so we don't walk up the wrong parent.
Concurrent dentry insertions are not serialised against.  Concurrent deletes
are tricky when walking up the directory: our parent might have been deleted
when dropping locks so also need to check and retry for that.

We can also use the rename lock in cases where livelock is a worry (and it
is introduced in subsequent patch).

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:22 +11:00
Nick Piggin
b23fb0a603 fs: scale inode alias list
Add a new lock, dcache_inode_lock, to protect the inode's i_dentry list
from concurrent modification. d_alias is also protected by d_lock.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:22 +11:00
Nick Piggin
b7ab39f631 fs: dcache scale dentry refcount
Make d_count non-atomic and protect it with d_lock. This allows us to ensure a
0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when
we start protecting many other dentry members with d_lock.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:21 +11:00
Nick Piggin
fe15ce446b fs: change d_delete semantics
Change d_delete from a dentry deletion notification to a dentry caching
advise, more like ->drop_inode. Require it to be constant and idempotent,
and not take d_lock. This is how all existing filesystems use the callback
anyway.

This makes fine grained dentry locking of dput and dentry lru scanning
much simpler.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:18 +11:00
Trond Myklebust
d035c36c58 NFSv4: Ensure continued open and lockowner name uniqueness
In order to enable migration support, we will want to move some of the
structures that are subject to migration into the struct nfs_server.
In particular, if we are to move the state_owner and state_owner_id to
being a per-filesystem structure, then we should label the resulting
open/lock owners with a per-filesytem label to ensure global uniqueness.

This patch does so by adding the super block s_dev to the open/lock owner
name.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06 16:03:13 -05:00
Chuck Lever
d3978bb325 NFS: Move cl_delegations to the nfs_server struct
Delegations are per-inode, not per-nfs_client.  When a server file
system is migrated, delegations on the client must be moved from the
source to the destination nfs_server.  Make it easier to manage a
mount point's delegation list across a migration event by moving the
list to the nfs_server struct.

Clean up: I added documenting comments to public functions I changed
in this patch.  For consistency I added comments to all the other
public functions in fs/nfs/delegation.c.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06 14:57:46 -05:00
Chuck Lever
dda4b22562 NFS: Introduce nfs_detach_delegations()
Clean up:  Refactor code that takes clp->cl_lock and calls
nfs_detach_delegations_locked() into its own function.

While we're changing the call sites, get rid of the second parameter
and the logic in nfs_detach_delegations_locked() that uses it, since
callers always set that parameter of nfs_detach_delegations_locked()
to NULL.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06 14:47:57 -05:00
Chuck Lever
24d292b894 NFS: Move cl_state_owners and related fields to the nfs_server struct
NFSv4 migration needs to reassociate state owners from the source to
the destination nfs_server data structures.  To make that easier, move
the cl_state_owners field to the nfs_server struct.  cl_openowner_id
and cl_lockowner_id accompany this move, as they are used in
conjunction with cl_state_owners.

The cl_lock field in the parent nfs_client continues to protect all
three of these fields.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06 14:47:57 -05:00
Chuck Lever
fca5238ef3 NFS: Allow walking nfs_client.cl_superblocks list outside client.c
We're about to move some fields from struct nfs_client to struct
nfs_server.  There is a many-to-one relationship between nfs_servers
and nfs_clients.  After these fields are moved to the nfs_server
struct, to visit all of the data in these fields that is owned by one
nfs_client, code will need to visit each nfs_server on the
cl_superblocks list for that nfs_client.

To serialize changes to the cl_superblocks list during these little
expeditions, protect the list with RCU.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06 14:47:56 -05:00
Fred Isaman
f7e8917a67 pnfs: layout roc code
A layout can request return-on-close.  How this interacts with the
forgetful model of never sending LAYOUTRETURNS is a bit ambiguous.
We forget any layouts marked roc, and wait for them to be completely
forgotten before continuing with the close.  In addition, to compensate
for races with any inflight LAYOUTGETs, and the fact that we do not get
any layout stateid back from the server, we set the barrier to the worst
case scenario of current_seqid + number of outstanding LAYOUTGETS.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06 14:46:32 -05:00
Alexandros Batsakis
3684037084 pnfs: update nfs4_callback_recallany to handle layouts
While here, update the code a bit.

Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06 14:46:32 -05:00
Fred Isaman
43f1b3da8b pnfs: add CB_LAYOUTRECALL handling
This is the heart of the wave 2 submission.  Add the code to trigger
drain and forget of any afected layouts.  In addition, we set a
"barrier", below which any LAYOUTGET reply is ignored.  This is to
compensate for the fact that we do not wait for outstanding LAYOUTGETs
to complete as per section 12.5.5.2.1 of RFC 5661.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06 14:46:32 -05:00
Fred Isaman
f2a6256160 pnfs: CB_LAYOUTRECALL xdr code
This is the xdr decoding for CB_LAYOUTRECALL.

Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com>
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06 14:46:32 -05:00
Fred Isaman
cc6e5340b0 pnfs: change lo refcounting to atomic_t
This will be required to allow us to grab reference outside of i_lock.
While we are at it, make put_layout_hdr take the same argument as all the
related functions.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06 14:46:32 -05:00
Fred Isaman
fc1794c5b0 pnfs: check that partial LAYOUTGET return is ignored
Either a bad server reply, or our ignoring of multiple array segments in
a reply, can cause a reply to not meet our requirements.  Ensure
that we ignore such replies.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06 14:46:32 -05:00
Fred Isaman
2130ff6636 pnfs: add layout to client list before sending rpc
Since this list will be used to search for layouts to recall,
this is necessary to avoid a race where the recall comes in,
sees there is nothing in the client list, and prepares to return
NOMATCHING, while the LAYOUTGET gets processed before the recall
updates the stateid.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06 14:46:32 -05:00
Fred Isaman
cf7d63f1f9 pnfs: serialize LAYOUTGET(openstateid)
We shouldn't send a LAYOUTGET(openstateid) unless all outstanding RPCs
using the previous stateid are completed.  This requires choosing the
stateid to encode earlier, so we can abort if one is not available (we
want to use the open stateid, but a LAYOUTGET is already out using
it), and adding a count of the number of outstanding rpc calls using
layout state (which for now consist solely of LAYOUTGETs).

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06 14:46:31 -05:00
Fred Isaman
c31663d4a1 pnfs: layoutget rpc code cleanup
No functional changes, just some code minor code rearrangement and
comments.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06 14:46:31 -05:00
Fred Isaman
4541d16c02 pnfs: change how lsegs are removed from layout list
This is to prepare the way for sensible io draining.  Instead of just
removing the lseg from the list, we instead clear the VALID flag
(preventing new io from grabbing references to the lseg) and remove
the reference holding it in the list.  Thus the lseg will be removed
once any io in progress completes and any references still held are
dropped.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06 14:46:31 -05:00
Fred Isaman
fd6002e9b8 pnfs: change layout state seqlock to a spinlock
This prepares for future changes, where the layout state needs
to change atomically with several other variables.  In particular,
it will need to know if lo->segs is empty, as we test that instead
of manipulating the NFS_LAYOUT_STATEID_SET bit.  Moreover, the
layoutstateid is not really a read-mostly structure, as it is
written almost as often as it is read.

The behavior of pnfs_get_layout_stateid is also slightly changed, so that
it no longer changes the stateid.  Its name is changed to +pnfs_choose_layoutget_stateid.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06 14:46:31 -05:00
Fred Isaman
b7edfaa198 pnfs: add prefix to struct pnfs_layout_hdr fields
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06 14:46:31 -05:00
Fred Isaman
566052c53b pnfs: add prefix to struct pnfs_layout_segment fields
While we are renaming all the fields, change lo->state to lo->plh_flags.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06 14:46:31 -05:00
Fred Isaman
daaa82d1c7 pnfs: remove unnecessary field lgp->status
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06 14:46:30 -05:00
Fred Isaman
52fabd7319 pnfs: fix incorrect comment in destroy_lseg
Comment references get_layout_hdr_locked, which never existed in
submitted code.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06 14:46:30 -05:00
Andy Adamson
4a19de0f4b NFS rename client back channel transport field
Differentiate from server backchannel

Signed-off-by: Andy Adamson <andros@netapp.com>
Acked-by: Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06 14:46:25 -05:00
Andy Adamson
42acd02182 NFS add session back channel draining
Currently session draining only drains the fore channel.
The back channel processing must also be drained.

Use the back channel highest_slot_used to indicate that a callback is being
processed by the callback thread.  Move the session complete to be per channel.

When the session is draininig, wait for any current back channel processing
to complete and stop all new back channel processing by returning NFS4ERR_DELAY
to the back channel client.

Drain the back channel, then the fore channel.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06 14:46:25 -05:00
Andy Adamson
ece0de633c NFS RPC_AUTH_GSS unsupported on v4.1 back channel
Signed-off-by: Andy Adamson <andros@netapp.com>
Acked-by: Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06 14:46:24 -05:00
Andy Adamson
c36fca52f5 NFS refactor nfs_find_client and reference client across callback processing
Fixes a bug where the nfs_client could be freed during callback processing.
Refactor nfs_find_client to use minorversion specific means to locate the
correct nfs_client structure.

In the NFS layer, V4.0 clients are found using the callback_ident field in the
CB_COMPOUND header.  V4.1 clients are found using the sessionID in the
CB_SEQUENCE operation which is also compared against the sessionID associated
with the back channel thread after a successful CREATE_SESSION.

Each of these methods finds the one an only nfs_client associated
with the incoming callback request - so nfs_find_client_next is not needed.

In the RPC layer, the pg_authenticate call needs to find the nfs_client. For
the v4.0 callback service, the callback identifier has not been decoded so a
search by address, version, and minorversion is used.  The sessionid for the
sessions based callback service has (usually) not been set for the
pg_authenticate on a CB_NULL call which can be sent prior to the return
of a CREATE_SESSION call, so the sessionid associated with the back channel
thread is not used to find the client in pg_authenticate for CB_NULL calls.

Pass the referenced nfs_client to each CB_COMPOUND operation being proceesed
via the new cb_process_state structure. The reference is held across
cb_compound processing.

Use the new cb_process_state struct to move the NFS4ERR_RETRY_UNCACHED_REP
processing from process_op into nfs4_callback_sequence where it belongs.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06 14:46:24 -05:00
Andy Adamson
2c2618c6f2 NFS associate sessionid with callback connection
The sessions based callback service is started prior to the CREATE_SESSION call
so that it can handle CB_NULL requests which can be sent before the
CREATE_SESSION call returns and the session ID is known.

Set the callback sessionid after a sucessful CREATE_SESSION.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06 14:46:24 -05:00
Andy Adamson
f4eecd5da3 NFS implement v4.0 callback_ident
Use the small id to pointer translator service to provide a unique callback
identifier per SETCLIENTID call used to identify the v4.0 callback service
associated with the clientid.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06 14:46:24 -05:00
Andy Adamson
ea00528126 NFS do not clear minor version at nfs_client free
Resetting the client minor version operations causes nfs4_destroy_callback
to fail to shutdown the NFSv4.1 callback service.

There is no reason to reset the client minorversion operations when the
nfs_client struct is being freed.

Remove the minorverion reset and rename the function.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06 14:46:24 -05:00
Andy Adamson
01c9a0bc60 NFS use svc_create_xprt for NFSv4.1 callback service
The new back channel transport means we call the normal creation routine as
well as svc_xprt_put.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-06 14:46:24 -05:00
Aneesh Kumar K.V
64c2ce8b72 nfsv4: Switch to generic xattr handling code
This patch make nfsv4 use the generic xattr handling code
to get the nfsv4 acl. This will help us to add richacl
support to nfsv4 in later patches

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-04 13:10:41 -05:00
Aneesh Kumar K.V
a8a5da996d nfs: Set MS_POSIXACL always
We want to skip VFS applying mode for NFS. So set MS_POSIXACL always
and selectively use umask. Ideally we would want to use umask only
when we don't have inheritable ACEs set. But NFS currently don't
allow to send umask to the server. So this is best what we can do
and this is consistent with NFSv3

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-04 13:10:40 -05:00
Namhyung Kim
bf0c84f161 NFS: use ERR_CAST()
Use ERR_CAST() intead of wierd-looking cast.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-04 13:10:39 -05:00
J. Bruce Fields
5f3e97c9ee nfs: fix mispelling of idmap CONFIG symbol
Trivial, but confusing when you're trying to grep through this
code....

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-04 13:10:39 -05:00
Jesper Juhl
878215feb8 NFS: Don't leak in nfs_proc_symlink()
Hi,

In fs/nfs/proc.c::nfs_proc_symlink() we will leak memory if either
nfs_alloc_fhandle() or nfs_alloc_fattr() returns NULL but the other one
doesn't.
This patch ensures memory allocated by one when the other fails is always
released (this is safe since nfs_free_fattr() and nfs_free_fhandle() both
call kfree which deals gracefully with NULL pointers).

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-01-04 13:10:36 -05:00
Trond Myklebust
1174dd1f89 NFSv4: Convert a few commas into semicolons...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-21 11:51:27 -05:00
Stanislav Kinsbursky
aa69947399 NFS: suppressing showing of default mount port value in /proc fixed
Update: added check for zero value as it was before (note: can't simply check
mountd_port for positive value because it's typeof unsigned short)

Default value for mount server port is set to NFS_UNSPEC_PORT (-1) and will not
be changed during parsing mount options for mound data version 6. This default
value will be showed for mountport in /proc/mounts always since current default
check is for zero value. This small mistake leads to big problem, because
during umount.nfs execution from old user-space utils (at least nfs-utils
1.0.9) this value will be used as the server port to connect to. This request
will be rejected (since port is 65535) and thus nfs mount point can't be
unmounted.

Note from Chuck Lever (chuck.lever@oracle.com): this is only possible if
/etc/mtab is a link to /proc/mounts.  Not all systems have this configuration.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-21 11:51:25 -05:00
J. Bruce Fields
611c96c8f7 nfs4: fix units bug causing hang on recovery
Note that cl_lease_time is in jiffies.  This can cause a very long wait
in the NFS4ERR_CLID_INUSE case.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-21 11:51:24 -05:00
Jesper Juhl
72895b1ac7 nfs: Take advantage of kmem_cache_zalloc() in nfs_page_alloc()
Take advantage of kmem_cache_zalloc() in nfs_page_alloc(). Save a call to
memset() and a few bytes.

Before:
 [jj@dragon linux-2.6]$ size fs/nfs/pagelist.o
    text    data     bss     dec     hex filename
    1765       0       8    1773     6ed fs/nfs/pagelist.o
After:
 [jj@dragon linux-2.6]$ size fs/nfs/pagelist.o
    text    data     bss     dec     hex filename
    1749       0       8    1757     6dd fs/nfs/pagelist.o

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-21 11:51:24 -05:00
Tobias Klauser
c8b031ebc1 NFS: Remove redundant unlikely()
IS_ERR() already implies unlikely(), so it can be omitted here.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-21 11:51:23 -05:00
Chuck Lever
bf2695516d SUNRPC: New xdr_streams XDR decoder API
Now that all client-side XDR decoder routines use xdr_streams, there
should be no need to support the legacy calling sequence [rpc_rqst *,
__be32 *, RPC res *] anywhere.  We can construct an xdr_stream in the
generic RPC code, instead of in each decoder function.

This is a refactoring change.  It should not cause different behavior.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-16 12:37:25 -05:00
Chuck Lever
9f06c719f4 SUNRPC: New xdr_streams XDR encoder API
Now that all client-side XDR encoder routines use xdr_streams, there
should be no need to support the legacy calling sequence [rpc_rqst *,
__be32 *, RPC arg *] anywhere.  We can construct an xdr_stream in the
generic RPC code, instead of in each encoder function.

Also, all the client-side encoder functions return 0 now, making a
return value superfluous.  Take this opportunity to convert them to
return void instead.

This is a refactoring change.  It should not cause different behavior.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-16 12:37:25 -05:00
Chuck Lever
b43cd8c153 NFS: Remove unused UMNT response data structure
Clean up.

The UMNT request has a NULL response.  There's no need to set up a
mountres structure for it.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-16 12:37:25 -05:00
Chuck Lever
98eb2b4f93 NFS: Avoid return code checking in mount XDR encoder functions
Clean up.

The trend in the other XDR encoder functions is to BUG() when encoding
problems occur, since a problem here is always due to a local coding
error.  Then, instead of a status, zero is unconditionally returned.

Update the mount client XDR encoders to behave this way.

To finish the update, use the new-style be32_to_cpup() and
cpu_to_be32() macros, and compute the buffer sizes using raw integers
instead of sizeof().  This matches the conventions used in other XDR
functions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-16 12:37:25 -05:00
Chuck Lever
ead0059788 NFS: Squelch compiler warning in decode_getdeviceinfo()
Clean up.

.../linux/nfs-2.6/fs/nfs/nfs4xdr.c: In function ‘decode_getdeviceinfo’:
.../linux/nfs-2.6/fs/nfs/nfs4xdr.c:5008: warning: comparison between signed and unsigned integer expressions

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-16 12:37:24 -05:00
Chuck Lever
573c4e1ef5 NFS: Simplify ->decode_dirent() calling sequence
Clean up.

The pointer returned by ->decode_dirent() is no longer used as a
pointer.  The only call site (xdr_decode() in fs/nfs/dir.c) simply
extracts the errno value encoded in the pointer.  Replace the
returned pointer with a standard integer errno return value.

Also, pass the "server" argument as part of the nfs_entry instead of
as a separate parameter.  It's faster to derive "server" in
nfs_readdir_xdr_to_array() since we already have the directory's inode
handy.  "server" ought to be invariant for a set of entries in the
same directory, right?

The legacy versions of decode_dirent() don't use "server" anyway, so
it's wasted work for them to derive and pass "server" for each entry.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-16 12:37:24 -05:00
Chuck Lever
8111f37360 NFS: Fix hdrlen calculation in NFSv4's decode_read()
When computing the length of the header, be sure to include the
four octets consumed by "count".

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-16 12:37:24 -05:00
Chuck Lever
7d93bd71cb NFS: Repair whitespace damage in NFS PROC macro
Clean up.

When I was making other changes in this area, checkscript.pl
complained about the use of leading blanks in the PROC macros in the
xdr files.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-16 12:37:24 -05:00
Chuck Lever
f604870939 NFS: Move and update xdr_decode_foo() functions that we're keeping
Clean up.

Move the timestamp decoder to match the placement and naming
conventions of the other helpers.  Fold xdr_decode_fattr() into
decode_fattr3(), which is now it's only user.  Fold
xdr_decode_wcc_attr() into decode_wcc_attr(), which is now it's only
user.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-16 12:37:23 -05:00
Chuck Lever
b2cdd9c9c9 NFS: Remove unused old NFSv3 decoder functions
Clean up.  Remove unused legacy result decoder functions, and any
now unused decoder helper functions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-16 12:37:23 -05:00
Chuck Lever
f5fc3c50c9 NFS: Switch in new NFSv3 decoder functions
The naming scheme of the new decoder functions, which follows the
NFSv4 XDR decoder functions, is slightly different than the scheme
used for the old functions.  Rename the functions as a separate
step to keep the patches clean.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-16 12:37:23 -05:00
Chuck Lever
e4f9323409 NFS: Introduce new-style XDR decoding functions for NFSv2
We'd like to prevent local buffer overflows caused by malicious or
broken servers.  New xdr_stream style decoders can do that.

For efficiency, we also eventually want to be able to pass xdr_streams
from call_decode() to all XDR decoding functions, rather than building
an xdr_stream in every XDR decoding function in the kernel.

Static helper functions are left without the "inline" directive.  This
allows the compiler to choose automatically how to optimize these for
size or speed.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-16 12:37:23 -05:00
Chuck Lever
9d5a643439 NFS: Update xdr_encode_foo() functions that we're keeping
Clean up.  Move the timestamp and the sattr encoder to match the
placement convention of the other helpers, update their coding style,
and refresh their documenting comments.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-16 12:37:23 -05:00
Chuck Lever
499ff710b2 NFS: Remove unused old NFSv3 encoder functions
Clean up.  Remove unused legacy argument encoder functions, and any
now unused encoder helper functions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-16 12:37:22 -05:00
Chuck Lever
ad96b5b5ea NFS: Replace old NFSv3 encoder functions with xdr_stream-based ones
The naming scheme of the new encoder functions, which follows the
NFSv4 XDR encoder functions, is slightly different than the scheme
used for the old functions.  Rename the functions as a separate
step to keep the patches clean.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-16 12:37:22 -05:00
Chuck Lever
d9c407b138 NFS: Introduce new-style XDR encoding functions for NFSv3
We're interested in taking advantage of the safety benefits of
xdr_streams.  These data structures allow more careful checking for
buffer overflow while encoding.  More careful type checking is also
introduced in the new functions.

For efficiency, we also eventually want to be able to pass xdr_streams
from call_encode() to all XDR encoding functions, rather than building
an xdr_stream in every XDR encoding function in the kernel.  To do
this means all encoders must be ready to handle a passed-in
xdr_stream.

The new encoders follow the modern paradigm for XDR encoders: BUG on
error, and always return a zero status code.

Static helper functions are left without the "inline" directive.  This
allows the compiler to choose automatically how to optimize these for
size or speed.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-16 12:37:22 -05:00
Chuck Lever
5f96e5e31b NFS: Move and update xdr_decode_foo() functions that we're keeping
Clean up.

Move the timestamp decoder to match the placement and naming
conventions of the other helpers.  Fold xdr_decode_fattr() into
decode_fattr(), which is now it's only user.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-16 12:37:21 -05:00
Chuck Lever
661ad4239a NFS: Replace old NFSv2 decoder functions with xdr_stream-based ones
Clean up.  Remove unused legacy result decoder functions, and any
now unused decoder helper functions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-16 12:37:21 -05:00
Chuck Lever
f796f8b3ae NFS: Introduce new-style XDR decoding functions for NFSv2
We'd like to prevent local buffer overflows caused by malicious or
broken servers.  New xdr_stream style decoders can do that.

For efficiency, we also eventually want to be able to pass xdr_streams
from call_decode() to all XDR decoding functions, rather than building
an xdr_stream in every XDR decoding function in the kernel.

nfs_decode_dirent() is renamed to follow the naming convention of the
other two dirent decoders.

Static helper functions are left without the "inline" directive.  This
allows the compiler to choose automatically how to optimize these for
size or speed.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-16 12:37:21 -05:00
Chuck Lever
8582849324 NFS: Use the "nfs_stat" enum for nfs_stat_to_errno()'s argument
Clean up.

To distinguish more clearly between the on-the-wire NFSERR_ value and
our local errno values, use the proper type for the argument of
nfs_stat_to_errno().

Add a documenting comment appropriate for a global function shared
outside this source file.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-16 12:37:21 -05:00
Chuck Lever
282ac2a573 NFS: Update xdr_encode_foo() functions that we're keeping
Clean up.

The new helper functions are kept in order by section of RFC 1094.
Move the two timestamp encoders we're keeping, update their coding
style, and refresh their documenting comments.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-16 12:37:20 -05:00
Chuck Lever
2d70f533ea NFS: Remove old NFSv2 encoder functions
Clean up:  Remove unused legacy argument encoder functions, and any
now unused encoder helper functions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-16 12:37:20 -05:00
Chuck Lever
25a0866cc6 NFS: Introduce new-style XDR encoding functions for NFSv2
We're interested in taking advantage of the safety benefits of
xdr_streams.  These data structures allow more careful checking for
buffer overflow while encoding.  More careful type checking is also
introduced in the new functions.

For efficiency, we also eventually want to be able to pass xdr_streams
from call_encode() to all XDR encoding functions, rather than building
an xdr_stream in every XDR encoding function in the kernel.  To do
this means all encoders must be ready to handle a passed-in
xdr_stream.

The new encoders follow the modern paradigm for XDR encoders: BUG on
any error, and always return a zero status code.

Static helper functions are left without the "inline" directive.  This
allows the compiler to choose automatically how to optimize these for
size or speed.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-16 12:37:20 -05:00
Chuck Lever
5b362ac379 NFS: Fix panic after nfs_umount()
After a few unsuccessful NFS mount attempts in which the client and
server cannot agree on an authentication flavor both support, the
client panics.  nfs_umount() is invoked in the kernel in this case.

Turns out nfs_umount()'s UMNT RPC invocation causes the RPC client to
write off the end of the rpc_clnt's iostat array.  This is because the
mount client's nrprocs field is initialized with the count of defined
procedures (two: MNT and UMNT), rather than the size of the client's
proc array (four).

The fix is to use the same initialization technique used by most other
upper layer clients in the kernel.

Introduced by commit 0b524123, which failed to update nrprocs when
support was added for UMNT in the kernel.

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=24302
BugLink: http://bugs.launchpad.net/bugs/683938

Reported-by: Stefan Bader <stefan.bader@canonical.com>
Tested-by: Stefan Bader <stefan.bader@canonical.com>
Cc: stable@kernel.org # >= 2.6.32
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-10 13:01:50 -05:00
Trond Myklebust
2df485a774 nfs: remove extraneous and problematic calls to nfs_clear_request
When a nfs_page is freed, nfs_free_request is called which also calls
nfs_clear_request to clean out the lock and open contexts and free the
pagecache page.

However, a couple of places in the nfs code call nfs_clear_request
themselves. What happens here if the refcount on the request is still high?
We'll be releasing contexts and freeing pointers while the request is
possibly still in use.

Remove those bare calls to nfs_clear_context. That should only be done when
the request is being freed.

Note that when doing this, we need to watch out for tests of req->wb_page.
Previously, nfs_set_page_tag_locked() and nfs_clear_page_tag_locked()
would check the value of req->wb_page to figure out if the page is mapped
into the nfsi->nfs_page_tree. We now indicate the page is mapped using
the new bit PG_MAPPED in req->wb_flags .

Reported-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-07 23:02:44 -05:00
Mi Jinlong
0de1b7e800 nfs: kernel should return EPROTONOSUPPORT when not support NFSv4
When nfs client(kernel) don't support NFSv4, maybe user build
  kernel without NFSv4, there is a problem.

  Using command "mount SERVER-IP:/nfsv3 /mnt/" to mount NFSv3
  filesystem, mount should should success, but fail and get error:

    "mount.nfs: an incorrect mount option was specified"

  System call mount "nfs"(not "nfs4") with "vers=4",
  if CONFIG_NFS_V4 is not defined, the "vers=4" will be parsed
  as invalid argument and kernel return EINVAL to nfs-utils.

  About that, we really want get EPROTONOSUPPORT rather than
  EINVAL. This path make sure kernel parses argument success,
  and return EPROTONOSUPPORT at nfs_validate_mount_data().

Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-07 19:30:44 -05:00
Sergey Vlasov
21ac19d484 NFS: Fix fcntl F_GETLK not reporting some conflicts
The commit 129a84de23 (locks: fix F_GETLK
regression (failure to find conflicts)) fixed the posix_test_lock()
function by itself, however, its usage in NFS changed by the commit
9d6a8c5c21 (locks: give posix_test_lock
same interface as ->lock) remained broken - subsequent NFS-specific
locking code received F_UNLCK instead of the user-specified lock type.
To fix the problem, fl->fl_type needs to be saved before the
posix_test_lock() call and restored if no local conflicts were reported.

Reference: https://bugzilla.kernel.org/show_bug.cgi?id=23892
Tested-by: Alexander Morozov <amorozov@etersoft.ru>
Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Cc: <stable@kernel.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-07 19:30:43 -05:00
Aneesh Kumar K.V
08a22b392a nfs: Discard ACL cache on mode update
An update of mode bits can result in ACL value being changed. We need
to mark the acl cache invalid when we update mode. Similarly we need
to update file attribute when we change ACL value

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-07 19:30:42 -05:00
Trond Myklebust
47c716cbf6 NFS: Readdir cleanups
No functional changes, but clarify the code.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-07 14:09:02 -05:00
Trond Myklebust
18fb5fe40c NFS: nfs_readdir_search_for_cookie() don't mark as eof if cookie not found
If we're searching for a specific cookie, and it isn't found in the page
cache, we should try an uncached_readdir(). To do so, we return EBADCOOKIE,
but we don't set desc->eof.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-07 12:41:58 -05:00
Trond Myklebust
11de3b11e0 NFS: Fix a memory leak in nfs_readdir
We need to ensure that the entries in the nfs_cache_array get cleared
when the page is removed from the page cache. To do so, we use the
freepage address_space operation.

Change nfs_readdir_clear_array to use kmap_atomic(), so that the
function can be safely called from all contexts.

Finally, modify the cache_page_release helper to call
nfs_readdir_clear_array directly, when dealing with an anonymous
page from 'uncached_readdir'.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-02 09:58:00 -05:00
Trond Myklebust
0aded708d1 NFS: Ensure we use the correct cookie in nfs_readdir_xdr_filler
We need to use the cookie from the previous array entry, not the
actual cookie that we are searching for (except for the case of
uncached_readdir).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-12-01 08:16:16 -05:00
Trond Myklebust
37a09f0745 NFS: Fix a readdirplus bug
When comparing filehandles in the helper nfs_same_file(), we should not be
using 'strncmp()': filehandles are not null terminated strings.

Instead, we should just use the existing helper nfs_compare_fh().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-11-30 10:18:49 -08:00
Trond Myklebust
0b26a0bf6f NFS: Ensure we return the dirent->d_type when it is known
Store the dirent->d_type in the struct nfs_cache_array_entry so that we
can use it in getdents() calls.

This fixes a regression with the new readdir code.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-11-22 13:24:48 -05:00
Trond Myklebust
3020093f57 NFS: Correct the array bound calculation in nfs_readdir_add_to_array
It looks as if the array size calculation in MAX_READDIR_ARRAY does not
take the alignment of struct nfs_cache_array_entry into account.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-11-22 13:24:47 -05:00
Trond Myklebust
ece0b4233b NFS: Don't ignore errors from nfs_do_filldir()
We should ignore the errors from the filldir callback, and just interpret
them as meaning we should exit, however we should definitely pass back
ENOMEM errors.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-11-22 13:24:47 -05:00
Trond Myklebust
85f8607e16 NFS: Fix the error handling in "uncached_readdir()"
Currently, uncached_readdir() is broken because if fails to handle
the results from nfs_readdir_xdr_to_array() correctly.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-11-22 13:24:46 -05:00
Trond Myklebust
7a8e1dc34f NFS: Fix a page leak in uncached_readdir()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-11-22 13:24:45 -05:00
Trond Myklebust
e7c58e974a NFS: Fix a page leak in nfs_do_filldir()
nfs_do_filldir() must always free desc->page when it is done, otherwise
we end up leaking the page.

Also remove unused variable 'dentry'.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-11-22 13:24:44 -05:00
Trond Myklebust
5c346854d8 NFS: Assume eof if the server returns no readdir records
Some servers are known to be buggy w.r.t. this. Deal with them...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-11-22 13:24:44 -05:00
Trond Myklebust
463a376eae NFS: Buffer overflow in ->decode_dirent() should not be fatal
Overflowing the buffer in the readdir ->decode_dirent() should not lead to
a fatal error, but rather to an attempt to reread the record in question.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-11-22 13:24:43 -05:00
Arun Bharadwaj
b47d19de2c Pure nfs client performance using odirect.
When an application opens a file with O_DIRECT flag, if the size of
the data that is written is equal to wsize, the client sends a
WRITE RPC with stable flag set to UNSTABLE followed by a single
COMMIT RPC rather than sending a single WRITE RPC with the stable
flag set to FILE_SYNC. This a bug.

Patch to fix this.

Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-11-22 13:24:42 -05:00
Arnd Bergmann
451a3c24b0 BKL: remove extraneous #include <smp_lock.h>
The big kernel lock has been removed from all these files at some point,
leaving only the #include.

Remove this too as a cleanup.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-11-17 08:59:32 -08:00
Catalin Marinas
04e4bd1c67 nfs: Ignore kmemleak false positive in nfs_readdir_make_qstr
Strings allocated via kmemdup() in nfs_readdir_make_qstr() are
referenced from the nfs_cache_array which is stored in a page cache
page. Kmemleak does not scan such pages and it reports several false
positives. This patch annotates the string->name pointer so that
kmemleak does not consider it a real leak.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Bryan Schumaker <bjschuma@netapp.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-11-16 12:03:14 -05:00
Trond Myklebust
ac39612824 NFS: readdir shouldn't read beyond the reply returned by the server
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-11-15 20:44:29 -05:00
Trond Myklebust
8cd51a0ccd NFS: Fix a couple of regressions in readdir.
Fix up the issue that array->eof_index needs to be able to be set
even if array->size == 0.

Ensure that we catch all important memory allocation error conditions
and/or kmap() failures.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-11-15 20:44:28 -05:00
Trond Myklebust
23ebbd9acf Revert "NFSv4: Fall back to ordinary lookup if nfs4_atomic_open() returns EISDIR"
This reverts commit 80e60639f1.

This change requires further fixes to ensure that the open doesn't
succeed if the lookup later results in a regular file being created.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-11-15 20:44:27 -05:00
Paulius Zaleckas
1e657bd51f Regression: fix mounting NFS when NFSv3 support is not compiled
Trying to mount NFS (root partition in my case) fails if CONFIG_NFS_V3
is not selected. nfs_validate_mount_data() returns EPROTONOSUPPORT,
because of this check:

#ifndef CONFIG_NFS_V3
	if (args->version == 3)
		goto out_v3_not_compiled;
#endif /* !CONFIG_NFS_V3 */

and args->version was always initialized to 3.

It was working in 2.6.36

Signed-off-by: Paulius Zaleckas <paulius.zaleckas@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-11-15 20:44:27 -05:00
Christoph Hellwig
51ee4b84f5 locks: let the caller free file_lock on ->setlease failure
The caller allocated it, the caller should free it.

The only issue so far is that we could change the flp pointer even on an
error return if the fl_change callback failed.  But we can simply move
the flp assignment after the fl_change invocation, as the callers don't
care about the flp return value if the setlease call failed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-31 06:35:15 -07:00
J. Bruce Fields
05fa3135fd locks: fix setlease methods to free passed-in lock
We modified setlease to require the caller to allocate the new lease in
the case of creating a new lease, but forgot to fix up the filesystem
methods.

Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Steve French <sfrench@samba.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-30 18:08:15 -07:00
Al Viro
31f43471e9 convert simple cases of nfs-related ->get_sb() to ->mount()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-29 04:17:23 -04:00
Al Viro
a4118ee1d8 a couple of open-coded ihold() introduced by nfs merge
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-29 04:14:48 -04:00
Geert Uytterhoeven
12364a4f05 nfs4: The difference of 2 pointers is ptrdiff_t
On m68k, which is 32-bit:

fs/nfs/nfs4proc.c: In function ‘nfs41_sequence_done’:
fs/nfs/nfs4proc.c:432: warning: format ‘%ld’ expects type ‘long int’, but argument 3 has type ‘int’
fs/nfs/nfs4proc.c: In function ‘nfs4_setup_sequence’:
fs/nfs/nfs4proc.c:576: warning: format ‘%ld’ expects type ‘long int’, but argument 5 has type ‘int’

On 32-bit, ptrdiff_t is int; on 64-bit, ptrdiff_t is long.

Introduced by commit dfb4f30983 ("NFSv4.1: keep
seq_res.sr_slot as pointer rather than an index")

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-28 15:49:29 -04:00
Dan Carpenter
8f0d97b415 nfs: testing the wrong variable
The intent was to test "*desc" for allocation failures, but it tests
"desc" which is always a valid pointer here.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-28 11:18:00 -04:00
Jeff Layton
015f0212d5 nfs: handle lock context allocation failures in nfs_create_request
nfs_get_lock_context can return NULL on an allocation failure.
Regression introduced by commit f11ac8db.

Reported-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-28 11:17:25 -04:00
Steve Dickson
568a810d7e Fixed Regression in NFS Direct I/O path
A typo, introduced by commit f11ac8db, in the nfs_direct_write()
routine causes writes with O_DIRECT set to fail with a ENOMEM error.

Found-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve Dickson <steved@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-28 11:14:05 -04:00
Linus Torvalds
7420a8c0de Merge branch 'flock' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl
* 'flock' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl:
  locks: turn lock_flocks into a spinlock
  fasync: re-organize fasync entry insertion to allow it under a spinlock
  locks/nfsd: allocate file lock outside of spinlock
  lockd: fix nlmsvc_notify_blocked locking
  lockd: push lock_flocks down
2010-10-27 18:13:34 -07:00
Arnd Bergmann
763641d812 lockd: push lock_flocks down
lockd should use lock_flocks() instead of lock_kernel()
to lock against posix locks accessing the i_flock list.

This is a prerequisite to turning lock_flocks into a
spinlock.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: J. Bruce Fields <bfields@redhat.com>
2010-10-27 21:39:39 +02:00
Linus Torvalds
426e1f5cec Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits)
  split invalidate_inodes()
  fs: skip I_FREEING inodes in writeback_sb_inodes
  fs: fold invalidate_list into invalidate_inodes
  fs: do not drop inode_lock in dispose_list
  fs: inode split IO and LRU lists
  fs: switch bdev inode bdi's correctly
  fs: fix buffer invalidation in invalidate_list
  fsnotify: use dget_parent
  smbfs: use dget_parent
  exportfs: use dget_parent
  fs: use RCU read side protection in d_validate
  fs: clean up dentry lru modification
  fs: split __shrink_dcache_sb
  fs: improve DCACHE_REFERENCED usage
  fs: use percpu counter for nr_dentry and nr_dentry_unused
  fs: simplify __d_free
  fs: take dcache_lock inside __d_path
  fs: do not assign default i_ino in new_inode
  fs: introduce a per-cpu last_ino allocator
  new helper: ihold()
  ...
2010-10-26 17:58:44 -07:00
Linus Torvalds
18a043f941 Merge branch 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: rename nfs.upcall -> nfs.idmap
  NFS: Fix a compile issue in nfs_root
2010-10-26 17:24:28 -07:00
Wu Fengguang
1b430beee5 writeback: remove nonblocking/encountered_congestion references
This removes more dead code that was somehow missed by commit 0d99519efe
(writeback: remove unused nonblocking and congestion checks).  There are
no behavior change except for the removal of two entries from one of the
ext4 tracing interface.

The nonblocking checks in ->writepages are no longer used because the
flusher now prefer to block on get_request_wait() than to skip inodes on
IO congestion.  The latter will lead to more seeky IO.

The nonblocking checks in ->writepage are no longer used because it's
redundant with the WB_SYNC_NONE check.

We no long set ->nonblocking in VM page out and page migration, because
a) it's effectively redundant with WB_SYNC_NONE in current code
b) it's old semantic of "Don't get stuck on request queues" is mis-behavior:
   that would skip some dirty inodes on congestion and page out others, which
   is unfair in terms of LRU age.

Inspired by Christoph Hellwig. Thanks!

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: David Howells <dhowells@redhat.com>
Cc: Sage Weil <sage@newdream.net>
Cc: Steve French <sfrench@samba.org>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26 16:52:05 -07:00
Trond Myklebust
036a107597 NFS: Fix a compile issue in nfs_root
Stephen Rothwell reports:

> /home/test/linux-2.6/fs/nfs/nfsroot.c: In function 'nfs_root_debug':
> /home/test/linux-2.6/fs/nfs/nfsroot.c:110:2: error: 'nfs_debug'
> undeclared (first use in this function)
> /home/test/linux-2.6/fs/nfs/nfsroot.c:110:2: note: each undeclared
> identifier is reported only once for each function it appears in
> make[3]: *** [fs/nfs/nfsroot.o] Error 1
> make[2]: *** [fs/nfs] Error 2
> make[1]: *** [fs] Error 2
> make: *** [sub-make] Error 2

Which is caused by commit 306a075362
(NFS: Allow NFSROOT debugging messages to be enabled dynamically)

Fix is to disable this code when RPC_DEBUG is disabled.

Reported-by: Zimny Lech <napohybelskurwysynom2010@gmail.com>
Tested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-26 13:56:42 -04:00
Linus Torvalds
4390110fef Merge branch 'for-2.6.37' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.37' of git://linux-nfs.org/~bfields/linux: (99 commits)
  svcrpc: svc_tcp_sendto XPT_DEAD check is redundant
  svcrpc: no need for XPT_DEAD check in svc_xprt_enqueue
  svcrpc: assume svc_delete_xprt() called only once
  svcrpc: never clear XPT_BUSY on dead xprt
  nfsd4: fix connection allocation in sequence()
  nfsd4: only require krb5 principal for NFSv4.0 callbacks
  nfsd4: move minorversion to client
  nfsd4: delay session removal till free_client
  nfsd4: separate callback change and callback probe
  nfsd4: callback program number is per-session
  nfsd4: track backchannel connections
  nfsd4: confirm only on succesful create_session
  nfsd4: make backchannel sequence number per-session
  nfsd4: use client pointer to backchannel session
  nfsd4: move callback setup into session init code
  nfsd4: don't cache seq_misordered replies
  SUNRPC: Properly initialize sock_xprt.srcaddr in all cases
  SUNRPC: Use conventional switch statement when reclassifying sockets
  sunrpc/xprtrdma: clean up workqueue usage
  sunrpc: Turn list_for_each-s into the ..._entry-s
  ...

Fix up trivial conflicts (two different deprecation notices added in
separate branches) in Documentation/feature-removal-schedule.txt
2010-10-26 09:55:25 -07:00
Linus Torvalds
a4dd8dce14 Merge branch 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  net/sunrpc: Use static const char arrays
  nfs4: fix channel attribute sanity-checks
  NFSv4.1: Use more sensible names for 'initialize_mountpoint'
  NFSv4.1: pnfs: filelayout: add driver's LAYOUTGET and GETDEVICEINFO infrastructure
  NFSv4.1: pnfs: add LAYOUTGET and GETDEVICEINFO infrastructure
  NFS: client needs to maintain list of inodes with active layouts
  NFS: create and destroy inode's layout cache
  NFSv4.1: pnfs: filelayout: introduce minimal file layout driver
  NFSv4.1: pnfs: full mount/umount infrastructure
  NFS: set layout driver
  NFS: ask for layouttypes during v4 fsinfo call
  NFS: change stateid to be a union
  NFSv4.1: pnfsd, pnfs: protocol level pnfs constants
  SUNRPC: define xdr_decode_opaque_fixed
  NFSD: remove duplicate NFS4_STATEID_SIZE
2010-10-26 09:52:09 -07:00
J. Bruce Fields
43c2e885be nfs4: fix channel attribute sanity-checks
The sanity checks here are incorrect; in the worst case they allow
values that crash the client.

They're also over-reliant on the preprocessor.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-25 22:19:41 -04:00
Al Viro
7de9c6ee3e new helper: ihold()
Clones an existing reference to inode; caller must already hold one.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25 21:26:11 -04:00
Linus Torvalds
74eb94b218 Merge branch 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (67 commits)
  SUNRPC: Cleanup duplicate assignment in rpcauth_refreshcred
  nfs: fix unchecked value
  Ask for time_delta during fsinfo probe
  Revalidate caches on lock
  SUNRPC: After calling xprt_release(), we must restart from call_reserve
  NFSv4: Fix up the 'dircount' hint in encode_readdir
  NFSv4: Clean up nfs4_decode_dirent
  NFSv4: nfs4_decode_dirent must clear entry->fattr->valid
  NFSv4: Fix a regression in decode_getfattr
  NFSv4: Fix up decode_attr_filehandle() to handle the case of empty fh pointer
  NFS: Ensure we check all allocation return values in new readdir code
  NFS: Readdir plus in v4
  NFS: introduce generic decode_getattr function
  NFS: check xdr_decode for errors
  NFS: nfs_readdir_filler catch all errors
  NFS: readdir with vmapped pages
  NFS: remove page size checking code
  NFS: decode_dirent should use an xdr_stream
  SUNRPC: Add a helper function xdr_inline_peek
  NFS: remove readdir plus limit
  ...
2010-10-25 13:48:29 -07:00
Trond Myklebust
1c787096fc NFSv4.1: Use more sensible names for 'initialize_mountpoint'
The initialize_mountpoint/uninitialise_mountpoint functions are really about
setting or clearing the layout driver to be used on this filesystem. Change
the names to the more descriptive 'set_layoutdriver/clear_layoutdriver'.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-24 18:07:11 -04:00
Andy Adamson
16b374ca43 NFSv4.1: pnfs: filelayout: add driver's LAYOUTGET and GETDEVICEINFO infrastructure
Implement the driver's io_ops->alloc_lseg and free_lseg functions,
which integrate into the deviceid cache and calls out to
nfs4_proc_getdeviceinfo when necessary.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Dean Hildebrand <dhildebz@umich.edu>
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Mike Sager <sager@netapp.com>
Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>
Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-24 18:07:11 -04:00
Andy Adamson
b1f69b754e NFSv4.1: pnfs: add LAYOUTGET and GETDEVICEINFO infrastructure
Add the ability to actually send LAYOUTGET and GETDEVICEINFO.  This also adds
in the machinery to handle layout state and the deviceid cache.  Note that
GETDEVICEINFO is not called directly by the generic layer.  Instead it
is called by the drivers while parsing the LAYOUTGET opaque data in response
to an unknown device id embedded therein.  RFC 5661 only encodes
device ids within the driver-specific opaque data.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Dean Hildebrand <dhildebz@umich.edu>
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Mike Sager <sager@netapp.com>
Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>
Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-24 18:07:10 -04:00
Andy Adamson
974cec8ca0 NFS: client needs to maintain list of inodes with active layouts
In particular, server reboot will invalidate all layouts.

Note that in order to have an active layout, we must get a successful response
from the server.  To avoid adding that machinery, this patch just includes a
stub that fakes up a successful return.  Since the layout is never referenced
for io, this is not a problem.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Dean Hildebrand <dhildebz@umich.edu>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-24 18:07:10 -04:00
Benny Halevy
e5e940170b NFS: create and destroy inode's layout cache
At the start of the io paths, try to grab the relevant layout
information.  This will initiate the inode's layout cache, but
stubs ensure the cache stays empty.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Dean Hildebrand <dhildebz@umich.edu>
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn>
Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-24 18:07:10 -04:00
Dean Hildebrand
7ab672ce31 NFSv4.1: pnfs: filelayout: introduce minimal file layout driver
This driver just registers itself and supplies trivial mount/umount functions.

Signed-off-by: Dean Hildebrand <dhildebz@umich.edu>
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-24 18:07:10 -04:00
Fred Isaman
02c35fca7c NFSv4.1: pnfs: full mount/umount infrastructure
Allow a module implementing a layout type to register, and
have its mount/umount routines called for filesystems that
the server declares support it.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Andy Adamson<andros@netapp.com>
Signed-off-by: Bian Naimeng <biannm@cn.fujitsu.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-24 18:07:10 -04:00
Ricardo Labiaga
85e174ba6b NFS: set layout driver
Put in the infrastructure that uses information returned from the
server at mount to select a layout driver module.

In this patch, a stub is used that always returns "no driver found".

Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Dean Hildebrand <dhildebz@umich.edu>
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-24 18:07:10 -04:00
Andy Adamson
504913fbc8 NFS: ask for layouttypes during v4 fsinfo call
This information will be used to determine which layout driver,
if any, to use for subsequent IO on this filesystem.  Each driver
is assigned an integer id, with 0 reserved to indicate no driver.

The server can in theory return multiple ids.  However, our current
client implementation only notes the first entry and ignores the
rest.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-24 18:07:09 -04:00
Alexandros Batsakis
9449925273 NFS: change stateid to be a union
In NFSv4.1 the stateid consists of the other and seqid fields. For layout
processing we need to numerically compare the seqid value of layout stateids.
To do so, introduce a union to nfs4_stateid to switch between opaque(16 bytes)
and opaque(12 bytes) / __be32

Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-24 18:02:53 -04:00
Roman Borisov
3388bff5cf nfs: fix unchecked value
Return value of "decode_attr_bitmap()" was not checked;

Signed-off-by: Roman Borisov <ext-roman.borisov@nokia.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-24 18:00:12 -04:00
Ricardo Labiaga
55b6e7742d Ask for time_delta during fsinfo probe
Used by the client to determine if the server has a granular enough
time stamp.

Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-24 18:00:04 -04:00
Ricardo Labiaga
6b96724e50 Revalidate caches on lock
Instead of blindly zapping the caches, attempt to revalidate them if
the server has indicated that it uses high resolution timestamps.

NFSv4 should be able to always revalidate the cache since the
protocol requires the update of the change attribute on modification of
the data.  In reality, there are servers (the Linux NFS server
for example) that do not obey this requirement and use ctime as the
basis for change attribute.  Long term, the server needs to be fixed.
At this time, and to be on the safe side, continue zapping caches if
the server indicates that it does not have a high resolution timestamp.

Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-24 17:59:56 -04:00
Trond Myklebust
6f7a35bd23 NFSv4: Fix up the 'dircount' hint in encode_readdir
Also ensure we only ask for either fileid or mounted_on_fileid in the
readdirplus case too...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-24 13:20:25 -04:00
Trond Myklebust
9af8c222ca NFSv4: Clean up nfs4_decode_dirent
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-24 13:18:04 -04:00
Trond Myklebust
4f082222fa NFSv4: nfs4_decode_dirent must clear entry->fattr->valid
Otherwise, we may end up reading uninitialised data from the resulting
struct nfs_fattr.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-24 13:14:02 -04:00
Trond Myklebust
3201f3dd73 NFSv4: Fix a regression in decode_getfattr
We don't want to have the mounted_on_fileid overwrite the true fileid. We
only return the former if the server didn't supply the true fileid.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-23 15:43:10 -04:00
Trond Myklebust
7ad0735300 NFSv4: Fix up decode_attr_filehandle() to handle the case of empty fh pointer
decode_attr_filehandle still needs to skip the XDR-encoded filehandle if
someone passes a null pointer argument.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-23 15:34:20 -04:00
Trond Myklebust
4a201d6e3f NFS: Ensure we check all allocation return values in new readdir code
Also some clean ups.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-23 15:27:38 -04:00
Bryan Schumaker
82f2e5472e NFS: Readdir plus in v4
By requsting more attributes during a readdir, we can mimic the readdir plus
operation that was in NFSv3.

To test, I ran the command `ls -lU --color=none` on directories with various
numbers of files.  Without readdir plus, I see this:

n files |    100    |   1,000   |  10,000   |  100,000  | 1,000,000
--------+-----------+-----------+-----------+-----------+----------
real    | 0m00.153s | 0m00.589s | 0m05.601s | 0m56.691s | 9m59.128s
user    | 0m00.007s | 0m00.007s | 0m00.077s | 0m00.703s | 0m06.800s
sys     | 0m00.010s | 0m00.070s | 0m00.633s | 0m06.423s | 1m10.005s
access  | 3         | 1         | 1         | 4         | 31
getattr | 2         | 1         | 1         | 1         | 1
lookup  | 104       | 1,003     | 10,003    | 100,003   | 1,000,003
readdir | 2         | 16        | 158       | 1,575     | 15,749
total   | 111       | 1,021     | 10,163    | 101,583   | 1,015,784

With readdir plus enabled, I see this:

n files |    100    |   1,000   |  10,000   |  100,000  | 1,000,000
--------+-----------+-----------+-----------+-----------+----------
real    | 0m00.115s | 0m00.206s | 0m01.079s | 0m12.521s | 2m07.528s
user    | 0m00.003s | 0m00.003s | 0m00.040s | 0m00.290s | 0m03.296s
sys     | 0m00.007s | 0m00.020s | 0m00.120s | 0m01.357s | 0m17.556s
access  | 3         | 1         | 1         | 1         | 7
getattr | 2         | 1         | 1         | 1         | 1
lookup  | 4         | 3         | 3         | 3         | 3
readdir | 6         | 62        | 630       | 6,300     | 62,993
total   | 15        | 67        | 635       | 6,305     | 63,004

Readdir plus disabled has about a 16x increase in the number of rpc calls and
is 4 - 5 times slower on large directories.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-23 15:27:37 -04:00
Bryan Schumaker
ae42c70a60 NFS: introduce generic decode_getattr function
Getattr should be able to decode errors and the readdir file handle.
decode_getfattr_attrs does the actual attribute decoding, while
decode_getfattr_generic will check the opcode before decoding.  This will
let other functions call decode_getfattr_attrs to decode their attributes.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-23 15:27:37 -04:00
Bryan Schumaker
9942438089 NFS: check xdr_decode for errors
Check if the decoded entry has the eof bit set when returning from xdr_decode
with an error.  If it does, we should set the eof bits in the array before
returning.  This should keep us from looping when we expect more data but the
server doesn't give us anything new.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-23 15:27:36 -04:00
Bryan Schumaker
3c8a1aeed8 NFS: nfs_readdir_filler catch all errors
Check for all errors, not a specific one.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-23 15:27:35 -04:00
Bryan Schumaker
56e4ebf877 NFS: readdir with vmapped pages
We can use vmapped pages to read more information from the network at once.
This will reduce the number of calls needed to complete a readdir.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
[trondmy: Added #include for linux/vmalloc.h> in fs/nfs/dir.c]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-23 15:27:35 -04:00
Bryan Schumaker
afa8ccc978 NFS: remove page size checking code
Remove the page size checking code for a readdir decode.  This is now done
by decode_dirent with xdr_streams.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-23 15:27:34 -04:00
Bryan Schumaker
babddc72a9 NFS: decode_dirent should use an xdr_stream
Convert nfs*xdr.c to use an xdr stream in decode_dirent.  This will prevent a
kernel oops that has been occuring when reading a vmapped page.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-23 15:27:33 -04:00
Bryan Schumaker
0715dc632a NFS: remove readdir plus limit
We will now use readdir plus even on directories that are very large.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-23 15:27:32 -04:00
Bryan Schumaker
d39ab9de3b NFS: re-add readdir plus
This patch adds readdir plus support to the cache array.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-23 15:27:31 -04:00
Trond Myklebust
baf57a09e9 NFS: Optimise the readdir searches
If we're going through the loop in nfs_readdir() more than once, we usually
do not want to restart searching from the beginning of the pages cache.

We only want to do that if the previous search failed...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-23 15:27:30 -04:00
Bryan Schumaker
d1bacf9eb2 NFS: add readdir cache array
This patch adds the readdir cache array and functions to retreive the array
stored on a cache page, clear the array by freeing allocated memory, add an
entry to the array, and search the array for a given cookie.

It then modifies readdir to make use of the new cache array.
With the new cache array method, we no longer need some of this code.

Finally, nfs_llseek_dir() will set file->f_pos to a value greater than 0 and
desc->dir_cookie to zero.  When we see this, readdir needs to find the file
at position file->f_pos from the start of the directory.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-23 15:27:30 -04:00
Randy Dunlap
8c7597f6ce nfs: include ratelimit.h, fix nfs4state build error
nfs4state.c uses interfaces from ratelimit.h.  It needs to include
that header file to fix build errors:

fs/nfs/nfs4state.c:1195: warning: type defaults to 'int' in declaration of 'DEFINE_RATELIMIT_STATE'
fs/nfs/nfs4state.c:1195: warning: parameter names (without types) in function declaration
fs/nfs/nfs4state.c:1195: error: invalid storage class for function 'DEFINE_RATELIMIT_STATE'
fs/nfs/nfs4state.c:1195: error: implicit declaration of function '__ratelimit'
fs/nfs/nfs4state.c:1195: error: '_rs' undeclared (first use in this function)

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc:	Trond Myklebust <Trond.Myklebust@netapp.com>
Cc:	linux-nfs@vger.kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-23 15:27:29 -04:00
Trond Myklebust
168667c43b NFSv4: The state manager must ignore EKEYEXPIRED.
Otherwise, we cannot recover state correctly.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-23 15:27:28 -04:00
Trond Myklebust
898f635c42 NFSv4: Don't ignore the error return codes from nfs_intent_set_file
If nfs_intent_set_file() returns an error, we usually want to pass that
back up the stack.

Also ensure that nfs_open_revalidate() returns '1' on success.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-23 15:27:17 -04:00
Linus Torvalds
79f14b7c56 Merge branch 'vfs' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl
* 'vfs' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl: (30 commits)
  BKL: remove BKL from freevxfs
  BKL: remove BKL from qnx4
  autofs4: Only declare function when CONFIG_COMPAT is defined
  autofs: Only declare function when CONFIG_COMPAT is defined
  ncpfs: Lock socket in ncpfs while setting its callbacks
  fs/locks.c: prepare for BKL removal
  BKL: Remove BKL from ncpfs
  BKL: Remove BKL from OCFS2
  BKL: Remove BKL from squashfs
  BKL: Remove BKL from jffs2
  BKL: Remove BKL from ecryptfs
  BKL: Remove BKL from afs
  BKL: Remove BKL from USB gadgetfs
  BKL: Remove BKL from autofs4
  BKL: Remove BKL from isofs
  BKL: Remove BKL from fat
  BKL: Remove BKL from ext2 filesystem
  BKL: Remove BKL from do_new_mount()
  BKL: Remove BKL from cgroup
  BKL: Remove BKL from NTFS
  ...
2010-10-22 10:52:01 -07:00
Linus Torvalds
5704e44d28 Merge branch 'config' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl
* 'config' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl:
  BKL: introduce CONFIG_BKL.
  dabusb: remove the BKL
  sunrpc: remove the big kernel lock
  init/main.c: remove BKL notations
  blktrace: remove the big kernel lock
  rtmutex-tester: make it build without BKL
  dvb-core: kill the big kernel lock
  dvb/bt8xx: kill the big kernel lock
  tlclk: remove big kernel lock
  fix rawctl compat ioctls breakage on amd64 and itanic
  uml: kill big kernel lock
  parisc: remove big kernel lock
  cris: autoconvert trivial BKL users
  alpha: kill big kernel lock
  isapnp: BKL removal
  s390/block: kill the big kernel lock
  hpet: kill BKL, add compat_ioctl
2010-10-22 10:43:11 -07:00
Arnd Bergmann
6de5bd128d BKL: introduce CONFIG_BKL.
With all the patches we have queued in the BKL removal tree, only a
few dozen modules are left that actually rely on the BKL, and even
there are lots of low-hanging fruit. We need to decide what to do
about them, this patch illustrates one of the options:

Every user of the BKL is marked as 'depends on BKL' in Kconfig,
and the CONFIG_BKL becomes a user-visible option. If it gets
disabled, no BKL using module can be built any more and the BKL
code itself is compiled out.

The one exception is file locking, which is practically always
enabled and does a 'select BKL' instead. This effectively forces
CONFIG_BKL to be enabled until we have solved the fs/lockd
mess and can apply the patch that removes the BKL from fs/locks.c.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2010-10-21 15:44:13 +02:00
Trond Myklebust
6eaa61496f NFSv4: Don't call nfs4_reclaim_complete() on receiving NFS4ERR_STALE_CLIENTID
If the server sends us an NFS4ERR_STALE_CLIENTID while the state management
thread is busy reclaiming state, we do want to treat all state that wasn't
reclaimed before the STALE_CLIENTID as if a network partition occurred (see
the edge conditions described in RFC3530 and RFC5661).
What we do not want to do is to send an nfs4_reclaim_complete(), since we
haven't yet even started reclaiming state after the server rebooted.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-10-19 19:42:53 -04:00
Trond Myklebust
ae1007d37e NFSv4: Don't call nfs4_state_mark_reclaim_reboot() from error handlers
In the case of a server reboot, the state recovery thread starts by calling
nfs4_state_end_reclaim_reboot() in order to avoid edge conditions when
the server reboots while the client is in the middle of recovery.

However, if the client has already marked the nfs4_state as requiring
reboot recovery, then the above behaviour will cause the recovery thread to
treat the open as if it was part of such an edge condition: the open will
be recovered as if it was part of a lease expiration (and all the locks
will be lost).
Fix is to remove the call to nfs4_state_mark_reclaim_reboot from
nfs4_async_handle_error(), and nfs4_handle_exception(). Instead we leave it
to the recovery thread to do this for us.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-10-19 19:42:33 -04:00
Trond Myklebust
b0ed9dbc24 NFSv4: Fix open recovery
NFSv4 open recovery is currently broken: since we do not clear the
state->flags states before attempting recovery, we end up with the
'can_open_cached()' function triggering. This again leads to no OPEN call
being put on the wire.

Reported-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-10-19 19:41:55 -04:00
Trond Myklebust
bc4866b6e0 NFS: Don't SIGBUS if nfs_vm_page_mkwrite races with a cache invalidation
In the case where we lock the page, and then find out that the page has
been thrown out of the page cache, we should just return VM_FAULT_NOPAGE.
This is what block_page_mkwrite() does in these situations.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-10-19 19:37:54 -04:00
Bryan Schumaker
955a857e06 NFS: new idmapper
This patch creates a new idmapper system that uses the request-key function to
place a call into userspace to map user and group ids to names.  The old
idmapper was single threaded, which prevented more than one request from running
at a single time.  This means that a user would have to wait for an upcall to
finish before accessing a cached result.

The upcall result is stored on a keyring of type id_resolver.  See the file
Documentation/filesystems/nfs/idmapper.txt for instructions.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
[Trond: fix up the return value of nfs_idmap_lookup_name and clean up code]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-07 18:48:49 -04:00
Arnd Bergmann
b89f432133 fs/locks.c: prepare for BKL removal
This prepares the removal of the big kernel lock from the
file locking code. We still use the BKL as long as fs/lockd
uses it and ceph might sleep, but we can flip the definition
to a private spinlock as soon as that's done.
All users outside of fs/lockd get converted to use
lock_flocks() instead of lock_kernel() where appropriate.

Based on an earlier patch to use a spinlock from Matthew
Wilcox, who has attempted this a few times before, the
earliest patch from over 10 years ago turned it into
a semaphore, which ended up being slower than the BKL
and was subsequently reverted.

Someone should do some serious performance testing when
this becomes a spinlock, since this has caused problems
before. Using a spinlock should be at least as good
as the BKL in theory, but who knows...

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Miklos Szeredi <mszeredi@suse.cz>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Sage Weil <sage@newdream.net>
Cc: linux-kernel@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
2010-10-05 11:02:04 +02:00