Commit Graph

6482 Commits

Author SHA1 Message Date
Linus Torvalds
5b0ed59649 for-6.3/block-2023-02-16
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmPvfncQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpob2EADXJxcr2jjYHm/7cjKkyuVX8fr80dNdMeuY
 JFdsjG1k6Uj73BVhQQWYTcs/PsrWBHWRsv6uz4WgOELj55eXmf5Q0kJszyUeJW33
 /DjqLvtoppVcYf80xE13wKvCfn73BjwQo6xkGM0qAYn15eaXiD/Ax3xC6eJlsBeK
 PEw7EJyhacbSxZa/1D2B6+mqII1jUQWProTCc3udZ4JHi3WvdWa3Rda0qCqHl4a1
 +K2aP2YTFIRPxBzfMNa/CafWVIFubTdht+4Ds6R60RImzB9e0VUBfcsiUyW5Zg7L
 Fwv7ptXuWrALwVNdW56Oz1QikBxn2pdRR2HMLwKJW1MD8kP9r8LMm2jV5Rhiwe0B
 OQsGRYkOzBvw+bxeP5fvk0iPGVMz6ActH4gkraA5QdLqayDaFYOadlhqz0uRo5SH
 Fb42Vl658K/MHDSIk8U58TNkmrsIJsBGohXI9DOGINPPvv3XOPi4Q1HmXkGRmii0
 y+lNU/QEGh7xXXew29SPP76uQpQaYfC7NxXCMw/OpOMwehzjsjshmM2lpxi8zsgt
 PJUmfHv5qxCplNmTJXmUpmX7sS7550HUdu9FJb13DM+gzKg8bk9jWVuLrzqrVlG5
 1hKWEl1+heg1heRfaIuJVLbPI0au6Sb4uqhih/PHyrP9TWIoAruDbDJM65GKTxyE
 2uEgcHzHQw==
 =poRc
 -----END PGP SIGNATURE-----

Merge tag 'for-6.3/block-2023-02-16' of git://git.kernel.dk/linux

Pull block updates from Jens Axboe:

 - NVMe updates via Christoph:
      - Small improvements to the logging functionality (Amit Engel)
      - Authentication cleanups (Hannes Reinecke)
      - Cleanup and optimize the DMA mapping cod in the PCIe driver
        (Keith Busch)
      - Work around the command effects for Format NVM (Keith Busch)
      - Misc cleanups (Keith Busch, Christoph Hellwig)
      - Fix and cleanup freeing single sgl (Keith Busch)

 - MD updates via Song:
      - Fix a rare crash during the takeover process
      - Don't update recovery_cp when curr_resync is ACTIVE
      - Free writes_pending in md_stop
      - Change active_io to percpu

 - Updates to drbd, inching us closer to unifying the out-of-tree driver
   with the in-tree one (Andreas, Christoph, Lars, Robert)

 - BFQ update adding support for multi-actuator drives (Paolo, Federico,
   Davide)

 - Make brd compliant with REQ_NOWAIT (me)

 - Fix for IOPOLL and queue entering, fixing stalled IO waiting on
   timeouts (me)

 - Fix for REQ_NOWAIT with multiple bios (me)

 - Fix memory leak in blktrace cleanup (Greg)

 - Clean up sbitmap and fix a potential hang (Kemeng)

 - Clean up some bits in BFQ, and fix a bug in the request injection
   (Kemeng)

 - Clean up the request allocation and issue code, and fix some bugs
   related to that (Kemeng)

 - ublk updates and fixes:
      - Add support for unprivileged ublk (Ming)
      - Improve device deletion handling (Ming)
      - Misc (Liu, Ziyang)

 - s390 dasd fixes (Alexander, Qiheng)

 - Improve utility of request caching and fixes (Anuj, Xiao)

 - zoned cleanups (Pankaj)

 - More constification for kobjs (Thomas)

 - blk-iocost cleanups (Yu)

 - Remove bio splitting from drivers that don't need it (Christoph)

 - Switch blk-cgroups to use struct gendisk. Some of this is now
   incomplete as select late reverts were done. (Christoph)

 - Add bvec initialization helpers, and convert callers to use that
   rather than open-coding it (Christoph)

 - Misc fixes and cleanups (Jinke, Keith, Arnd, Bart, Li, Martin,
   Matthew, Ulf, Zhong)

* tag 'for-6.3/block-2023-02-16' of git://git.kernel.dk/linux: (169 commits)
  brd: use radix_tree_maybe_preload instead of radix_tree_preload
  block: use proper return value from bio_failfast()
  block: bio-integrity: Copy flags when bio_integrity_payload is cloned
  block: Fix io statistics for cgroup in throttle path
  brd: mark as nowait compatible
  brd: check for REQ_NOWAIT and set correct page allocation mask
  brd: return 0/-error from brd_insert_page()
  block: sync mixed merged request's failfast with 1st bio's
  Revert "blk-cgroup: pin the gendisk in struct blkcg_gq"
  Revert "blk-cgroup: pass a gendisk to blkg_lookup"
  Revert "blk-cgroup: delay blk-cgroup initialization until add_disk"
  Revert "blk-cgroup: delay calling blkcg_exit_disk until disk_release"
  Revert "blk-cgroup: move the cgroup information to struct gendisk"
  nvme-pci: remove iod use_sgls
  nvme-pci: fix freeing single sgl
  block: ublk: check IO buffer based on flag need_get_data
  s390/dasd: Fix potential memleak in dasd_eckd_init()
  s390/dasd: sort out physical vs virtual pointers usage
  block: Remove the ALLOC_CACHE_SLACK constant
  block: make kobj_type structures constant
  ...
2023-02-20 14:27:21 -08:00
Linus Torvalds
05e6295f7b fs.idmapped.v6.3
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCY+5NlQAKCRCRxhvAZXjc
 orOaAP9i2h3OJy95nO2Fpde0Bt2UT+oulKCCcGlvXJ8/+TQpyQD/ZQq47gFQ0EAz
 Br5NxeyGeecAb0lHpFz+CpLGsxMrMwQ=
 =+BG5
 -----END PGP SIGNATURE-----

Merge tag 'fs.idmapped.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping

Pull vfs idmapping updates from Christian Brauner:

 - Last cycle we introduced the dedicated struct mnt_idmap type for
   mount idmapping and the required infrastucture in 256c8aed2b ("fs:
   introduce dedicated idmap type for mounts"). As promised in last
   cycle's pull request message this converts everything to rely on
   struct mnt_idmap.

   Currently we still pass around the plain namespace that was attached
   to a mount. This is in general pretty convenient but it makes it easy
   to conflate namespaces that are relevant on the filesystem with
   namespaces that are relevant on the mount level. Especially for
   non-vfs developers without detailed knowledge in this area this was a
   potential source for bugs.

   This finishes the conversion. Instead of passing the plain namespace
   around this updates all places that currently take a pointer to a
   mnt_userns with a pointer to struct mnt_idmap.

   Now that the conversion is done all helpers down to the really
   low-level helpers only accept a struct mnt_idmap argument instead of
   two namespace arguments.

   Conflating mount and other idmappings will now cause the compiler to
   complain loudly thus eliminating the possibility of any bugs. This
   makes it impossible for filesystem developers to mix up mount and
   filesystem idmappings as they are two distinct types and require
   distinct helpers that cannot be used interchangeably.

   Everything associated with struct mnt_idmap is moved into a single
   separate file. With that change no code can poke around in struct
   mnt_idmap. It can only be interacted with through dedicated helpers.
   That means all filesystems are and all of the vfs is completely
   oblivious to the actual implementation of idmappings.

   We are now also able to extend struct mnt_idmap as we see fit. For
   example, we can decouple it completely from namespaces for users that
   don't require or don't want to use them at all. We can also extend
   the concept of idmappings so we can cover filesystem specific
   requirements.

   In combination with the vfs{g,u}id_t work we finished in v6.2 this
   makes this feature substantially more robust and thus difficult to
   implement wrong by a given filesystem and also protects the vfs.

 - Enable idmapped mounts for tmpfs and fulfill a longstanding request.

   A long-standing request from users had been to make it possible to
   create idmapped mounts for tmpfs. For example, to share the host's
   tmpfs mount between multiple sandboxes. This is a prerequisite for
   some advanced Kubernetes cases. Systemd also has a range of use-cases
   to increase service isolation. And there are more users of this.

   However, with all of the other work going on this was way down on the
   priority list but luckily someone other than ourselves picked this
   up.

   As usual the patch is tiny as all the infrastructure work had been
   done multiple kernel releases ago. In addition to all the tests that
   we already have I requested that Rodrigo add a dedicated tmpfs
   testsuite for idmapped mounts to xfstests. It is to be included into
   xfstests during the v6.3 development cycle. This should add a slew of
   additional tests.

* tag 'fs.idmapped.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: (26 commits)
  shmem: support idmapped mounts for tmpfs
  fs: move mnt_idmap
  fs: port vfs{g,u}id helpers to mnt_idmap
  fs: port fs{g,u}id helpers to mnt_idmap
  fs: port i_{g,u}id_into_vfs{g,u}id() to mnt_idmap
  fs: port i_{g,u}id_{needs_}update() to mnt_idmap
  quota: port to mnt_idmap
  fs: port privilege checking helpers to mnt_idmap
  fs: port inode_owner_or_capable() to mnt_idmap
  fs: port inode_init_owner() to mnt_idmap
  fs: port acl to mnt_idmap
  fs: port xattr to mnt_idmap
  fs: port ->permission() to pass mnt_idmap
  fs: port ->fileattr_set() to pass mnt_idmap
  fs: port ->set_acl() to pass mnt_idmap
  fs: port ->get_acl() to pass mnt_idmap
  fs: port ->tmpfile() to pass mnt_idmap
  fs: port ->rename() to pass mnt_idmap
  fs: port ->mknod() to pass mnt_idmap
  fs: port ->mkdir() to pass mnt_idmap
  ...
2023-02-20 11:53:11 -08:00
Linus Torvalds
de630176bd i_version handling changes for v6.3
-----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEES8DXskRxsqGE6vXTAA5oQRlWghUFAmPuDAUTHGpsYXl0b25A
 a2VybmVsLm9yZwAKCRAADmhBGVaCFQu9D/9ZEYVvoY3s7tK5Cw+My+1U0sNQFUW0
 D46kVF0bLxv09G8prI6FZFQORguSMZli3TEsdyUbU12ak3kNHYkTW6AxxnN0jQVE
 ur6f5iOauh9mWcv2S9/gulz9KH2uiG1rGHWfClfoZgQI1QCatfaKh9nrEcNrPy71
 qpTbcN8AOhhYDP2ZAHajuwTuak1jLBHYN7ihfaUPyex9kAxUalbw5wIsXYfTB+Mn
 /xPuCAN9YL5KpuUzVSF7QfWWVEiqErSFeFip+NCS/srihURKr14JuVSwUwkTn8rR
 ONWBawXDw+eiWZ1nQmzvPTazbvsQGO1cKx8A20nOLmJjtRqAGlfngDQrCchkoSg7
 SwYDiDpDqkg5PTWSfkDMG1wY/tEm6dO89P912kY+jFuGJIrG4HPkOR20StmKkuz5
 La4bUs5UwucxxS9Hlu5+aFGrYFtuixWpR/1Wu5ITRtoq+j5vzHHrQ1wkbezNXs0w
 VxS7YqW1TDDUUXgBzYOctmQ50ADVVC63J6AbhA7w7Qh+1vpEvyqYcWZw/Vd54bPP
 tDh3FFnEGJjX0QqtdpHpyIiOm/BXiJ/8y96vEAxVzXtOmb59AQDCPq99HgWa4G8L
 62ltF7qkgXKOsWCaKk07uycE0zISnx1ZXk9bvLFeN02XhRkPxevV93Yn/YiJ7nZe
 xcR23eGRsseA/w==
 =EKA6
 -----END PGP SIGNATURE-----

Merge tag 'iversion-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux

Pull i_version updates from Jeff Layton:
 "This overhauls how we handle i_version queries from nfsd.

  Instead of having special routines and grabbing the i_version field
  directly out of the inode in some cases, we've moved most of the
  handling into the various filesystems' getattr operations. As a bonus,
  this makes ceph's change attribute usable by knfsd as well.

  This should pave the way for future work to make this value queryable
  by userland, and to make it more resilient against rolling back on a
  crash"

* tag 'iversion-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
  nfsd: remove fetch_iversion export operation
  nfsd: use the getattr operation to fetch i_version
  nfsd: move nfsd4_change_attribute to nfsfh.c
  ceph: report the inode version in getattr if requested
  nfs: report the inode version in getattr if requested
  vfs: plumb i_version handling into struct kstat
  fs: clarify when the i_version counter must be updated
  fs: uninline inode_query_iversion
2023-02-20 11:21:02 -08:00
Linus Torvalds
575a7e0f81 File locking changes for v6.3
-----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEES8DXskRxsqGE6vXTAA5oQRlWghUFAmPuC5kTHGpsYXl0b25A
 a2VybmVsLm9yZwAKCRAADmhBGVaCFdXpEADFN/loTtcANLPQmLmgmJLDuZr2zKrf
 aziMJRjqGMx6BLdwBgX8/XGBNwpG4tkVbI+zdRoVHkpcayMDpLq0dnrvi79a/dGU
 fBrI72ZDMd/S9lzbodObHMziLqvgFthsPm9ldVAZ2Kt400KKNE+ozcveiC3yVGy0
 n1k5BSt/78abzpqut5whVgJBooHtUMCh3XvBJPKwgOneHfAXCm+jqaXlKKpKlpZj
 s2OUyn8BLfNkTgpAZ88L5Rkf0mftjziL6C8KOMy1hvOsyiP0IkwLuQ/kO+2H0Ate
 p3tbOGvUT+n1gYpFYBDLnuWB4G8+CVPxfoO6KGhwT4OlCJpPlNCM8O+w/A/dKn4I
 858spkpYPMy91lEkcrRLRkg/MARWGTgZ3k76fp3OWNnfruWd6ekMlYKx9n6CIy34
 Aoc3Svy9KeA7oRrbRDltmw3UVmz53GcDo337ZL1J6Jph3s86dMG7AwGYvoDfKuKK
 b0oNK5db5v50scBnRHX6UWejE5fSnnHvgC7pU57u08odCVEUALB+r8f04vmkxcVJ
 Qed7lolQdFzn9ddaOXzpg5KeCe/cX3p4IPZSTHad7CPr8gswmC135DfXCr64x2hC
 5jyNzKbe/x+7B2xCweHmEk4ojt8IU3UaYxLJoQkNeVr8rEGC9gqZgkSDe7BnTpOf
 wT0ijzhy2u5RKg==
 =Zhf3
 -----END PGP SIGNATURE-----

Merge tag 'locks-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux

Pull file locking updates from Jeff Layton:
 "The main change here is that I've broken out most of the file locking
  definitions into a new header file. I also went ahead and completed
  the removal of locks_inode function"

* tag 'locks-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
  fs: remove locks_inode
  filelock: move file locking definitions to separate header file
2023-02-20 11:10:38 -08:00
Anna Schumaker
896e090eef Revert "NFSv4.2: Change the default KConfig value for READ_PLUS"
This reverts commit 7fd461c47c.

Unfortunately, it has come to our attention that there is still a bug
somewhere in the READ_PLUS code that can result in nfsroot systems on
ARM to crash during boot.

Let's do the right thing and revert this change so we don't break
people's nfsroot setups.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-02-17 09:07:19 -05:00
Christoph Hellwig
8bb7cd842c nfs: use bvec_set_page to initialize bvecs
Use the bvec_set_page helper to initialize bvecs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Link: https://lore.kernel.org/r/20230203150634.3199647-16-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-03 10:17:42 -07:00
Jeff Layton
58a033c9a3 nfsd: remove fetch_iversion export operation
Now that the i_version counter is reported in struct kstat, there is no
need for this export operation.

Acked-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
2023-01-26 07:00:06 -05:00
Jeff Layton
61a968b4f0 nfs: report the inode version in getattr if requested
Allow NFS to report the i_version in getattr requests. Since the cost to
fetch it is relatively cheap, do it unconditionally and just set the
flag if it looks like it's valid. Also, conditionally enable the
MONOTONIC flag when the server reports its change attr type as such.

Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
2023-01-26 07:00:06 -05:00
Christian Brauner
39f60c1cce
fs: port xattr to mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:28 +01:00
Christian Brauner
4609e1f18e
fs: port ->permission() to pass mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:28 +01:00
Christian Brauner
13e83a4923
fs: port ->set_acl() to pass mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:27 +01:00
Christian Brauner
e18275ae55
fs: port ->rename() to pass mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:26 +01:00
Christian Brauner
5ebb29bee8
fs: port ->mknod() to pass mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:26 +01:00
Christian Brauner
c54bd91e9e
fs: port ->mkdir() to pass mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:26 +01:00
Christian Brauner
7a77db9551
fs: port ->symlink() to pass mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:25 +01:00
Christian Brauner
6c960e68aa
fs: port ->create() to pass mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:25 +01:00
Christian Brauner
b74d24f7a7
fs: port ->getattr() to pass mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:25 +01:00
Christian Brauner
c1632a0f11
fs: port ->setattr() to pass mnt_idmap
Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19 09:24:02 +01:00
Jeff Layton
5970e15dbc filelock: move file locking definitions to separate header file
The file locking definitions have lived in fs.h since the dawn of time,
but they are only used by a small subset of the source files that
include it.

Move the file locking definitions to a new header file, and add the
appropriate #include directives to the source files that need them. By
doing this we trim down fs.h a bit and limit the amount of rebuilding
that has to be done when we make changes to the file locking APIs.

Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Howells <dhowells@redhat.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Steve French <stfrench@microsoft.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
2023-01-11 06:52:32 -05:00
Linus Torvalds
9b43a525db NFS client fixes for Linux 6.2
Highlights include:
 
 Bugfixes
 - Fix a race in the RPCSEC_GSS upcall code that causes hung RPC calls
 - Fix a broken coalescing test in the pNFS file layout driver
 - Ensure that the access cache rcu path also applies the login test
 - Fix up for a sparse warning
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmO5s6MACgkQZwvnipYK
 APJ8PA//aVnu+DqLTxeGnyr4HNIcm0su4TetghQabAf0S8nk69Ly8W+N+xCRwwCR
 /PvSe8j51tXVInpYQKSOTjDl8/0mq8cybf6N9DzDS0A+hKQfdK8jvPLU0yj/aZ6u
 D8x17ve7RjzLgw9minltq4eTvVIHpqSh/vVUyQhnj2A8qQlWrboXiQnLQAp5KBbq
 7NexvcajgfFHXm7S+edZ8NH5QZ20FNVskvzyrClp46DQLJcdNLHGJLp8tdeD31kE
 lsNBsLG0LxOFyBY28MbQptXalhBP8RRwYwxdSdIUnExzB7eKTLS6LgHuK9GV+EzL
 wSfRrnyM023momKmdFn3Fy/cRyeF1RwfQON+KAHo1cQ5fFiQS16p7lTELGkApyLc
 yOXGpAvaPJuuBJLM+Xi07ovhqpc7SjGmr7cjZslyasNtez60Le06oSihKnrgcKDx
 ZVh1pTc89O25GqGm161HaQJY9YU4XCtjxPv7rvdrlhO5Dkgxai293R5aXUFBz1dD
 tssv5a2DqIekrE29YaCi/RjIqkngCXHlwniFw/L3/aFRUyHqN9/MyxH5VeCRDI2T
 OJQRR056V1FXSBLH6Wuzs2Ixz2HnJCLR7LE0zAoPLm/RwW8CuNmEXu28jfU0Kcmt
 Hj5VyxYfnqEAGsUO0O2oNhOnxWY9OHvlvpb/9DWcke0ASpeScB4=
 =N6cT
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-6.2-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client fixes from Trond Myklebust:

 - Fix a race in the RPCSEC_GSS upcall code that causes hung RPC calls

 - Fix a broken coalescing test in the pNFS file layout driver

 - Ensure that the access cache rcu path also applies the login test

 - Fix up for a sparse warning

* tag 'nfs-for-6.2-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: Fix up a sparse warning
  NFS: Judge the file access cache's timestamp in rcu path
  pNFS/filelayout: Fix coalescing test for single DS
  SUNRPC: ensure the matching upcall is in-flight upon downcall
2023-01-07 10:38:11 -08:00
Trond Myklebust
5e9a7b9c2e NFS: Fix up a sparse warning
sparse is warning about an incorrect RCU dereference.
fs/nfs/dir.c:2965:56: warning: incorrect type in argument 1 (different address spaces)
fs/nfs/dir.c:2965:56:    expected struct cred const *
fs/nfs/dir.c:2965:56:    got struct cred const [noderef] __rcu *const cred

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-01-01 20:17:26 -05:00
Chengen Du
029085b894 NFS: Judge the file access cache's timestamp in rcu path
If the user's login time is newer than the cache's timestamp,
we expect the cache may be stale and need to clear.
The stale cache will remain in the list's tail if no other
users operate on that inode.
Once the user accesses the inode, the stale cache will be
returned in rcu path.

Signed-off-by: Chengen Du <chengen.du@canonical.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-01-01 20:12:47 -05:00
Olga Kornievskaia
a6b9d2fa00 pNFS/filelayout: Fix coalescing test for single DS
When there is a single DS no striping constraints need to be placed on
the IO. When such constraint is applied then buffered reads don't
coalesce to the DS's rsize.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-12-20 12:52:39 -05:00
Linus Torvalds
71a7507afb Driver Core changes for 6.2-rc1
Here is the set of driver core and kernfs changes for 6.2-rc1.
 
 The "big" change in here is the addition of a new macro,
 container_of_const() that will preserve the "const-ness" of a pointer
 passed into it.
 
 The "problem" of the current container_of() macro is that if you pass in
 a "const *", out of it can comes a non-const pointer unless you
 specifically ask for it.  For many usages, we want to preserve the
 "const" attribute by using the same call.  For a specific example, this
 series changes the kobj_to_dev() macro to use it, allowing it to be used
 no matter what the const value is.  This prevents every subsystem from
 having to declare 2 different individual macros (i.e.
 kobj_const_to_dev() and kobj_to_dev()) and having the compiler enforce
 the const value at build time, which having 2 macros would not do
 either.
 
 The driver for all of this have been discussions with the Rust kernel
 developers as to how to properly mark driver core, and kobject, objects
 as being "non-mutable".  The changes to the kobject and driver core in
 this pull request are the result of that, as there are lots of paths
 where kobjects and device pointers are not modified at all, so marking
 them as "const" allows the compiler to enforce this.
 
 So, a nice side affect of the Rust development effort has been already
 to clean up the driver core code to be more obvious about object rules.
 
 All of this has been bike-shedded in quite a lot of detail on lkml with
 different names and implementations resulting in the tiny version we
 have in here, much better than my original proposal.  Lots of subsystem
 maintainers have acked the changes as well.
 
 Other than this change, included in here are smaller stuff like:
   - kernfs fixes and updates to handle lock contention better
   - vmlinux.lds.h fixes and updates
   - sysfs and debugfs documentation updates
   - device property updates
 
 All of these have been in the linux-next tree for quite a while with no
 problems, OTHER than some merge issues with other trees that should be
 obvious when you hit them (block tree deletes a driver that this tree
 modifies, iommufd tree modifies code that this tree also touches).  If
 there are merge problems with these trees, please let me know.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCY5wz3A8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+yks0ACeKYUlVgCsER8eYW+x18szFa2QTXgAn2h/VhZe
 1Fp53boFaQkGBjl8mGF8
 =v+FB
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the set of driver core and kernfs changes for 6.2-rc1.

  The "big" change in here is the addition of a new macro,
  container_of_const() that will preserve the "const-ness" of a pointer
  passed into it.

  The "problem" of the current container_of() macro is that if you pass
  in a "const *", out of it can comes a non-const pointer unless you
  specifically ask for it. For many usages, we want to preserve the
  "const" attribute by using the same call. For a specific example, this
  series changes the kobj_to_dev() macro to use it, allowing it to be
  used no matter what the const value is. This prevents every subsystem
  from having to declare 2 different individual macros (i.e.
  kobj_const_to_dev() and kobj_to_dev()) and having the compiler enforce
  the const value at build time, which having 2 macros would not do
  either.

  The driver for all of this have been discussions with the Rust kernel
  developers as to how to properly mark driver core, and kobject,
  objects as being "non-mutable". The changes to the kobject and driver
  core in this pull request are the result of that, as there are lots of
  paths where kobjects and device pointers are not modified at all, so
  marking them as "const" allows the compiler to enforce this.

  So, a nice side affect of the Rust development effort has been already
  to clean up the driver core code to be more obvious about object
  rules.

  All of this has been bike-shedded in quite a lot of detail on lkml
  with different names and implementations resulting in the tiny version
  we have in here, much better than my original proposal. Lots of
  subsystem maintainers have acked the changes as well.

  Other than this change, included in here are smaller stuff like:

   - kernfs fixes and updates to handle lock contention better

   - vmlinux.lds.h fixes and updates

   - sysfs and debugfs documentation updates

   - device property updates

  All of these have been in the linux-next tree for quite a while with
  no problems"

* tag 'driver-core-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (58 commits)
  device property: Fix documentation for fwnode_get_next_parent()
  firmware_loader: fix up to_fw_sysfs() to preserve const
  usb.h: take advantage of container_of_const()
  device.h: move kobj_to_dev() to use container_of_const()
  container_of: add container_of_const() that preserves const-ness of the pointer
  driver core: fix up missed drivers/s390/char/hmcdrv_dev.c class.devnode() conversion.
  driver core: fix up missed scsi/cxlflash class.devnode() conversion.
  driver core: fix up some missing class.devnode() conversions.
  driver core: make struct class.devnode() take a const *
  driver core: make struct class.dev_uevent() take a const *
  cacheinfo: Remove of_node_put() for fw_token
  device property: Add a blank line in Kconfig of tests
  device property: Rename goto label to be more precise
  device property: Move PROPERTY_ENTRY_BOOL() a bit down
  device property: Get rid of __PROPERTY_ENTRY_ARRAY_EL*SIZE*()
  kernfs: fix all kernel-doc warnings and multiple typos
  driver core: pass a const * into of_device_uevent()
  kobject: kset_uevent_ops: make name() callback take a const *
  kobject: kset_uevent_ops: make filter() callback take a const *
  kobject: make kobject_namespace take a const *
  ...
2022-12-16 03:54:54 -08:00
Linus Torvalds
48ea09cdda hardening updates for v6.2-rc1
- Convert flexible array members, fix -Wstringop-overflow warnings,
   and fix KCFI function type mismatches that went ignored by
   maintainers (Gustavo A. R. Silva, Nathan Chancellor, Kees Cook).
 
 - Remove the remaining side-effect users of ksize() by converting
   dma-buf, btrfs, and coredump to using kmalloc_size_roundup(),
   add more __alloc_size attributes, and introduce full testing
   of all allocator functions. Finally remove the ksize() side-effect
   so that each allocation-aware checker can finally behave without
   exceptions.
 
 - Introduce oops_limit (default 10,000) and warn_limit (default off)
   to provide greater granularity of control for panic_on_oops and
   panic_on_warn (Jann Horn, Kees Cook).
 
 - Introduce overflows_type() and castable_to_type() helpers for
   cleaner overflow checking.
 
 - Improve code generation for strscpy() and update str*() kern-doc.
 
 - Convert strscpy and sigphash tests to KUnit, and expand memcpy
   tests.
 
 - Always use a non-NULL argument for prepare_kernel_cred().
 
 - Disable structleak plugin in FORTIFY KUnit test (Anders Roxell).
 
 - Adjust orphan linker section checking to respect CONFIG_WERROR
   (Xin Li).
 
 - Make sure siginfo is cleared for forced SIGKILL (haifeng.xu).
 
 - Fix um vs FORTIFY warnings for always-NULL arguments.
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmOZSOoWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJjAAD/0YkvpU7f03f8hcQMJK6wv//24K
 AW41hEaBikq9RcmkuvkLLrJRibGgZ5O2xUkUkxRs/HxhkhrZ0kEw8sbwZe8MoWls
 F4Y9+TDjsrdHmjhfcBZdLnVxwcKK5wlaEcpjZXtbsfcdhx3TbgcDA23YELl5t0K+
 I11j4kYmf9SLl4CwIrSP5iACml8CBHARDh8oIMF7FT/LrjNbM8XkvBcVVT6hTbOV
 yjgA8WP2e9GXvj9GzKgqvd0uE/kwPkVAeXLNFWopPi4FQ8AWjlxbBZR0gamA6/EB
 d7TIs0ifpVU2JGQaTav4xO6SsFMj3ntoUI0qIrFaTxZAvV4KYGrPT/Kwz1O4SFaG
 rN5lcxseQbPQSBTFNG4zFjpywTkVCgD2tZqDwz5Rrmiraz0RyIokCN+i4CD9S0Ds
 oEd8JSyLBk1sRALczkuEKo0an5AyC9YWRcBXuRdIHpLo08PsbeUUSe//4pe303cw
 0ApQxYOXnrIk26MLElTzSMImlSvlzW6/5XXzL9ME16leSHOIfDeerPnc9FU9Eb3z
 ODv22z6tJZ9H/apSUIHZbMciMbbVTZ8zgpkfydr08o87b342N/ncYHZ5cSvQ6DWb
 jS5YOIuvl46/IhMPT16qWC8p0bP5YhxoPv5l6Xr0zq0ooEj0E7keiD/SzoLvW+Qs
 AHXcibguPRQBPAdiPQ==
 =yaaN
 -----END PGP SIGNATURE-----

Merge tag 'hardening-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull kernel hardening updates from Kees Cook:

 - Convert flexible array members, fix -Wstringop-overflow warnings, and
   fix KCFI function type mismatches that went ignored by maintainers
   (Gustavo A. R. Silva, Nathan Chancellor, Kees Cook)

 - Remove the remaining side-effect users of ksize() by converting
   dma-buf, btrfs, and coredump to using kmalloc_size_roundup(), add
   more __alloc_size attributes, and introduce full testing of all
   allocator functions. Finally remove the ksize() side-effect so that
   each allocation-aware checker can finally behave without exceptions

 - Introduce oops_limit (default 10,000) and warn_limit (default off) to
   provide greater granularity of control for panic_on_oops and
   panic_on_warn (Jann Horn, Kees Cook)

 - Introduce overflows_type() and castable_to_type() helpers for cleaner
   overflow checking

 - Improve code generation for strscpy() and update str*() kern-doc

 - Convert strscpy and sigphash tests to KUnit, and expand memcpy tests

 - Always use a non-NULL argument for prepare_kernel_cred()

 - Disable structleak plugin in FORTIFY KUnit test (Anders Roxell)

 - Adjust orphan linker section checking to respect CONFIG_WERROR (Xin
   Li)

 - Make sure siginfo is cleared for forced SIGKILL (haifeng.xu)

 - Fix um vs FORTIFY warnings for always-NULL arguments

* tag 'hardening-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (31 commits)
  ksmbd: replace one-element arrays with flexible-array members
  hpet: Replace one-element array with flexible-array member
  um: virt-pci: Avoid GCC non-NULL warning
  signal: Initialize the info in ksignal
  lib: fortify_kunit: build without structleak plugin
  panic: Expose "warn_count" to sysfs
  panic: Introduce warn_limit
  panic: Consolidate open-coded panic_on_warn checks
  exit: Allow oops_limit to be disabled
  exit: Expose "oops_count" to sysfs
  exit: Put an upper limit on how often we can oops
  panic: Separate sysctl logic from CONFIG_SMP
  mm/pgtable: Fix multiple -Wstringop-overflow warnings
  mm: Make ksize() a reporting-only function
  kunit/fortify: Validate __alloc_size attribute results
  drm/sti: Fix return type of sti_{dvo,hda,hdmi}_connector_mode_valid()
  drm/fsl-dcu: Fix return type of fsl_dcu_drm_connector_mode_valid()
  driver core: Add __alloc_size hint to devm allocators
  overflow: Introduce overflows_type() and castable_to_type()
  coredump: Proactively round up to kmalloc bucket size
  ...
2022-12-14 12:20:00 -08:00
Linus Torvalds
a044dab5e6 NFS client updates for Linux 6.2
Highlights include:
 
 Bugfixes
 - Fix a NULL pointer dereference in the mount parser
 - Fix a memory stomp in decode_attr_security_label
 - Fix a credential leak in _nfs4_discover_trunking()
 - Fix a buffer leak in rpcrdma_req_create()
 - Fix a leaked socket in rpc_sockname()
 - Fix a deadlock between nfs4_open_recover_helper() and delegreturn
 - Fix an Oops in nfs_d_automount()
 - Fix a potential race in nfs_call_unlink()
 - Multiple fixes for the open context mode
 - NFSv4.2 READ_PLUS fixes
 - Fix a regression in which small rsize/wsize values are being forbidden
 - Fail client initialisation if the NFSv4.x state manager thread can't run
 - avoid spurious warning of lost lock that is being unlocked.
 - Ensure the initialisation of struct nfs4_label
 
 Features and cleanups
 - Trigger the "ls -l" readdir heuristic sooner
 - Clear the file access cache upon login to ensure supplementary group
   info is in sync between the client and server
 - pnfs: Fix up the logging of layout stateids
 - NFSv4.2: Change the default KConfig value for READ_PLUS
 - Use sysfs_emit() instead of scnprintf() where appropriate
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmOYmTQACgkQZwvnipYK
 APLxWw//dAFjHXc3YOLi1J94nINZbOyAvdS2WAPVLRfMh8mEln9JAxVsUREz279N
 02EFbW/ZAN3nd2SGdYgbdzTIIJ154LlDS9DmIdwgqK2I/tEZauFeDTpTBDIDNwMg
 eruU4twttbgrE7lgVlE0ZDGlvMNFao462BriPeEN+WaaqQWB2NqMp/qoh1i4mbcn
 rQPPiovn0LiW5K5ulMT5OFz9y4I4oNaJV77sPSUew0M0iLbGtI1OdrF844+8gzb0
 BeMW/ArvM5TpuwMOGmzdt4pql8MTPf4qvcnafiMX69Eg2asGijZlr/AxOpSOYXSl
 LDmAXczQ0VSsXYkKI2bJz1d5xWeFL4ZJVsaR8W/M4aewgqHWtAIs/b+Gk2x/S+Zo
 s1lGFrHrvzhXd/c8LxuH+LBUxLqy6nep0sVWKOK/eQ79BUo8lLwev8YbNk8Sk4Rh
 N2SrcJQeHu9uYKUbWuRWt5r2L9ffYVcSQQfEk5IzCvRuz6B7sFXtwNJK8rPjz78/
 LLHNokQIPHUTYUa2P+zKwozwSwOc6y6eqEhxH8CvDJlo5OfDvT+9deyNb8ZOUxiq
 UXHXG2qQivU0wTaqoowLp8D8FT3MQB9KkUZkikGf9d884qXgbWipx0kGOp+Y6JJp
 JesK1KxcxtKZRiywBSz6fhWW78VknYWf4RR6cESjgpUSbuvblfE=
 =Mz1l
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-6.2-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust
 "Bugfixes:
   - Fix NULL pointer dereference in the mount parser
   - Fix memory stomp in decode_attr_security_label
   - Fix credential leak in _nfs4_discover_trunking()
   - Fix buffer leak in rpcrdma_req_create()
   - Fix leaked socket in rpc_sockname()
   - Fix deadlock between nfs4_open_recover_helper() and delegreturn
   - Fix an Oops in nfs_d_automount()
   - Fix potential race in nfs_call_unlink()
   - Multiple fixes for the open context mode
   - NFSv4.2 READ_PLUS fixes
   - Fix a regression in which small rsize/wsize values are being
     forbidden
   - Fail client initialisation if the NFSv4.x state manager thread
     can't run
   - Avoid spurious warning of lost lock that is being unlocked.
   - Ensure the initialisation of struct nfs4_label

  Features and cleanups:
   - Trigger the "ls -l" readdir heuristic sooner
   - Clear the file access cache upon login to ensure supplementary
     group info is in sync between the client and server
   - pnfs: Fix up the logging of layout stateids
   - NFSv4.2: Change the default KConfig value for READ_PLUS
   - Use sysfs_emit() instead of scnprintf() where appropriate"

* tag 'nfs-for-6.2-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (24 commits)
  NFSv4.2: Change the default KConfig value for READ_PLUS
  NFSv4.x: Fail client initialisation if state manager thread can't run
  fs: nfs: sysfs: use sysfs_emit() to instead of scnprintf()
  NFS: use sysfs_emit() to instead of scnprintf()
  NFS: Allow very small rsize & wsize again
  NFSv4.2: Fix up READ_PLUS alignment
  NFSv4.2: Set the correct size scratch buffer for decoding READ_PLUS
  SUNRPC: Fix missing release socket in rpc_sockname()
  xprtrdma: Fix regbuf data not freed in rpcrdma_req_create()
  NFS: avoid spurious warning of lost lock that is being unlocked.
  nfs: fix possible null-ptr-deref when parsing param
  NFSv4: check FMODE_EXEC from open context mode in nfs4_opendata_access()
  NFS: make sure open context mode have FMODE_EXEC when file open for exec
  NFS4.x/pnfs: Fix up logging of layout stateids
  NFS: Fix a race in nfs_call_unlink()
  NFS: Fix an Oops in nfs_d_automount()
  NFSv4: Fix a deadlock between nfs4_open_recover_helper() and delegreturn
  NFSv4: Fix a credential leak in _nfs4_discover_trunking()
  NFS: Trigger the "ls -l" readdir heuristic sooner
  NFSv4.2: Fix initialisation of struct nfs4_label
  ...
2022-12-13 08:44:41 -08:00
Linus Torvalds
764822972d NFSD 6.2 Release Notes
This release introduces support for the CB_RECALL_ANY operation.
 NFSD can send this operation to request that clients return any
 delegations they choose. The server uses this operation to handle
 low memory scenarios or indicate to a client when that client has
 reached the maximum number of delegations the server supports.
 
 The NFSv4.2 READ_PLUS operation has been simplified temporarily
 whilst support for sparse files in local filesystems and the VFS is
 improved.
 
 Two major data structure fixes appear in this release:
 
 * The nfs4_file hash table is replaced with a resizable hash table
   to reduce the latency of NFSv4 OPEN operations.
 
 * Reference counting in the NFSD filecache has been hardened against
   races.
 
 In furtherance of removing support for NFSv2 in a subsequent kernel
 release, a new Kconfig option enables server-side support for NFSv2
 to be left out of a kernel build.
 
 MAINTAINERS has been updated to indicate that changes to fs/exportfs
 should go through the NFSD tree.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmOXPE4ACgkQM2qzM29m
 f5faaRAAh7YT5V61afPbfgBybO5AbDzztpZSNjNjLZs78piSnFp6hP75yNtTviwQ
 1o7St13/NkCmDaIdGUpr02U01zbM1BDOq2wGckImOJLNSgb7xHV5r4PqkRiFkh0t
 QYSnwG+wp8fDUJeCL/nAOAu9I9EQUqHzWchxiU/h8ln2hN3rXUlIRSeo17Wy7zkD
 cNIcoAjTi9fzY3dE6H4r+lZTdNCYH+AdzChmKrHdRZQwq0Xs3FWv4gAMTLbDuD4P
 B6NDHz0Umn6XnFsJGptwozkwaWeMQw4GyJj/3iUiO8JF209SaoYXMPjJAyG6tYYa
 fUrgv4UXGeXjigDbLBA5IYxfhX7GXjMQSaj3edhzyrl8P74q4/Cq/8fDUnAZ841m
 E+TGSCPIQD0QuIjdXxLv9KLY8JNThSfcAt6jr5GBXhPZQr8xpS0BqK/Onr68fgZC
 Lpull5xN68L4A1B7cf2GNPuMyvkBKxwSGXOehldh/BkvpVMjFwqd4/q5xWC+6CcQ
 tbOkjTbbSS71nzJwZip0NphaYCa3qQPzKT4SZzn/I4I9W5otbwYBx734Bw46gTDE
 ZPUXTuJ00VPgX07wbLRahg521Fwzr+8sk1WnVYq82PoaMh1l9FjzLNGouQWBdo3E
 UzIo/KUfQKmoZce6O723L6OI4ffdK5oMtfaTpe+SiUPpV1lUAcA=
 =jNlu
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:
 "This release introduces support for the CB_RECALL_ANY operation. NFSD
  can send this operation to request that clients return any delegations
  they choose. The server uses this operation to handle low memory
  scenarios or indicate to a client when that client has reached the
  maximum number of delegations the server supports.

  The NFSv4.2 READ_PLUS operation has been simplified temporarily whilst
  support for sparse files in local filesystems and the VFS is improved.

  Two major data structure fixes appear in this release:

   - The nfs4_file hash table is replaced with a resizable hash table to
     reduce the latency of NFSv4 OPEN operations.

   - Reference counting in the NFSD filecache has been hardened against
     races.

  In furtherance of removing support for NFSv2 in a subsequent kernel
  release, a new Kconfig option enables server-side support for NFSv2 to
  be left out of a kernel build.

  MAINTAINERS has been updated to indicate that changes to fs/exportfs
  should go through the NFSD tree"

* tag 'nfsd-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (49 commits)
  NFSD: Avoid clashing function prototypes
  SUNRPC: Fix crasher in unwrap_integ_data()
  SUNRPC: Make the svc_authenticate tracepoint conditional
  NFSD: Use only RQ_DROPME to signal the need to drop a reply
  SUNRPC: Clean up xdr_write_pages()
  SUNRPC: Don't leak netobj memory when gss_read_proxy_verf() fails
  NFSD: add CB_RECALL_ANY tracepoints
  NFSD: add delegation reaper to react to low memory condition
  NFSD: add support for sending CB_RECALL_ANY
  NFSD: refactoring courtesy_client_reaper to a generic low memory shrinker
  trace: Relocate event helper files
  NFSD: pass range end to vfs_fsync_range() instead of count
  lockd: fix file selection in nlmsvc_cancel_blocked
  lockd: ensure we use the correct file descriptor when unlocking
  lockd: set missing fl_flags field when retrieving args
  NFSD: Use struct_size() helper in alloc_session()
  nfsd: return error if nfs4_setacl fails
  lockd: set other missing fields when unlocking files
  NFSD: Add an nfsd_file_fsync tracepoint
  sunrpc: svc: Remove an unused static function svc_ungetu32()
  ...
2022-12-12 20:54:39 -08:00
Linus Torvalds
6a518afcc2 fs.acl.rework.v6.2
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCY5bwTgAKCRCRxhvAZXjc
 ovd2AQCK00NAtGjQCjQPQGyTa4GAPqvWgq1ef0lnhv+TL5US5gD9FncQ8UofeMXt
 pBfjtAD6ettTPCTxUQfnTwWEU4rc7Qg=
 =27Wm
 -----END PGP SIGNATURE-----

Merge tag 'fs.acl.rework.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping

Pull VFS acl updates from Christian Brauner:
 "This contains the work that builds a dedicated vfs posix acl api.

  The origins of this work trace back to v5.19 but it took quite a while
  to understand the various filesystem specific implementations in
  sufficient detail and also come up with an acceptable solution.

  As we discussed and seen multiple times the current state of how posix
  acls are handled isn't nice and comes with a lot of problems: The
  current way of handling posix acls via the generic xattr api is error
  prone, hard to maintain, and type unsafe for the vfs until we call
  into the filesystem's dedicated get and set inode operations.

  It is already the case that posix acls are special-cased to death all
  the way through the vfs. There are an uncounted number of hacks that
  operate on the uapi posix acl struct instead of the dedicated vfs
  struct posix_acl. And the vfs must be involved in order to interpret
  and fixup posix acls before storing them to the backing store, caching
  them, reporting them to userspace, or for permission checking.

  Currently a range of hacks and duct tape exist to make this work. As
  with most things this is really no ones fault it's just something that
  happened over time. But the code is hard to understand and difficult
  to maintain and one is constantly at risk of introducing bugs and
  regressions when having to touch it.

  Instead of continuing to hack posix acls through the xattr handlers
  this series builds a dedicated posix acl api solely around the get and
  set inode operations.

  Going forward, the vfs_get_acl(), vfs_remove_acl(), and vfs_set_acl()
  helpers must be used in order to interact with posix acls. They
  operate directly on the vfs internal struct posix_acl instead of
  abusing the uapi posix acl struct as we currently do. In the end this
  removes all of the hackiness, makes the codepaths easier to maintain,
  and gets us type safety.

  This series passes the LTP and xfstests suites without any
  regressions. For xfstests the following combinations were tested:
   - xfs
   - ext4
   - btrfs
   - overlayfs
   - overlayfs on top of idmapped mounts
   - orangefs
   - (limited) cifs

  There's more simplifications for posix acls that we can make in the
  future if the basic api has made it.

  A few implementation details:

   - The series makes sure to retain exactly the same security and
     integrity module permission checks. Especially for the integrity
     modules this api is a win because right now they convert the uapi
     posix acl struct passed to them via a void pointer into the vfs
     struct posix_acl format to perform permission checking on the mode.

     There's a new dedicated security hook for setting posix acls which
     passes the vfs struct posix_acl not a void pointer. Basing checking
     on the posix acl stored in the uapi format is really unreliable.
     The vfs currently hacks around directly in the uapi struct storing
     values that frankly the security and integrity modules can't
     correctly interpret as evidenced by bugs we reported and fixed in
     this area. It's not necessarily even their fault it's just that the
     format we provide to them is sub optimal.

   - Some filesystems like 9p and cifs need access to the dentry in
     order to get and set posix acls which is why they either only
     partially or not even at all implement get and set inode
     operations. For example, cifs allows setxattr() and getxattr()
     operations but doesn't allow permission checking based on posix
     acls because it can't implement a get acl inode operation.

     Thus, this patch series updates the set acl inode operation to take
     a dentry instead of an inode argument. However, for the get acl
     inode operation we can't do this as the old get acl method is
     called in e.g., generic_permission() and inode_permission(). These
     helpers in turn are called in various filesystem's permission inode
     operation. So passing a dentry argument to the old get acl inode
     operation would amount to passing a dentry to the permission inode
     operation which we shouldn't and probably can't do.

     So instead of extending the existing inode operation Christoph
     suggested to add a new one. He also requested to ensure that the
     get and set acl inode operation taking a dentry are consistently
     named. So for this version the old get acl operation is renamed to
     ->get_inode_acl() and a new ->get_acl() inode operation taking a
     dentry is added. With this we can give both 9p and cifs get and set
     acl inode operations and in turn remove their complex custom posix
     xattr handlers.

     In the future I hope to get rid of the inode method duplication but
     it isn't like we have never had this situation. Readdir is just one
     example. And frankly, the overall gain in type safety and the more
     pleasant api wise are simply too big of a benefit to not accept
     this duplication for a while.

   - We've done a full audit of every codepaths using variant of the
     current generic xattr api to get and set posix acls and
     surprisingly it isn't that many places. There's of course always a
     chance that we might have missed some and if so I'm sure we'll find
     them soon enough.

     The crucial codepaths to be converted are obviously stacking
     filesystems such as ecryptfs and overlayfs.

     For a list of all callers currently using generic xattr api helpers
     see [2] including comments whether they support posix acls or not.

   - The old vfs generic posix acl infrastructure doesn't obey the
     create and replace semantics promised on the setxattr(2) manpage.
     This patch series doesn't address this. It really is something we
     should revisit later though.

  The patches are roughly organized as follows:

   (1) Change existing set acl inode operation to take a dentry
       argument (Intended to be a non-functional change)

   (2) Rename existing get acl method (Intended to be a non-functional
       change)

   (3) Implement get and set acl inode operations for filesystems that
       couldn't implement one before because of the missing dentry.
       That's mostly 9p and cifs (Intended to be a non-functional
       change)

   (4) Build posix acl api, i.e., add vfs_get_acl(), vfs_remove_acl(),
       and vfs_set_acl() including security and integrity hooks
       (Intended to be a non-functional change)

   (5) Implement get and set acl inode operations for stacking
       filesystems (Intended to be a non-functional change)

   (6) Switch posix acl handling in stacking filesystems to new posix
       acl api now that all filesystems it can stack upon support it.

   (7) Switch vfs to new posix acl api (semantical change)

   (8) Remove all now unused helpers

   (9) Additional regression fixes reported after we merged this into
       linux-next

  Thanks to Seth for a lot of good discussion around this and
  encouragement and input from Christoph"

* tag 'fs.acl.rework.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: (36 commits)
  posix_acl: Fix the type of sentinel in get_acl
  orangefs: fix mode handling
  ovl: call posix_acl_release() after error checking
  evm: remove dead code in evm_inode_set_acl()
  cifs: check whether acl is valid early
  acl: make vfs_posix_acl_to_xattr() static
  acl: remove a slew of now unused helpers
  9p: use stub posix acl handlers
  cifs: use stub posix acl handlers
  ovl: use stub posix acl handlers
  ecryptfs: use stub posix acl handlers
  evm: remove evm_xattr_acl_change()
  xattr: use posix acl api
  ovl: use posix acl api
  ovl: implement set acl method
  ovl: implement get acl method
  ecryptfs: implement set acl method
  ecryptfs: implement get acl method
  ksmbd: use vfs_remove_acl()
  acl: add vfs_remove_acl()
  ...
2022-12-12 18:46:39 -08:00
Linus Torvalds
75f4d9af8b iov_iter work; most of that is about getting rid of
direction misannotations and (hopefully) preventing
 more of the same for the future.
 
 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
 -----BEGIN PGP SIGNATURE-----
 
 iHQEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCY5ZzQAAKCRBZ7Krx/gZQ
 65RZAP4nTkvOn0NZLVFkuGOx8pgJelXAvrteyAuecVL8V6CR4AD40qCVY51PJp8N
 MzwiRTeqnGDxTTF7mgd//IB6hoatAA==
 =bcvF
 -----END PGP SIGNATURE-----

Merge tag 'pull-iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull iov_iter updates from Al Viro:
 "iov_iter work; most of that is about getting rid of direction
  misannotations and (hopefully) preventing more of the same for the
  future"

* tag 'pull-iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  use less confusing names for iov_iter direction initializers
  iov_iter: saner checks for attempt to copy to/from iterator
  [xen] fix "direction" argument of iov_iter_kvec()
  [vhost] fix 'direction' argument of iov_iter_{init,bvec}()
  [target] fix iov_iter_bvec() "direction" argument
  [s390] memcpy_real(): WRITE is "data source", not destination...
  [s390] zcore: WRITE is "data source", not destination...
  [infiniband] READ is "data destination", not source...
  [fsi] WRITE is "data source", not destination...
  [s390] copy_oldmem_kernel() - WRITE is "data source", not destination
  csum_and_copy_to_iter(): handle ITER_DISCARD
  get rid of unlikely() on page_copy_sane() calls
2022-12-12 18:29:54 -08:00
Anna Schumaker
7fd461c47c NFSv4.2: Change the default KConfig value for READ_PLUS
Now that we've worked out performance issues and have a server patch
addressing the failed xfstests, we can safely enable this feature by
default.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-12-10 13:24:59 -05:00
Chuck Lever
247c01ff5f trace: Relocate event helper files
Steven Rostedt says:
> The include/trace/events/ directory should only hold files that
> are to create events, not headers that hold helper functions.
>
> Can you please move them out of include/trace/events/ as that
> directory is "special" in the creation of events.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-12-10 11:01:12 -05:00
Trond Myklebust
b4e4f66901 NFSv4.x: Fail client initialisation if state manager thread can't run
If the state manager thread fails to start, then we should just mark the
client initialisation as failed so that other processes or threads don't
get stuck in nfs_wait_client_init_complete().

Reported-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Fixes: 4697bd5e94 ("NFSv4: Fix a race in the net namespace mount notification")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-12-06 13:03:46 -05:00
ye xingchen
19cdc8fa5b fs: nfs: sysfs: use sysfs_emit() to instead of scnprintf()
Follow the advice of the Documentation/filesystems/sysfs.rst and show()
should only use sysfs_emit() or sysfs_emit_at() when formatting the
value to be returned to user space.

Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-12-06 12:32:37 -05:00
ye xingchen
700fa9b1b3 NFS: use sysfs_emit() to instead of scnprintf()
Follow the advice of the Documentation/filesystems/sysfs.rst and show()
should only use sysfs_emit() or sysfs_emit_at() when formatting the
value to be returned to user space.

Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-12-06 12:32:37 -05:00
Anna Schumaker
a60214c246 NFS: Allow very small rsize & wsize again
940261a195 introduced nfs_io_size() to clamp the iosize to a multiple
of PAGE_SIZE. This had the unintended side effect of no longer allowing
iosizes less than a page, which could be useful in some situations.

UDP already has an exception that causes it to fall back on the
power-of-two style sizes instead. This patch adds an additional
exception for very small iosizes.

Reported-by: Jeff Layton <jlayton@kernel.org>
Fixes: 940261a195 ("NFS: Allow setting rsize / wsize to a multiple of PAGE_SIZE")
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-12-06 12:30:58 -05:00
Anna Schumaker
f8527028a7 NFSv4.2: Fix up READ_PLUS alignment
Assume that the first segment will be a DATA segment, and place the data
directly into the xdr pages so it doesn't need to be shifted.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-12-06 12:29:35 -05:00
Anna Schumaker
36357fe74e NFSv4.2: Set the correct size scratch buffer for decoding READ_PLUS
The scratch_buf array is 16 bytes, but I was passing 32 to the
xdr_set_scratch_buffer() function. Fix this by using sizeof(), which is
what I probably should have been doing this whole time.

Fixes: d3b00a802c ("NFS: Replace the READ_PLUS decoding code")
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-12-06 12:28:35 -05:00
NeilBrown
ef8d98f20d NFS: avoid spurious warning of lost lock that is being unlocked.
When the NFSv4 state manager recovers state after a server restart, it
reports that locks have been lost if it finds any lock state for which
recovery hasn't been successful.  i.e. any for which
NFS_LOCK_INITIALIZED is not set.

However it only tries to recover locks that are still linked to
inode->i_flctx.  So if a lock has been removed from inode->i_flctx, but
the state for that lock has not yet been destroyed, then a spurious
warning results.

nfs4_proc_unlck() calls locks_lock_inode_wait() - which removes the lock
from ->i_flctx - before sending the unlock request to the server and
before the final nfs4_put_lock_state() is called.  This allows a window
in which a spurious warning can be produced.

So add a new flag NFS_LOCK_UNLOCKING which is set once the decision has
been made to unlock the lock.  This will prevent it from triggering any
warning.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-12-06 10:45:11 -05:00
Hawkins Jiawei
5559405df6 nfs: fix possible null-ptr-deref when parsing param
According to commit "vfs: parse: deal with zero length string value",
kernel will set the param->string to null pointer in vfs_parse_fs_string()
if fs string has zero length.

Yet the problem is that, nfs_fs_context_parse_param() will dereferences the
param->string, without checking whether it is a null pointer, which may
trigger a null-ptr-deref bug.

This patch solves it by adding sanity check on param->string
in nfs_fs_context_parse_param().

Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-12-06 10:43:24 -05:00
ChenXiaoSong
d564d2c4c2 NFSv4: check FMODE_EXEC from open context mode in nfs4_opendata_access()
After converting file f_flags to open context mode by flags_to_mode(), open
context mode will have FMODE_EXEC when file open for exec, so we check
FMODE_EXEC from open context mode.

No functional change, just simplify the code.

Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-12-06 10:39:17 -05:00
ChenXiaoSong
6f1c1d95dc NFS: make sure open context mode have FMODE_EXEC when file open for exec
Because file f_mode never have FMODE_EXEC, open context mode won't get
FMODE_EXEC from file f_mode. Open context mode only care about FMODE_READ/
FMODE_WRITE/FMODE_EXEC, and all info about open context mode can be convert
from file f_flags, so convert file f_flags to open context mode by
flags_to_mode().

Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-12-06 10:38:38 -05:00
Trond Myklebust
d01c6ed6db NFS4.x/pnfs: Fix up logging of layout stateids
If the layout is invalid, then just log a '0' value.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-12-05 14:27:05 -05:00
Jeff Layton
17b985def2 nfs: use locks_inode_context helper
nfs currently doesn't access i_flctx safely. This requires a
smp_load_acquire, as the pointer is set via cmpxchg (a release
operation).

Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
2022-11-30 05:08:10 -05:00
Trond Myklebust
5776a9cd2a NFS: Fix a race in nfs_call_unlink()
We should check that the filehandles match before transferring the
sillyrename data to the newly looked-up dentry in case the name was
reused on the server.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-11-27 22:10:00 -05:00
Trond Myklebust
35e3b6ae84 NFS: Fix an Oops in nfs_d_automount()
When mounting from a NFSv4 referral, path->dentry can end up being a
negative dentry, so derive the struct nfs_server from the dentry
itself instead.

Fixes: 2b0143b5c9 ("VFS: normal filesystems (and lustre): d_inode() annotations")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-11-27 22:10:00 -05:00
Trond Myklebust
51069e4aef NFSv4: Fix a deadlock between nfs4_open_recover_helper() and delegreturn
If we're asked to recover open state while a delegation return is
outstanding, then the state manager thread cannot use a cached open, so
if the server returns a delegation, we can end up deadlocked behind the
pending delegreturn.
To avoid this problem, let's just ask the server not to give us a
delegation unless we're explicitly reclaiming one.

Fixes: be36e185bd ("NFSv4: nfs4_open_recover_helper() must set share access")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-11-27 22:09:59 -05:00
Trond Myklebust
e83458fce0 NFSv4: Fix a credential leak in _nfs4_discover_trunking()
Fixes: 4f40a5b554 ("NFSv4: Add an fattr allocation to _nfs4_discover_trunking()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-11-27 22:09:59 -05:00
Benjamin Coddington
85aa8ddc38 NFS: Trigger the "ls -l" readdir heuristic sooner
Since commit 1a34c8c9a4 ("NFS: Support larger readdir buffers") has
updated dtsize, and with recent improvements to the READDIRPLUS helper
heuristic, the heuristic may not trigger until many dentries are emitted
to userspace.   This will cause many thousands of GETATTR calls for "ls
-l" when the directory's pagecache has already been populated.  This
manifests as poor performance for long directory listings after an
initially fast "ls -l".

Fix this by emitting only 17 entries for any first pass through the NFS
directory's ->iterate_shared(), which allows userpace to prime the
counters for the heuristic.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-11-27 22:09:59 -05:00
Trond Myklebust
c528f70f50 NFSv4.2: Fix initialisation of struct nfs4_label
The call to nfs4_label_init_security() should return a fully initialised
label.

Fixes: aa9c266962 ("NFS: Client implementation of Labeled-NFS")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-11-27 22:09:59 -05:00
Trond Myklebust
43c1031f71 NFSv4.2: Fix a memory stomp in decode_attr_security_label
We must not change the value of label->len if it is zero, since that
indicates we stored a label.

Fixes: b4487b9354 ("nfs: Fix getxattr kernel panic and memory overflow")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-11-27 22:09:59 -05:00