There's a focus on fixes for the memfd_pin_folios() work which was added
into 6.11. Apart from that, the usual shower of singleton fixes.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZvbhSAAKCRDdBJ7gKXxA
jp8CAP47txk2c+tBLggog2MkQamADY5l5MT6E3fYq3ghSiKtVQEAnqX3LiQJ02tB
o9LcPcVrM90QntpKrLP1CpWCVdR+zA8=
=e0QC
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2024-09-27-09-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"19 hotfixes. 13 are cc:stable.
There's a focus on fixes for the memfd_pin_folios() work which was
added into 6.11. Apart from that, the usual shower of singleton fixes"
* tag 'mm-hotfixes-stable-2024-09-27-09-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
ocfs2: fix uninit-value in ocfs2_get_block()
zram: don't free statically defined names
memory tiers: use default_dram_perf_ref_source in log message
Revert "list: test: fix tests for list_cut_position()"
kselftests: mm: fix wrong __NR_userfaultfd value
compiler.h: specify correct attribute for .rodata..c_jump_table
mm/damon/Kconfig: update DAMON doc URL
mm: kfence: fix elapsed time for allocated/freed track
ocfs2: fix deadlock in ocfs2_get_system_file_inode
ocfs2: reserve space for inline xattr before attaching reflink tree
mm: migrate: annotate data-race in migrate_folio_unmap()
mm/hugetlb: simplify refs in memfd_alloc_folio
mm/gup: fix memfd_pin_folios alloc race panic
mm/gup: fix memfd_pin_folios hugetlb page allocation
mm/hugetlb: fix memfd_pin_folios resv_huge_pages leak
mm/hugetlb: fix memfd_pin_folios free_huge_pages leak
mm/filemap: fix filemap_get_folios_contig THP panic
mm: make SPLIT_PTE_PTLOCKS depend on SMP
tools: fix shared radix-tree build
no_llseek had been defined to NULL two years ago, in commit 868941b144
("fs: remove no_llseek")
To quote that commit,
At -rc1 we'll need do a mechanical removal of no_llseek -
git grep -l -w no_llseek | grep -v porting.rst | while read i; do
sed -i '/\<no_llseek\>/d' $i
done
would do it.
Unfortunately, that hadn't been done. Linus, could you do that now, so
that we could finally put that thing to rest? All instances are of the
form
.llseek = no_llseek,
so it's obviously safe.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
syzbot reported an uninit-value BUG:
BUG: KMSAN: uninit-value in ocfs2_get_block+0xed2/0x2710 fs/ocfs2/aops.c:159
ocfs2_get_block+0xed2/0x2710 fs/ocfs2/aops.c:159
do_mpage_readpage+0xc45/0x2780 fs/mpage.c:225
mpage_readahead+0x43f/0x840 fs/mpage.c:374
ocfs2_readahead+0x269/0x320 fs/ocfs2/aops.c:381
read_pages+0x193/0x1110 mm/readahead.c:160
page_cache_ra_unbounded+0x901/0x9f0 mm/readahead.c:273
do_page_cache_ra mm/readahead.c:303 [inline]
force_page_cache_ra+0x3b1/0x4b0 mm/readahead.c:332
force_page_cache_readahead mm/internal.h:347 [inline]
generic_fadvise+0x6b0/0xa90 mm/fadvise.c:106
vfs_fadvise mm/fadvise.c:185 [inline]
ksys_fadvise64_64 mm/fadvise.c:199 [inline]
__do_sys_fadvise64 mm/fadvise.c:214 [inline]
__se_sys_fadvise64 mm/fadvise.c:212 [inline]
__x64_sys_fadvise64+0x1fb/0x3a0 mm/fadvise.c:212
x64_sys_call+0xe11/0x3ba0
arch/x86/include/generated/asm/syscalls_64.h:222
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
This is because when ocfs2_extent_map_get_blocks() fails, p_blkno is
uninitialized. So the error log will trigger the above uninit-value
access.
The error log is out-of-date since get_blocks() was removed long time ago.
And the error code will be logged in ocfs2_extent_map_get_blocks() once
ocfs2_get_cluster() fails, so fix this by only logging inode and block.
Link: https://syzkaller.appspot.com/bug?extid=9709e73bae885b05314b
Link: https://lkml.kernel.org/r/20240925090600.3643376-1-joseph.qi@linux.alibaba.com
Fixes: ccd979bdbc ("[PATCH] OCFS2: The Second Oracle Cluster Filesystem")
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reported-by: syzbot+9709e73bae885b05314b@syzkaller.appspotmail.com
Tested-by: syzbot+9709e73bae885b05314b@syzkaller.appspotmail.com
Cc: Heming Zhao <heming.zhao@suse.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
syzbot has found a possible deadlock in ocfs2_get_system_file_inode [1].
The scenario is depicted here,
CPU0 CPU1
lock(&ocfs2_file_ip_alloc_sem_key);
lock(&osb->system_file_mutex);
lock(&ocfs2_file_ip_alloc_sem_key);
lock(&osb->system_file_mutex);
The function calls which could lead to this are:
CPU0
ocfs2_mknod - lock(&ocfs2_file_ip_alloc_sem_key);
.
.
.
ocfs2_get_system_file_inode - lock(&osb->system_file_mutex);
CPU1 -
ocfs2_fill_super - lock(&osb->system_file_mutex);
.
.
.
ocfs2_read_virt_blocks - lock(&ocfs2_file_ip_alloc_sem_key);
This issue can be resolved by making the down_read -> down_read_try
in the ocfs2_read_virt_blocks.
[1] https://syzkaller.appspot.com/bug?extid=e0055ea09f1f5e6fabdd
Link: https://lkml.kernel.org/r/20240924093257.7181-1-pvmohammedanees2003@gmail.com
Signed-off-by: Mohammed Anees <pvmohammedanees2003@gmail.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reported-by: <syzbot+e0055ea09f1f5e6fabdd@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=e0055ea09f1f5e6fabdd
Tested-by: syzbot+e0055ea09f1f5e6fabdd@syzkaller.appspotmail.com
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
One of our customers reported a crash and a corrupted ocfs2 filesystem.
The crash was due to the detection of corruption. Upon troubleshooting,
the fsck -fn output showed the below corruption
[EXTENT_LIST_FREE] Extent list in owner 33080590 claims 230 as the next free chain record,
but fsck believes the largest valid value is 227. Clamp the next record value? n
The stat output from the debugfs.ocfs2 showed the following corruption
where the "Next Free Rec:" had overshot the "Count:" in the root metadata
block.
Inode: 33080590 Mode: 0640 Generation: 2619713622 (0x9c25a856)
FS Generation: 904309833 (0x35e6ac49)
CRC32: 00000000 ECC: 0000
Type: Regular Attr: 0x0 Flags: Valid
Dynamic Features: (0x16) HasXattr InlineXattr Refcounted
Extended Attributes Block: 0 Extended Attributes Inline Size: 256
User: 0 (root) Group: 0 (root) Size: 281320357888
Links: 1 Clusters: 141738
ctime: 0x66911b56 0x316edcb8 -- Fri Jul 12 06:02:30.829349048 2024
atime: 0x66911d6b 0x7f7a28d -- Fri Jul 12 06:11:23.133669517 2024
mtime: 0x66911b56 0x12ed75d7 -- Fri Jul 12 06:02:30.317552087 2024
dtime: 0x0 -- Wed Dec 31 17:00:00 1969
Refcount Block: 2777346
Last Extblk: 2886943 Orphan Slot: 0
Sub Alloc Slot: 0 Sub Alloc Bit: 14
Tree Depth: 1 Count: 227 Next Free Rec: 230
## Offset Clusters Block#
0 0 2310 2776351
1 2310 2139 2777375
2 4449 1221 2778399
3 5670 731 2779423
4 6401 566 2780447
....... .... .......
....... .... .......
The issue was in the reflink workfow while reserving space for inline
xattr. The problematic function is ocfs2_reflink_xattr_inline(). By the
time this function is called the reflink tree is already recreated at the
destination inode from the source inode. At this point, this function
reserves space for inline xattrs at the destination inode without even
checking if there is space at the root metadata block. It simply reduces
the l_count from 243 to 227 thereby making space of 256 bytes for inline
xattr whereas the inode already has extents beyond this index (in this
case up to 230), thereby causing corruption.
The fix for this is to reserve space for inline metadata at the destination
inode before the reflink tree gets recreated. The customer has verified the
fix.
Link: https://lkml.kernel.org/r/20240918063844.1830332-1-gautham.ananthakrishna@oracle.com
Fixes: ef962df057 ("ocfs2: xattr: fix inlined xattr reflink")
Signed-off-by: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This reverts commit fb97d2eb54.
The logging was questionable to begin with, but it seems to actively
deadlock on the task lock.
"On second thought, let's not log core dump failures. 'Tis a silly place"
because if you can't tell your core dump is truncated, maybe you should
just fix your debugger instead of adding bugs to the kernel.
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Link: https://lore.kernel.org/all/d122ece6-3606-49de-ae4d-8da88846bef2@oracle.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmb0mWUACgkQiiy9cAdy
T1Fwbgv/Zoe5LZukUe4s87xO7IC73Wfn2UBUQmvDUtK1djRF3HrL1QOtXLnFfPb/
pFJTPiNljM/NPcpXAk+7qz1XFihkOwGNJOFFuQPNrwcDX4LLF35sqoeRij1qRkXn
06yLPQRBI2SQLehLqi/Avk4TEatber7uGZMXgOaLN54doiNY8kMYcsIgEQWoe15h
muxCUoPopSokU5+s0H6ObDoXX10KS3ir/1ArmmZ8oh1be363ysye0bf6+mnVNr/P
I5yiERdYrN+oo6ZzC0XjyYSp0SnCbu8jck2g5ydIKUyQ7gbiSE8XqCNVy6ALndxg
URMlYtL+gVknmJk9NJcc8gVp79EZcdjUIbFSTQ1Pa8x++nQCBl9rge1AZ9G/zzY2
Ul6xIVoP5DNgcwXvMka+lJgAsoRgB5olcEBMdltaCpKCLjWNjyzvOzb+kP2L30IC
/nPZJbVQSrdr3ropybapAlHLG57Jk1ad1QdaBEiu5ss528mSmKc+t288zPQKIhU5
Ogqr3CxB
=nVf0
-----END PGP SIGNATURE-----
Merge tag 'v6.12-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
"Most are from the recent SMB3.1.1 test event, and also an important
netfs fix for a cifs mtime write regression
- fix mode reported by stat of readonly directories and files
- DFS (global namespace) related fixes
- fixes for special file support via reparse points
- mount improvement and reconnect fix
- fix for noisy log message on umount
- two netfs related fixes, one fixing a recent regression, and add
new write tracepoint"
* tag 'v6.12-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
netfs, cifs: Fix mtime/ctime update for mmapped writes
cifs: update internal version number
smb: client: print failed session logoffs with FYI
cifs: Fix reversion of the iter in cifs_readv_receive().
smb3: fix incorrect mode displayed for read-only files
smb: client: fix parsing of device numbers
smb: client: set correct device number on nfs reparse points
smb: client: propagate error from cifs_construct_tcon()
smb: client: fix DFS failover in multiuser mounts
cifs: Make the write_{enter,done,err} tracepoints display netfs info
smb: client: fix DFS interlink failover
smb: client: improve purging of cached referrals
smb: client: avoid unnecessary reconnects when refreshing referrals
Several new features here:
virtio-balloon supports new stats
vdpa supports setting mac address
vdpa/mlx5 suspend/resume as well as MKEY ops are now faster
virtio_fs supports new sysfs entries for queue info
virtio/vsock performance has been improved
Fixes, cleanups all over the place.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmbz7ykPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpkk8H/A3vMRYXBzne9anezZLvADKS/CpX7v0DFEVj
VfSMWXvYdUariYDyyb7pZsvK5QR22pE0pIaW6Kcgv9fNwq27M/H6g6NJk5ny8a7d
216AQs1J28pXPPY+q03fhf3SzE3yHP8aeD9lyiO9QJYfs9vjtoyZeBGt3a4IUSX4
ZeNBAx8xWTBcEDIIcZLdY1DNDTbZ4+qQ12Ln9IKq7D4xkE6l7Xh+HGdgTWTnDZ8P
qEUUOmJTFKTQdOiVuU4NN3wzgHKWHdwKg0uWXo7ereYr3kYe3q//jCcLMv88a1x0
XP7NRBQg/rsErwTMdLz6ffyqXJs6lGGqNXzRfZKEwAvmnh/+zs4=
=gNBq
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
"Several new features here:
- virtio-balloon supports new stats
- vdpa supports setting mac address
- vdpa/mlx5 suspend/resume as well as MKEY ops are now faster
- virtio_fs supports new sysfs entries for queue info
- virtio/vsock performance has been improved
And fixes, cleanups all over the place"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (34 commits)
vsock/virtio: avoid queuing packets when intermediate queue is empty
vsock/virtio: refactor virtio_transport_send_pkt_work
fw_cfg: Constify struct kobj_type
vdpa/mlx5: Postpone MR deletion
vdpa/mlx5: Introduce init/destroy for MR resources
vdpa/mlx5: Rename mr_mtx -> lock
vdpa/mlx5: Extract mr members in own resource struct
vdpa/mlx5: Rename function
vdpa/mlx5: Delete direct MKEYs in parallel
vdpa/mlx5: Create direct MKEYs in parallel
MAINTAINERS: add virtio-vsock driver in the VIRTIO CORE section
virtio_fs: add sysfs entries for queue information
virtio_fs: introduce virtio_fs_put_locked helper
vdpa: Remove unused declarations
vdpa/mlx5: Parallelize VQ suspend/resume for CVQ MQ command
vdpa/mlx5: Small improvement for change_num_qps()
vdpa/mlx5: Keep notifiers during suspend but ignore
vdpa/mlx5: Parallelize device resume
vdpa/mlx5: Parallelize device suspend
vdpa/mlx5: Use async API for vq modify commands
...
Introduce sysfs entries to provide visibility to the multiple queues
used by the Virtio FS device. This enhancement allows users to query
information about these queues.
Specifically, add two sysfs entries:
1. Queue name: Provides the name of each queue (e.g. hiprio/requests.8).
2. CPU list: Shows the list of CPUs that can process requests for each
queue.
The CPU list feature is inspired by similar functionality in the block
MQ layer, which provides analogous sysfs entries for block devices.
These new sysfs entries will improve observability and aid in debugging
and performance tuning of Virtio FS devices.
Reviewed-by: Idan Zach <izach@nvidia.com>
Reviewed-by: Shai Malin <smalin@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Message-Id: <20240825130716.9506-2-mgurtovoy@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Introduce a new helper function virtio_fs_put_locked to encapsulate the
common pattern of releasing a virtio_fs reference while holding a lock.
The existing virtio_fs_put helper will be used to release a virtio_fs
reference while not holding a lock.
Also add an assertion in case the lock is not taken when it should.
Reviewed-by: Idan Zach <izach@nvidia.com>
Reviewed-by: Shai Malin <smalin@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Message-Id: <20240825130716.9506-1-mgurtovoy@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
The cifs flag CIFS_INO_MODIFIED_ATTR, which indicates that the mtime and
ctime need to be written back on close, got taken over by netfs as
NETFS_ICTX_MODIFIED_ATTR to avoid the need to call a function pointer to
set it.
The flag gets set correctly on buffered writes, but doesn't get set by
netfs_page_mkwrite(), leading to occasional failures in generic/080 and
generic/215.
Fix this by setting the flag in netfs_page_mkwrite().
Fixes: 73425800ac ("netfs, cifs: Move CIFS_INO_MODIFIED_ATTR to netfs_inode")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202409161629.98887b2-oliver.sang@intel.com
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Do not flood dmesg with failed session logoffs as kerberos tickets
getting expired or passwords being rotated is a very common scenario.
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
cifs_read_iter_from_socket() copies the iterator that's passed in for the
socket to modify as and if it will, and then advances the original iterator
by the amount sent. However, both callers revert the advancement (although
receive_encrypted_read() zeros beyond the iterator first). The problem is,
though, that cifs_readv_receive() reverts by the original length, not the
amount transmitted which can cause an oops in iov_iter_revert().
Fix this by:
(1) Remove the iov_iter_advance() from cifs_read_iter_from_socket().
(2) Remove the iov_iter_revert() from both callers. This fixes the bug in
cifs_readv_receive().
(3) In receive_encrypted_read(), if we didn't get back as much data as the
buffer will hold, copy the iterator, advance the copy and use the copy
to drive iov_iter_zero().
As a bonus, this gets rid of some unnecessary work.
This was triggered by generic/074 with the "-o sign" mount option.
Fixes: 3ee1a1fc39 ("cifs: Cut over to using netfslib")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Commands like "chmod 0444" mark a file readonly via the attribute flag
(when mapping of mode bits into the ACL are not set, or POSIX extensions
are not negotiated), but they were not reported correctly for stat of
directories (they were reported ok for files and for "ls"). See example
below:
root:~# ls /mnt2 -l
total 12
drwxr-xr-x 2 root root 0 Sep 21 18:03 normaldir
-rwxr-xr-x 1 root root 0 Sep 21 23:24 normalfile
dr-xr-xr-x 2 root root 0 Sep 21 17:55 readonly-dir
-r-xr-xr-x 1 root root 209716224 Sep 21 18:15 readonly-file
root:~# stat -c %a /mnt2/readonly-dir
755
root:~# stat -c %a /mnt2/readonly-file
555
This fixes the stat of directories when ATTR_READONLY is set
(in cases where the mode can not be obtained other ways).
root:~# stat -c %a /mnt2/readonly-dir
555
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Report correct major and minor numbers from special files created with
NFS reparse points.
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Fix major and minor numbers set on special files created with NFS
reparse points.
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Propagate error from cifs_construct_tcon() in cifs_sb_tlink() instead of
always returning -EACCES.
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
For sessions and tcons created on behalf of new users accessing a
multiuser mount, matching their sessions in tcon_super_cb() with
master tcon will always lead to false as every new user will have its
own session and tcon.
All multiuser sessions, however, will inherit ->dfs_root_ses from
master tcon, so match it instead.
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Make the write RPC tracepoints use the same trace macro complexes as the
read tracepoints and display the netfs request and subrequest IDs where
available (see commit 519be98971 "cifs: Add a tracepoint to track credits
involved in R/W requests").
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <stfrench@microsoft.com>
cc: Paulo Alcantara (Red Hat) <pc@manguebit.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
The DFS interlinks point to different DFS namespaces so make sure to
use the correct DFS root server to chase any DFS links under it by
storing the SMB session in dfs_ref_walk structure and then using it on
every referral walk.
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Purge cached referrals that have a single target when reaching maximum
of cache size as the client won't need them to failover. Otherwise
remove oldest cache entry.
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Do not mark tcons for reconnect when current connection matches any of
the targets returned by new referral even when there is no cached
entry.
Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
New Features:
* Add a 'noalignwrite' mount option for lock-less 'lost writes' prevention
* Add support for the LOCALIO protocol extention
Bugfixes:
* Fix memory leak in error path of nfs4_do_reclaim()
* Simplify and guarantee lock owner uniqueness
* Fix -Wformat-truncation warning
* Fix folio refcounts by using folio_attach_private()
* Fix failing the mount system call when the server is down
* Fix detection of "Proxying of Times" server support
Cleanups:
* Annotate struct nfs_cache_array with __counted_by()
* Remove unnecessary NULL checks before kfree()
* Convert RPC_TASK_* constants to an enum
* Remove obsolete or misleading comments and declerations
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmbzKDAACgkQ18tUv7Cl
QOusjBAAxTSoVbHocl+9eYpvKnscPArgPXnfd6mB9rnQRgtnceTO2ei7cdiE2qhz
dxQiyzlAXh3e7dGwoy02qEd6wTTqWeQ8ESdMpCAqSacBU4tu5owfzNSWunZvgYYj
QOhjdmv8M1IfZnTstPlVrRNaZcDkhV1tzEtpZppkEqhTB0bHWqrcM4EdklTWT0Yc
PGMpGbfuGsa4qZy2vWl7doERVEgK8mBeahLtYFD2W6phIvNWgD6IlKy66RaK2RfH
nXmZoZbI2/ioi4TKvNyY8xoGMGvetLI1h8YNQYkEg060XCkisLZDOvoodUAylOTR
2jHQLG5+/ejhpD/zgPghGZDSGNN1GyZaH09E/vtiS+3k9OXxFz6Rq68VnC6kpMA4
TIUYsT8ejPzs2gW59iDFGB6cKI4XnRtxgmApW/Za0y9A72PSi+G/pbWAk7ThjTxf
+HySsba4baA63opIgBSLVBrUsXZfdn/KTDTZ4nkPiq57BggGcZv7Y2ItOTXA+pB/
5nigDKkhWsYVjMbkx6wmh+VO2gv4/Z8WqsmiDwFMpVqM0w8eycBOHjOumuuc6nmw
y+2OKZqU2Npm2HI/R8lA7nB1m2QP5t7CRM2+xlZNuavHrfsMaqHNl8/9VgxlCATQ
/Zo74hbhmCgQYxrTjL8XFQG9/8y0o3H5IcTEr/SgCVxHyDSan1I=
=YjyC
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-6.12-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client updates from Anna Schumaker:
"New Features:
- Add a 'noalignwrite' mount option for lock-less 'lost writes' prevention
- Add support for the LOCALIO protocol extention
Bugfixes:
- Fix memory leak in error path of nfs4_do_reclaim()
- Simplify and guarantee lock owner uniqueness
- Fix -Wformat-truncation warning
- Fix folio refcounts by using folio_attach_private()
- Fix failing the mount system call when the server is down
- Fix detection of "Proxying of Times" server support
Cleanups:
- Annotate struct nfs_cache_array with __counted_by()
- Remove unnecessary NULL checks before kfree()
- Convert RPC_TASK_* constants to an enum
- Remove obsolete or misleading comments and declerations"
* tag 'nfs-for-6.12-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (41 commits)
nfs: Fix `make htmldocs` warnings in the localio documentation
nfs: add "NFS Client and Server Interlock" section to localio.rst
nfs: add FAQ section to Documentation/filesystems/nfs/localio.rst
nfs: add Documentation/filesystems/nfs/localio.rst
nfs: implement client support for NFS_LOCALIO_PROGRAM
nfs/localio: use dedicated workqueues for filesystem read and write
pnfs/flexfiles: enable localio support
nfs: enable localio for non-pNFS IO
nfs: add LOCALIO support
nfs: pass struct nfsd_file to nfs_init_pgio and nfs_init_commit
nfsd: implement server support for NFS_LOCALIO_PROGRAM
nfsd: add LOCALIO support
nfs_common: prepare for the NFS client to use nfsd_file for LOCALIO
nfs_common: add NFS LOCALIO auxiliary protocol enablement
SUNRPC: replace program list with program array
SUNRPC: add svcauth_map_clnt_to_svc_cred_local
SUNRPC: remove call_allocate() BUG_ONs
nfsd: add nfsd_serv_try_get and nfsd_serv_put
nfsd: add nfsd_file_acquire_local()
nfsd: factor out __fh_verify to allow NULL rqstp to be passed
...
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCZvKlbgAKCRDh3BK/laaZ
PLliAP9q5btlhlffnRg2LWCf4rIzbJ6vkORkc+GeyAXnWkIljQEA9En1K2vyg7Tk
f9FvNQK9C+pS0GxURDRI7YedJ2f9FQ0=
=wuY0
-----END PGP SIGNATURE-----
Merge tag 'fuse-update-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse updates from Miklos Szeredi:
- Add support for idmapped fuse mounts (Alexander Mikhalitsyn)
- Add optimization when checking for writeback (yangyun)
- Add tracepoints (Josef Bacik)
- Clean up writeback code (Joanne Koong)
- Clean up request queuing (me)
- Misc fixes
* tag 'fuse-update-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (32 commits)
fuse: use exclusive lock when FUSE_I_CACHE_IO_MODE is set
fuse: clear FR_PENDING if abort is detected when sending request
fs/fuse: convert to use invalid_mnt_idmap
fs/mnt_idmapping: introduce an invalid_mnt_idmap
fs/fuse: introduce and use fuse_simple_idmap_request() helper
fs/fuse: fix null-ptr-deref when checking SB_I_NOIDMAP flag
fuse: allow O_PATH fd for FUSE_DEV_IOC_BACKING_OPEN
virtio_fs: allow idmapped mounts
fuse: allow idmapped mounts
fuse: warn if fuse_access is called when idmapped mounts are allowed
fuse: handle idmappings properly in ->write_iter()
fuse: support idmapped ->rename op
fuse: support idmapped ->set_acl
fuse: drop idmap argument from __fuse_get_acl
fuse: support idmapped ->setattr op
fuse: support idmapped ->permission inode op
fuse: support idmapped getattr inode op
fuse: support idmap for mkdir/mknod/symlink/create/tmpfile
fuse: support idmapped FUSE_EXT_GROUPS
fuse: add an idmap argument to fuse_simple_request
...
- Clean-up unnecessary codes as ->valid_size is supported.
- buffered-IO fallback is no longer needed when using direct-IO.
- Move ->valid_size extension from mmap to ->page_mkwrite.
This improves the overhead caused by unnecessary zero-out during mmap.
- Fix memleaks from exfat_load_bitmap() and exfat_create_upcase_table().
- Add sops->shutdown and ioctl.
- Add Yuezhang Mo as a reviwer.
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEE6NzKS6Uv/XAAGHgyZwv7A1FEIQgFAmbytEQWHGxpbmtpbmpl
b25Aa2VybmVsLm9yZwAKCRBnC/sDUUQhCEqUD/sEerRjBeNi+ivTvYtxqQGaDCnj
Re6gBUt138rF2qyVcX3dP0wMHVNEHzjtdJjZGuQXAKttkZ1qW1wGbz0kyIyFjRfZ
MHPaaqAavDiDFqxZnJvB9xKsuU6mb0Kr0JC6mKet3KD+Q2VekePSX+3SvwRDcPNb
4CroYvJtOOWy21FKvKc2LxZBrowTElCPIhiXbHgWRhJBVhi4edrDo0391enzkKwt
Is0/RzMbAsQ08Ap+TH6YIlPtA9aVSiTDyal1YaIgpXjaVxqF3MpMfPFG6+XJ8GOw
k9BXM5XH5YXPZXallG8Fkx5Hh6Nrf9Vuvt68KbLQuzL6MdDEb8vTPEycQFHpapLx
hk5TrL23Ok2RU/AJJXUDxii+J+3YzuTgIL6sdgJbaYb1ZYebiMzjRkwUJpH3dqg+
lx1QtYWsVRR8fTtBEle1yVbOPcuyUWUkMpKVIUseVL0EiQNpiwBSGKKuus3Cul4O
KA6Kx8hYEguHAIBn5U52mzIl9Ye+j+QyRmcmA/qnObk/1h+5FKn+HgnMINex0qmz
PXzI+cLta6TZKtb8+KnTNImRXCDtcvtG9wkF25M3vmzBMiLfTnEZsXKwF+fPiydw
+N19vX6HVT8JpIOGhbsRQp7abLR2IhYCeZQCWdT09Ol0VUsXx87+CfsLQpM3xw4U
79nicqiwHjVP98Wjyg==
=vVfO
-----END PGP SIGNATURE-----
Merge tag 'exfat-for-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
Pull exfat updates from Namjae Jeon:
- Clean-up unnecessary codes as ->valid_size is supported
- buffered-IO fallback is no longer needed when using direct-IO
- Move ->valid_size extension from mmap to ->page_mkwrite. This
improves the overhead caused by unnecessary zero-out during mmap.
- Fix memleaks from exfat_load_bitmap() and exfat_create_upcase_table()
- Add sops->shutdown and ioctl
- Add Yuezhang Mo as a reviwer
* tag 'exfat-for-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
MAINTAINERS: exfat: add myself as reviewer
exfat: resolve memory leak from exfat_create_upcase_table()
exfat: move extend valid_size into ->page_mkwrite()
exfat: fix memory leak in exfat_load_bitmap()
exfat: Implement sops->shutdown and ioctl
exfat: do not fallback to buffered write
exfat: drop ->i_size_ondisk
In this series, the main changes include 1) converting major IO paths to use
folio, and 2) adding various knobs to control GC more flexibly for Zoned
devices. In addition, there are several patches to address corner cases of
atomic file operations and better support for file pinning on zoned device.
Enhancement:
- add knobs to tune foreground/background GCs for Zoned devices
- convert IO paths to use folio
- reduce expensive checkpoint trigger frequency
- allow F2FS_IPU_NOCACHE for pinned file
- forcibly migrate to secure space for zoned device file pinning
- get rid of buffer_head use
- add write priority option based on zone UFS
- get rid of online repair on corrupted directory
Bug fix:
- fix to don't panic system for no free segment fault injection
- fix to don't set SB_RDONLY in f2fs_handle_critical_error()
- avoid unused block when dio write in LFS mode
- compress: don't redirty sparse cluster during {,de}compress
- check discard support for conventional zones
- atomic: prevent atomic file from being dirtied before commit
- atomic: fix to check atomic_file in f2fs ioctl interfaces
- atomic: fix to forbid dio in atomic_file
- atomic: fix to truncate pagecache before on-disk metadata truncation
- atomic: create COW inode from parent dentry
- atomic: fix to avoid racing w/ GC
- atomic: require FMODE_WRITE for atomic write ioctls
- fix to wait page writeback before setting gcing flag
- fix to avoid racing in between read and OPU dio write, dio completion
- fix several potential integer overflows in file offsets and dir_block_index
- fix to avoid use-after-free in f2fs_stop_gc_thread()
As usual, there are several code clean-ups and refactorings.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmbyJn8ACgkQQBSofoJI
UNJz9Q/+LDDJjD6xh0Fs6H2NeltFNbuNmS79kN5oG0xfjIAiKXE1lsw2n2gwrDKv
EHKUPa2D4Rztckp8EFF6/st2SXVXH5U7YY2z5jkIUFccbeod+CrK9AGHjJe54iXL
D0ulbgE2jR8uuwAkNEooNJK1a5ZhZLVy+fXknNIgKoqx31YYE+mKOJaaJFbCxvNT
grZdH9ApweJB8L4A4ebwIWyBy8Bh4lhr2d6ngsx6HA5TFA2Ay0V9kaoZrLPZvJhv
3qJ+xu3oeGJbP4e5h5g9omafBskI1pfEE6/sY94o1Zy5Ahx3iCR6U/qehtyyU3TF
5QLoMXTvIz0MkRuBaW1XxVDpFevVzUfYmbLycuxjArBtjHnvsdh12DKT1Pk5BDZ4
GgkUyt4pK4PYyEZFtayCleLZljSRzKzi+Y9XEs82z01s41mvx71kz44bR8SPcb1Q
D4VOJld4O4qMmNrZhhwW8sj4UiDVgliURwmpiZwz9zT9fXU/ZPD1gThcfSWJZ/53
rrx87e1Bnyk/cMuN/gxEdVV20nggxng4hl2oDcUzBBV1G1R9I3RZJWQt/YFXpB0O
Whv5pJkV8BZXFWoRmm9cpWe0MslRRhsKBPzcKmlowy/lYdgjpQTmh7TSJ1Teh+2Y
r77XI31Y/ACaKDJsRmUVbtqdM3N/88N97Fa52wOByK0PjMbgM0E=
=EKzY
-----END PGP SIGNATURE-----
Merge tag 'f2fs-for-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"The main changes include converting major IO paths to use folio, and
adding various knobs to control GC more flexibly for Zoned devices.
In addition, there are several patches to address corner cases of
atomic file operations and better support for file pinning on zoned
device.
Enhancement:
- add knobs to tune foreground/background GCs for Zoned devices
- convert IO paths to use folio
- reduce expensive checkpoint trigger frequency
- allow F2FS_IPU_NOCACHE for pinned file
- forcibly migrate to secure space for zoned device file pinning
- get rid of buffer_head use
- add write priority option based on zone UFS
- get rid of online repair on corrupted directory
Bug fixes:
- fix to don't panic system for no free segment fault injection
- fix to don't set SB_RDONLY in f2fs_handle_critical_error()
- avoid unused block when dio write in LFS mode
- compress: don't redirty sparse cluster during {,de}compress
- check discard support for conventional zones
- atomic: prevent atomic file from being dirtied before commit
- atomic: fix to check atomic_file in f2fs ioctl interfaces
- atomic: fix to forbid dio in atomic_file
- atomic: fix to truncate pagecache before on-disk metadata truncation
- atomic: create COW inode from parent dentry
- atomic: fix to avoid racing w/ GC
- atomic: require FMODE_WRITE for atomic write ioctls
- fix to wait page writeback before setting gcing flag
- fix to avoid racing in between read and OPU dio write, dio completion
- fix several potential integer overflows in file offsets and dir_block_index
- fix to avoid use-after-free in f2fs_stop_gc_thread()
As usual, there are several code clean-ups and refactorings"
* tag 'f2fs-for-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (60 commits)
f2fs: allow F2FS_IPU_NOCACHE for pinned file
f2fs: forcibly migrate to secure space for zoned device file pinning
f2fs: remove unused parameters
f2fs: fix to don't panic system for no free segment fault injection
f2fs: fix to don't set SB_RDONLY in f2fs_handle_critical_error()
f2fs: add valid block ratio not to do excessive GC for one time GC
f2fs: create gc_no_zoned_gc_percent and gc_boost_zoned_gc_percent
f2fs: do FG_GC when GC boosting is required for zoned devices
f2fs: increase BG GC migration window granularity when boosted for zoned devices
f2fs: add reserved_segments sysfs node
f2fs: introduce migration_window_granularity
f2fs: make BG GC more aggressive for zoned devices
f2fs: avoid unused block when dio write in LFS mode
f2fs: fix to check atomic_file in f2fs ioctl interfaces
f2fs: get rid of online repaire on corrupted directory
f2fs: prevent atomic file from being dirtied before commit
f2fs: get rid of page->index
f2fs: convert read_node_page() to use folio
f2fs: convert __write_node_page() to use folio
f2fs: convert f2fs_write_data_page() to use folio
...
* Bug fix: Avoid evaluating non-mount ctl_tables as a sysctl_mount_point by
removing the unlikely (but possible) chance that the permanently empty
ctl_table array shares its address with another ctl_table.
* Update Joel Granados' contact info in MAINTAINERS.
Testing
* Bug fix merged to linux-next after 6.11-rc5
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEErkcJVyXmMSXOyyeQupfNUreWQU8FAmbtUfsACgkQupfNUreW
QU+ptAv+JEbJR+VMLDZk3xAm1iRvqyzpVank8pJGgdp3kSYPY1KdJqAHo+ZXAD1r
jtJMwyI4ELIQ7NnLq50qUCPScdBpXpG73QVp8Foip43x0E/pOxiSiz5C0m4NbRsU
cKQuiRauOpfqrNvh22TVB3jjpL4cZbRpyP1kpMgf8edn3YlhYhJ04oXjVUk+zMMA
muUifAlUAUMhQiHOqLynA7ZObwlqY+QoiJF8v4IPAcynYk0ZKNBmqAAMdIoBF4c8
rrgtIt/xlaJB6a1usS9B5xFbamrlaNPaiA3ul+lUeMLcArtoB5gwMTz+AGLu46aB
+RCjhXmY3LBvMD16eyY9cge/3Qud2jyoyh0Qp9wHhgJwsaDWlG51qzi+PHr8ZtV5
jdtk8QzHZs07JFsE2HabppCtrgNxnwPiwBS7xm45u/p8dZ7buCvtZm1MEjgHu9M/
6iVSEs+/S3+AZMz+/K3WaqaP/kYhXklwT16xN7b1+oxLNFfJ2RLcx0Xc7yZpGwRM
8YbaGnR2
=85Xa
-----END PGP SIGNATURE-----
Merge tag 'sysctl-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl
Pull sysctl update from Joel Granados:
- Avoid evaluating non-mount ctl_tables as a sysctl_mount_point by
removing the unlikely (but possible) chance that the permanently
empty ctl_table array shares its address with another ctl_table
- Update Joel Granados' contact info in MAINTAINERS
* tag 'sysctl-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl:
MAINTAINERS: update email for Joel Granados
sysctl: avoid spurious permanent empty tables
This may be a typo. The comment has said shared locks are
not allowed when this bit is set. If using shared lock, the
wait in `fuse_file_cached_io_open` may be forever.
Fixes: 205c1d8026 ("fuse: allow parallel dio writes with FUSE_DIRECT_IO_ALLOW_MMAP")
CC: stable@vger.kernel.org # v6.9
Signed-off-by: yangyun <yangyun50@huawei.com>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
The (!fiq->connected) check was moved into the queuing method resulting in
the following:
Fixes: 5de8acb41c ("fuse: cleanup request queuing towards virtiofs")
Reported-by: Lai, Yi <yi1.lai@linux.intel.com>
Closes: https://lore.kernel.org/all/ZvFEAM6JfrBKsOU0@ly-workstation/
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
The LOCALIO auxiliary RPC protocol consists of a single "UUID_IS_LOCAL"
RPC method that allows the Linux NFS client to verify the local Linux
NFS server can see the nonce (single-use UUID) the client generated and
made available in nfs_common for subsequent lookup and verification
by the NFS server. If matched, the NFS server populates members in the
nfs_uuid_t struct. The NFS client then transfers these nfs_uuid_t
struct member pointers to the nfs_client struct and cleans up the
nfs_uuid_t struct. See: fs/nfs/localio.c:nfs_local_probe()
This protocol isn't part of an IETF standard, nor does it need to be
considering it is Linux-to-Linux auxiliary RPC protocol that amounts
to an implementation detail.
Localio is only supported when UNIX-style authentication (AUTH_UNIX, aka
AUTH_SYS) is used (enforced by fs/nfs/localio.c:nfs_local_probe()).
The UUID_IS_LOCAL method encodes the client generated uuid_t in terms of
the fixed UUID_SIZE (16 bytes). The fixed size opaque encode and decode
XDR methods are used instead of the less efficient variable sized
methods.
Having a nonce (single-use uuid) is better than using the same uuid
for the life of the server, and sending it proactively by client
rather than reactively by the server is also safer.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Co-developed-by: NeilBrown <neilb@suse.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
For localio access, don't call filesystem read() and write() routines
directly. This solves two problems:
1) localio writes need to use a normal (non-memreclaim) unbound
workqueue. This avoids imposing new requirements on how underlying
filesystems process frontend IO, which would cause a large amount
of work to update all filesystems. Without this change, when XFS
starts getting low on space, XFS flushes work on a non-memreclaim
work queue, which causes a priority inversion problem:
00573 workqueue: WQ_MEM_RECLAIM writeback:wb_workfn is flushing !WQ_MEM_RECLAIM xfs-sync/vdc:xfs_flush_inodes_worker
00573 WARNING: CPU: 6 PID: 8525 at kernel/workqueue.c:3706 check_flush_dependency+0x2a4/0x328
00573 Modules linked in:
00573 CPU: 6 PID: 8525 Comm: kworker/u71:5 Not tainted 6.10.0-rc3-ktest-00032-g2b0a133403ab #18502
00573 Hardware name: linux,dummy-virt (DT)
00573 Workqueue: writeback wb_workfn (flush-0:33)
00573 pstate: 400010c5 (nZcv daIF -PAN -UAO -TCO -DIT +SSBS BTYPE=--)
00573 pc : check_flush_dependency+0x2a4/0x328
00573 lr : check_flush_dependency+0x2a4/0x328
00573 sp : ffff0000c5f06bb0
00573 x29: ffff0000c5f06bb0 x28: ffff0000c998a908 x27: 1fffe00019331521
00573 x26: ffff0000d0620900 x25: ffff0000c5f06ca0 x24: ffff8000828848c0
00573 x23: 1fffe00018be0d8e x22: ffff0000c1210000 x21: ffff0000c75fde00
00573 x20: ffff800080bfd258 x19: ffff0000cad63400 x18: ffff0000cd3a4810
00573 x17: 0000000000000000 x16: 0000000000000000 x15: ffff800080508d98
00573 x14: 0000000000000000 x13: 204d49414c434552 x12: 1fffe0001b6eeab2
00573 x11: ffff60001b6eeab2 x10: dfff800000000000 x9 : ffff60001b6eeab3
00573 x8 : 0000000000000001 x7 : 00009fffe491154e x6 : ffff0000db775593
00573 x5 : ffff0000db775590 x4 : ffff0000db775590 x3 : 0000000000000000
00573 x2 : 0000000000000027 x1 : ffff600018be0d62 x0 : dfff800000000000
00573 Call trace:
00573 check_flush_dependency+0x2a4/0x328
00573 __flush_work+0x184/0x5c8
00573 flush_work+0x18/0x28
00573 xfs_flush_inodes+0x68/0x88
00573 xfs_file_buffered_write+0x128/0x6f0
00573 xfs_file_write_iter+0x358/0x448
00573 nfs_local_doio+0x854/0x1568
00573 nfs_initiate_pgio+0x214/0x418
00573 nfs_generic_pg_pgios+0x304/0x480
00573 nfs_pageio_doio+0xe8/0x240
00573 nfs_pageio_complete+0x160/0x480
00573 nfs_writepages+0x300/0x4f0
00573 do_writepages+0x12c/0x4a0
00573 __writeback_single_inode+0xd4/0xa68
00573 writeback_sb_inodes+0x470/0xcb0
00573 __writeback_inodes_wb+0xb0/0x1d0
00573 wb_writeback+0x594/0x808
00573 wb_workfn+0x5e8/0x9e0
00573 process_scheduled_works+0x53c/0xd90
00573 worker_thread+0x370/0x8c8
00573 kthread+0x258/0x2e8
00573 ret_from_fork+0x10/0x20
2) Some filesystem writeback routines can end up taking up a lot of
stack space (particularly XFS). Instead of risking running over
due to the extra overhead from the NFS stack, we should just call
these routines from a workqueue job. Since we need to do this to
address 1) above we're able to avoid possibly blowing the stack
"for free".
Use of dedicated workqueues improves performance over using the
system_unbound_wq.
Also, the creds used to open the file are used to override_creds() in
both nfs_local_call_read() and nfs_local_call_write() -- otherwise the
workqueue could have elevated capabilities (which the caller may not).
Lastly, care is taken to set PF_LOCAL_THROTTLE | PF_MEMALLOC_NOIO in
nfs_do_local_write() to avoid writeback deadlocks.
The PF_LOCAL_THROTTLE flag prevents deadlocks in balance_dirty_pages()
by causing writes to only be throttled against other writes to the
same bdi (it keeps the throttling local). Normally all writes to
bdi(s) are throttled equally (after throughput factors are allowed
for).
The PF_MEMALLOC_NOIO flag prevents the lower filesystem IO from
causing memory reclaim to re-enter filesystems or IO devices and so
prevents deadlocks from occuring where IO that cleans pages is
waiting on IO to complete.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Co-developed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Co-developed-by: NeilBrown <neilb@suse.de>
Signed-off-by: NeilBrown <neilb@suse.de> # eliminated wait_for_completion
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
If the DS is local to this client use localio to write the data.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Try a local open of the file being written to, and if it succeeds,
then use localio to issue IO.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Add client support for bypassing NFS for localhost reads, writes, and
commits. This is only useful when the client and the server are
running on the same host.
nfs_local_probe() is stubbed out, later commits will enable client and
server handshake via a Linux-only LOCALIO auxiliary RPC protocol.
This has dynamic binding with the nfsd module (via nfs_localio module
which is part of nfs_common). LOCALIO will only work if nfsd is
already loaded.
The "localio_enabled" nfs kernel module parameter can be used to
disable and enable the ability to use LOCALIO support.
CONFIG_NFS_LOCALIO enables NFS client support for LOCALIO.
Lastly, LOCALIO uses an nfsd_file to initiate all IO. To make proper
use of nfsd_file (and nfsd's filecache) its lifetime (duration before
nfsd_file_put is called) must extend until after commit, read and
write operations. So rather than immediately drop the nfsd_file
reference in nfs_local_open_fh(), that doesn't happen until
nfs_local_pgio_release() for read/write and not until
nfs_local_release_commit_data() for commit. The same applies to the
reference held on nfsd's nn->nfsd_serv. Both objects' lifetimes and
associated references are managed through calls to
nfs_to->nfsd_file_put_local().
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Co-developed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de> # nfs_open_local_fh
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
The nfsd_file will be passed, in future commits, by callers
that enable LOCALIO support (for both regular NFS and pNFS IO).
[Derived from patch authored by Weston Andros Adamson, but switched
from passing struct file to struct nfsd_file]
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
The LOCALIO auxiliary RPC protocol consists of a single "UUID_IS_LOCAL"
RPC method that allows the Linux NFS client to verify the local Linux
NFS server can see the nonce (single-use UUID) the client generated and
made available in nfs_common. The server expects this protocol to use
the same transport as NFS and NFSACL for its RPCs. This protocol
isn't part of an IETF standard, nor does it need to be considering it
is Linux-to-Linux auxiliary RPC protocol that amounts to an
implementation detail.
The UUID_IS_LOCAL method encodes the client generated uuid_t in terms of
the fixed UUID_SIZE (16 bytes). The fixed size opaque encode and decode
XDR methods are used instead of the less efficient variable sized
methods.
The RPC program number for the NFS_LOCALIO_PROGRAM is 400122 (as assigned
by IANA, see https://www.iana.org/assignments/rpc-program-numbers/ ):
Linux Kernel Organization 400122 nfslocalio
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
[neilb: factored out and simplified single localio protocol]
Co-developed-by: NeilBrown <neilb@suse.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Add server support for bypassing NFS for localhost reads, writes, and
commits. This is only useful when both the client and server are
running on the same host.
If nfsd_open_local_fh() fails then the NFS client will both retry and
fallback to normal network-based read, write and commit operations if
localio is no longer supported.
Care is taken to ensure the same NFS security mechanisms are used
(authentication, etc) regardless of whether localio or regular NFS
access is used. The auth_domain established as part of the traditional
NFS client access to the NFS server is also used for localio. Store
auth_domain for localio in nfsd_uuid_t and transfer it to the client
if it is local to the server.
Relative to containers, localio gives the client access to the network
namespace the server has. This is required to allow the client to
access the server's per-namespace nfsd_net struct.
This commit also introduces the use of NFSD's percpu_ref to interlock
nfsd_destroy_serv and nfsd_open_local_fh, to ensure nn->nfsd_serv is
not destroyed while in use by nfsd_open_local_fh and other LOCALIO
client code.
CONFIG_NFS_LOCALIO enables NFS server support for LOCALIO.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Co-developed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Co-developed-by: NeilBrown <neilb@suse.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
The next commit will introduce nfsd_open_local_fh() which returns an
nfsd_file structure. This commit exposes LOCALIO's required NFSD
symbols to the NFS client:
- Make nfsd_open_local_fh() symbol and other required NFSD symbols
available to NFS in a global 'nfs_to' nfsd_localio_operations
struct (global access suggested by Trond, nfsd_localio_operations
suggested by NeilBrown). The next commit will also introduce
nfsd_localio_ops_init() that init_nfsd() will call to initialize
'nfs_to'.
- Introduce nfsd_file_file() that provides access to nfsd_file's
backing file. Keeps nfsd_file structure opaque to NFS client (as
suggested by Jeff Layton).
- Introduce nfsd_file_put_local() that will put the reference to the
nfsd_file's associated nn->nfsd_serv and then put the reference to
the nfsd_file (as suggested by NeilBrown).
Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com> # nfs_to
Suggested-by: NeilBrown <neilb@suse.de> # nfsd_localio_operations
Suggested-by: Jeff Layton <jlayton@kernel.org> # nfsd_file_file
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
fs/nfs_common/nfslocalio.c provides interfaces that enable an NFS
client to generate a nonce (single-use UUID) and associated nfs_uuid_t
struct, register it with nfs_common for subsequent lookup and
verification by the NFS server and if matched the NFS server populates
members in the nfs_uuid_t struct.
nfs_common's nfs_uuids list is the basis for localio enablement, as
such it has members that point to nfsd memory for direct use by the
client (e.g. 'net' is the server's network namespace, through it the
client can access nn->nfsd_serv).
This commit also provides the base nfs_uuid_t interfaces to allow
proper net namespace refcounting for the LOCALIO use case.
CONFIG_NFS_LOCALIO controls the nfs_common, NFS server and NFS client
enablement for LOCALIO. If both NFS_FS=m and NFSD=m then
NFS_COMMON_LOCALIO_SUPPORT=m and nfs_localio.ko is built (and provides
nfs_common's LOCALIO support).
# lsmod | grep nfs_localio
nfs_localio 12288 2 nfsd,nfs
sunrpc 745472 35 nfs_localio,nfsd,auth_rpcgss,lockd,nfsv3,nfs
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Co-developed-by: NeilBrown <neilb@suse.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
A service created with svc_create_pooled() can be given a linked list of
programs and all of these will be served.
Using a linked list makes it cumbersome when there are several programs
that can be optionally selected with CONFIG settings.
After this patch is applied, API consumers must use only
svc_create_pooled() when creating an RPC service that listens for more
than one RPC program.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Introduce nfsd_serv_try_get and nfsd_serv_put and update the nfsd code
to prevent nfsd_destroy_serv from destroying nn->nfsd_serv until any
caller of nfsd_serv_try_get releases their reference using nfsd_serv_put.
A percpu_ref is used to implement the interlock between
nfsd_destroy_serv and any caller of nfsd_serv_try_get.
This interlock is needed to properly wait for the completion of client
initiated localio calls to nfsd (that are _not_ in the context of nfsd).
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
nfsd_file_acquire_local() can be used to look up a file by filehandle
without having a struct svc_rqst. This can be used by NFS LOCALIO to
allow the NFS client to bypass the NFS protocol to directly access a
file provided by the NFS server which is running in the same kernel.
In nfsd_file_do_acquire() care is taken to always use fh_verify() if
rqstp is not NULL (as is the case for non-LOCALIO callers). Otherwise
the non-LOCALIO callers will not supply the correct and required
arguments to __fh_verify (e.g. gssclient isn't passed).
Introduce fh_verify_local() wrapper around __fh_verify to make it
clear that LOCALIO is intended caller.
Also, use GC for nfsd_file returned by nfsd_file_acquire_local. GC
offers performance improvements if/when a file is reopened before
launderette cleans it from the filecache's LRU.
Suggested-by: Jeff Layton <jlayton@kernel.org> # use filecache's GC
Signed-off-by: NeilBrown <neilb@suse.de>
Co-developed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
__fh_verify() offers an interface like fh_verify() but doesn't require
a struct svc_rqst *, instead it also takes the specific parts as
explicit required arguments. So it is safe to call __fh_verify() with
a NULL rqstp, but the net, cred, and client args must not be NULL.
__fh_verify() does not use SVC_NET(), nor does the functions it calls.
Rather than using rqstp->rq_client pass the client and gssclient
explicitly to __fh_verify and then to nfsd_set_fh_dentry().
Lastly, it should be noted that the previous commit prepared for 4
associated tracepoints to only be used if rqstp is not NULL (this is a
stop-gap that should be properly fixed so localio also benefits from
the utility these tracepoints provide when debugging fh_verify
issues).
Signed-off-by: NeilBrown <neilb@suse.de>
Co-developed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
LOCALIO will be able to call fh_verify() with a NULL rqstp. In this
case, the existing trace points need to be skipped because they
want to dereference the address fields in the passed-in rqstp.
Temporarily make these trace points conditional to avoid a seg
fault in this case. Putting the "rqstp != NULL" check in the trace
points themselves makes the check more efficient.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Currently, fh_verify() makes some daring assumptions about which
version of file handle the caller wants, based on the things it can
find in the passed-in rqstp. The about-to-be-introduced LOCALIO use
case sometimes has no svc_rqst context, so this logic won't work in
that case.
Instead, examine the passed-in file handle. It's .max_size field
should carry information to allow nfsd_set_fh_dentry() to initialize
the file handle appropriately.
The file handle used by lockd and the one created by write_filehandle
never need any of the version-specific fields (which affect things
like write and getattr requests and pre/post attributes).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
There are several places where __fh_verify unconditionally dereferences
rqstp to check that the connection is suitably secure. They look at
rqstp->rq_xprt which is not meaningful in the target use case of
"localio" NFS in which the client talks directly to the local server.
Prepare these to always succeed when rqstp is NULL.
Signed-off-by: NeilBrown <neilb@suse.de>
Co-developed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
LOCALIO-initiated open operations are not running in an nfsd thread
and thus do not have an associated svc_rqst context.
Signed-off-by: NeilBrown <neilb@suse.de>
Co-developed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Eliminates duplicate functions in various files to allow for
additional callers.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Common nfs4_stat_to_errno() is used by fs/nfs/nfs4xdr.c and will be
used by fs/nfs/localio.c
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>