instead of doing it in the callsites for open_shroot.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
The cost is that we might need to flip '/' to '\\' in more than
just the prefix. Needs profiling, but I suspect that we won't
get slowdown on that.
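As a rough illustration of the separator flip mentioned above, a userspace sketch could look like this. The function name is hypothetical and is not meant to match any particular helper in cifs.ko; the point is only that the whole string is walked, not just a prefix.

```c
#include <stddef.h>

/* Hypothetical helper: replace every '/' in a NUL-terminated path with
 * the given separator (for SMB, a backslash). Walks the whole string,
 * not just a prefix. */
static void convert_delimiters(char *path, char sep)
{
	for (char *p = path; *p != '\0'; p++) {
		if (*p == '/')
			*p = sep;
	}
}
```

The cost is one pass over the pathname, which is why the text above expects no measurable slowdown but still asks for profiling.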
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steve French <stfrench@microsoft.com>
build_path_from_dentry() open-codes dentry_path_raw(). The reason
we can't use dentry_path_raw() in there (and postprocess the
result as needed) is that the callers of build_path_from_dentry()
expect that the object to be freed on cleanup and the string to
be used are at the same address. That's painful, since the path
is naturally built end-to-beginning - we start at the leaf and
go through the ancestors, accumulating the pathname.
Life would be easier if we left the buffer allocation to callers.
It wouldn't be exact-sized buffer, but none of the callers keep
the result for long - it's always freed before the caller returns.
So there's no need to do exact-sized allocation; better use
__getname()/__putname(), same as we do for pathname arguments
of syscalls. What's more, there's no need to do allocation under
spinlocks, so GFP_ATOMIC is not needed.
Next patch will replace the open-coded dentry_path_raw() (in
build_path_from_dentry_optional_prefix()) with calling the real
thing. This patch only introduces wrappers for allocating/freeing
the buffers and switches to new calling conventions:
build_path_from_dentry(dentry, buf)
expects buf to be address of a page-sized object or NULL,
return value is a pathname built inside that buffer on success,
ERR_PTR(-ENOMEM) if buf is NULL and ERR_PTR(-ENAMETOOLONG) if
the pathname won't fit into page. Note that we don't need to
check for failure when allocating the buffer in the caller -
build_path_from_dentry() will do the right thing.
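A userspace sketch of this calling convention, under stated assumptions: "struct node" is a toy stand-in for a dentry, the function name and the "err" out-parameter are invented for illustration (the kernel version returns ERR_PTR() values instead), and the buffer is supplied by the caller. The path is filled in from the end of the buffer backwards, so the returned pointer generally differs from the buffer address.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a dentry: a name plus a link to the parent.
 * The root has parent == NULL and an empty name. */
struct node {
	const char *name;
	const struct node *parent;
};

/* Build "/a/b/c" for leaf inside buf, filling from the end backwards.
 * Returns a pointer into buf on success, or NULL with *err set to
 * ENOMEM (buf was NULL) or ENAMETOOLONG (path does not fit). */
static char *build_path(const struct node *leaf, char *buf, size_t buflen,
			int *err)
{
	char *p;

	if (buf == NULL) {
		*err = ENOMEM;
		return NULL;
	}
	if (buflen == 0) {
		*err = ENAMETOOLONG;
		return NULL;
	}
	p = buf + buflen - 1;
	*p = '\0';
	for (const struct node *n = leaf; n != NULL && n->parent != NULL;
	     n = n->parent) {
		size_t len = strlen(n->name);

		/* Need room for the name and the leading '/'. */
		if ((size_t)(p - buf) < len + 1) {
			*err = ENAMETOOLONG;
			return NULL;
		}
		p -= len;
		memcpy(p, n->name, len);
		*--p = '/';
	}
	*err = 0;
	return p;
}
```

Because the function itself rejects a NULL buffer, a caller can pass the result of its allocation straight in and check only the final return value.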
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steve French <stfrench@microsoft.com>
As it is, it takes const char * and, in some cases, stores it in
caller's variable that is plain char *. Fortunately, none of the
callers actually proceeded to modify the string via now-non-const
alias, but that's trouble waiting to happen.
It's easy to do properly, anyway...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steve French <stfrench@microsoft.com>
strndup(s, strlen(s)) is a highly unidiomatic way to spell strdup(s);
it's *NOT* safer in any way, since strlen() is just as sensitive to
NUL-termination as strdup() is.
strndup() is for situations when you need a copy of a known-sized
substring, not a magic security juju to drive the bad spirits away.
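To make the distinction concrete, here is a small userspace sketch using the libc strdup()/strndup() as stand-ins for the kernel's kstrdup()/kstrndup(). The two wrapper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>

/* Copy the first n bytes of a larger buffer as a NUL-terminated string.
 * This is the situation strndup() is meant for: the length is known
 * independently of any terminator. */
static char *copy_prefix(const char *buf, size_t n)
{
	return strndup(buf, n);
}

/* Copy a whole NUL-terminated string. strdup() already does this;
 * strndup(s, strlen(s)) would scan for the terminator in exactly the
 * same way and add nothing. */
static char *copy_string(const char *s)
{
	return strdup(s);
}
```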
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steve French <stfrench@microsoft.com>
While reviewing a patch clarifying locks and locking hierarchy I
realized some locks were unused.
This commit removes old data and code that isn't actually used
anywhere, or hidden in ifdefs which cannot be enabled from the kernel
config.
* The uid/gid trees and associated locks are left-overs from when
uid/sid mapping had an extra caching layer on top of the keyring and
are now unused.
See commit faa65f07d2 ("cifs: simplify id_to_sid and sid_to_id mapping code")
from 2012.
* cifs_oplock_break_ops is a left-over from when slow_work was replaced
by regular workqueue and is now unused.
See commit 9b64697246 ("cifs: use workqueue instead of slow-work")
from 2010.
* CIFSSMBSetAttrLegacy is SMB1 cruft dealing with some legacy
NT4/Win9x behaviour.
* Remove CONFIG_CIFS_DNOTIFY_EXPERIMENTAL left-overs. This was already
partially removed in 392e1c5dc9 ("cifs: rename and clarify CIFS_ASYNC_OP and CIFS_NO_RESP")
from 2019. Kill it completely.
* Another candidate that was considered but spared is
CONFIG_CIFS_NFSD_EXPORT which has an empty implementation and cannot
be enabled by a config option (although it is listed but disabled with
"BROKEN" as a dep). It's unclear whether this could even function
today in its current form, but it has its own .c file and Kconfig
entry, which is a bit more involved to remove, and it might make a
comeback?
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
There is a regular need in the kernel to provide a way to declare having
a dynamically sized set of trailing elements in a structure. Kernel code
should always use “flexible array members”[1] for these cases. The older
style of one-element or zero-length arrays should no longer be used[2].
Also, this helps with the ongoing efforts to enable -Warray-bounds by
fixing the following warning:
CC [M] fs/cifs/cifssmb.o
fs/cifs/cifssmb.c: In function ‘CIFSFindNext’:
fs/cifs/cifssmb.c:4636:23: warning: array subscript 1 is above array bounds of ‘char[1]’ [-Warray-bounds]
4636 | pSMB->ResumeFileName[name_len+1] = 0;
| ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~
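For readers unfamiliar with the construct, a minimal sketch of a flexible array member follows. The struct and function names are hypothetical and much simpler than the real SMB request structures; only the "name[]" declaration style is the point.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical request layout: a fixed header followed by a
 * variable-length name. The trailing member is a flexible array
 * member ("name[]"), not "name[1]" or "name[0]". */
struct find_req {
	unsigned short name_len;
	char name[];
};

/* Allocate a request with room for the name plus a terminating NUL. */
static struct find_req *find_req_alloc(const char *name)
{
	size_t len = strlen(name);
	struct find_req *req = malloc(sizeof(*req) + len + 1);

	if (req == NULL)
		return NULL;
	req->name_len = (unsigned short)len;
	memcpy(req->name, name, len + 1);	/* includes the NUL */
	return req;
}
```

With "char name[1]" the compiler is entitled to assume that only index 0 is valid, which is what triggers the -Warray-bounds warning quoted above.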
[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.10/process/deprecated.html#zero-length-and-one-element-arrays
Link: https://github.com/KSPP/linux/issues/79
Link: https://github.com/KSPP/linux/issues/109
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
struct cifs_writedata is declared twice.
One declaration is at line 209,
and struct cifs_writedata is defined below it.
The declaration here is not needed. Remove the duplicate.
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
This commit doesn't change the logic of SWN.
Add dummy implementation of SWN functions when SWN is disabled instead
of using ifdef sections.
The dummy functions get optimized out, this leads to clearer code and
compile time type-checking regardless of config options with no
runtime penalty.
Leave the simple ifdefs section as-is.
A single bitfield (bool foo:1) on its own will use up one int. Move
tcon->use_witness out of ifdefs with the other tcon bitfields.
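The pattern can be sketched as follows. FEATURE_SWN, swn_register() and connect_share() are placeholder names chosen for this example, not the identifiers used in the patch.

```c
#include <stdbool.h>

/* Hypothetical connection state. The flag sits with the other
 * bitfields regardless of configuration. */
struct tcon {
	bool use_witness:1;
};

#ifdef FEATURE_SWN
int swn_register(struct tcon *tcon);	/* real implementation elsewhere */
#else
/* Dummy version: same prototype, so callers are type-checked in every
 * configuration, and the compiler can drop the call entirely. */
static inline int swn_register(struct tcon *tcon)
{
	(void)tcon;
	return 0;
}
#endif

/* The caller needs no #ifdef of its own. */
static int connect_share(struct tcon *tcon)
{
	if (tcon->use_witness)
		return swn_register(tcon);
	return 0;
}
```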
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Samuel Cabrero <scabrero@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
[MS-SMB2] protocol specification was recently updated to include
new flags, new negotiate context and some minor changes to fields.
Update smb2pdu.h structure definitions to match the newest version
of the protocol specification. Updates to the compression context
values will be in a followon patch.
Signed-off-by: Steve French <stfrench@microsoft.com>
A few of the semaphores had been removed, and one additional one
needed to be noted in the comments.
Signed-off-by: Steve French <stfrench@microsoft.com>
Fix the following gcc warning:
fs/cifs/cifsacl.c:1097:8: warning: variable ‘nmode’ set but not used
[-Wunused-but-set-variable].
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Commit 653a5efb84 ("cifs: update super_operations to show_devname")
introduced the display of devname for cifs mounts. However, when mounting
a share which has a whitespace in the name, that exact share name is also
displayed in mountinfo. Make sure that all whitespace is escaped.
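A standalone sketch of the escaping idea, assuming the usual mountinfo convention of writing a special character as a backslash plus three octal digits. In the kernel this kind of job is typically done with seq_escape(); the function below is an illustrative userspace equivalent with an invented name.

```c
#include <stdio.h>
#include <string.h>

/* Write src into dst, replacing each character listed in esc with a
 * backslash followed by its three-digit octal code (a space becomes
 * "\040"). Returns 0 on success, -1 if dst is too small. */
static int escape_chars(char *dst, size_t dstlen, const char *src,
			const char *esc)
{
	size_t o = 0;

	if (dstlen == 0)
		return -1;
	for (; *src != '\0'; src++) {
		unsigned char c = (unsigned char)*src;

		if (strchr(esc, c) != NULL) {
			if (dstlen - o < 5)	/* 4 chars + NUL */
				return -1;
			o += (size_t)snprintf(dst + o, dstlen - o,
					      "\\%03o", (unsigned int)c);
		} else {
			if (dstlen - o < 2)	/* 1 char + NUL */
				return -1;
			dst[o++] = (char)c;
		}
	}
	dst[o] = '\0';
	return 0;
}
```

Tools that parse /proc/self/mountinfo split fields on whitespace, so an unescaped space in the share name would shift every following field.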
Signed-off-by: Maciek Borzecki <maciek.borzecki@gmail.com>
CC: <stable@vger.kernel.org> # 5.11+
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
struct cifs_readdata is declared twice. One declaration is
at line 208,
and struct cifs_readdata is defined below it.
The declaration here is not needed. Remove the duplicate.
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
On cifs_reconnect, make sure that DNS resolution happens again.
Stale DNS data could be the reason the connection went dead in the first place.
This also contains the fix for a build issue identified by Intel bot.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: <stable@vger.kernel.org> # 5.11+
Signed-off-by: Steve French <stfrench@microsoft.com>
There were two problems (one of which could cause data corruption)
that were noticed with duplicate extents (ie reflink)
when debugging why various xfstests were being incorrectly skipped
(e.g. generic/138, generic/140, generic/142). First, we were not
updating the file size locally in the cache when extending a
file due to reflink (it would refresh after actimeo expires)
but xfstest was checking the size immediately which was still
0 so caused the test to be skipped. Second, we were setting
the target file size (which could shrink the file) in all cases
to the end of the reflinked range rather than only setting the
target file size when reflink would extend the file.
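The second fix boils down to a "grow only" rule for the target size. A minimal sketch, with a hypothetical function name and plain integers in place of the kernel's inode fields:

```c
#include <stdint.h>

/* Return the size the target file should have after a range of len
 * bytes has been reflinked to offset off. The size only grows: a range
 * that ends inside the existing file leaves the size unchanged. */
static uint64_t size_after_reflink(uint64_t cur_size, uint64_t off,
				   uint64_t len)
{
	uint64_t end = off + len;

	return end > cur_size ? end : cur_size;
}
```

Unconditionally setting the size to the end of the range would truncate any data beyond it, which is the data-corruption case described above.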
CC: <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Make SMB2 not print out an error when an oplock break is received for an
unknown handle, similar to SMB1. The debug message which is printed for
these unknown handles may also be misleading, so fix that too.
The SMB2 lease break path is not affected by this patch.
Without this, a program which writes to a file from one thread, and
opens, reads, and writes the same file from another thread triggers the
below errors several times a minute when run against a Samba server
configured with "smb2 leases = no".
CIFS: VFS: \\192.168.0.1 No task to wake, unknown frame received! NumMids 2
00000000: 424d53fe 00000040 00000000 00000012 .SMB@...........
00000010: 00000001 00000000 ffffffff ffffffff ................
00000020: 00000000 00000000 00000000 00000000 ................
00000030: 00000000 00000000 00000000 00000000 ................
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Reviewed-by: Tom Talpey <tom@talpey.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
RHBZ: 1933527
Under SMB1 + POSIX, if an inode is reused on a server after we have read and
cached a part of a file, when we then open the new file with the
re-cycled inode there is a chance that we may serve the old data out of cache
to the application.
This only happens for SMB1 (deprecated) and only when the POSIX extensions are used.
The simplest solution to avoid this race is to force a revalidate
on smb1-posix open.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
My recent fixes to cifsacl to maintain inherited ACEs had
regressed modefromsid when an older ACL already exists.
Found testing xfstest 495 with modefromsid mount option
Fixes: f506550889 ("cifs: Retain old ACEs when converting between mode bits and ACL")
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
For AES256 encryption (GCM and CCM), we need to adjust the size of a few
fields to 32 bytes instead of 16 to accommodate the larger keys.
Also, the L value supplied to the key generator needs to be changed from
128 to 256 when these algorithms are used.
Keeping the ioctl struct for dumping keys of the same size for now.
Will send out a different patch for that one.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
CC: <stable@vger.kernel.org> # v5.10+
Signed-off-by: Steve French <stfrench@microsoft.com>
Applications that create and extend and write to a file do not
expect to see 0 allocation size. When file is extended,
set its allocation size to a plausible value until we have a
chance to query the server for it. When the file is cached
this will prevent showing an impossible number of allocated
blocks (like 0). This fixes e.g. xfstests 614 which does
1) create a file and set its size to 64K
2) mmap write 64K to the file
3) stat -c %b for the file (to query the number of allocated blocks)
It was failing because we returned 0 blocks. Even though we would
return the correct cached file size, we returned an impossible
allocation size.
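One plausible placeholder, assumed here for illustration rather than quoted from the patch, is the file size rounded up to 512-byte units, since that is the unit stat(2) reports blocks in. The function name is hypothetical.

```c
#include <stdint.h>

/* Estimate the st_blocks value (512-byte units) for a file of the given
 * size, rounding up. Meant only as a placeholder until the real
 * allocation size has been queried from the server. */
static uint64_t plausible_blocks(uint64_t size)
{
	return (size + 511) / 512;
}
```

For the 64K file in the test sequence above this yields 128 blocks instead of 0.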
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: <stable@vger.kernel.org>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
If CONFIG_CIFS_ROOT is not set, rootfs mount option is invalid
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
CC: <stable@vger.kernel.org> # v5.11
Signed-off-by: Steve French <stfrench@microsoft.com>
A typo was found by the codespell tool on line 251 of cifs_swn.c:
$ codespell ./fs/cifs/
./cifs_swn.c:251: funciton ==> function
Fix a typo found by codespell.
Signed-off-by: Liu xuzhi <liu.xuzhi@zte.com.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>
smb311_update_preauth_hash() uses the shash in server->secmech without
appropriate locking, and this can lead to sessions corrupting each
other's preauth hashes.
The following script can easily trigger the problem:
#!/bin/sh -e
NMOUNTS=10
for i in $(seq $NMOUNTS); do
mkdir -p /tmp/mnt$i
umount /tmp/mnt$i 2>/dev/null || :
done
while :; do
for i in $(seq $NMOUNTS); do
mount -t cifs //192.168.0.1/test /tmp/mnt$i -o ... &
done
wait
for i in $(seq $NMOUNTS); do
umount /tmp/mnt$i
done
done
Usually within seconds this leads to one or more of the mounts failing
with the following errors, and a "Bad SMB2 signature for message" is
seen in the server logs:
CIFS: VFS: \\192.168.0.1 failed to connect to IPC (rc=-13)
CIFS: VFS: cifs_mount failed w/return code = -13
Fix it by holding the server mutex just like in the other places where
the shashes are used.
Fixes: 8bd68c6e47 ("CIFS: implement v3.11 preauth integrity")
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
CC: <stable@vger.kernel.org>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
After the fix for retaining externally set ACEs with cifsacl and
modefromsid,idsfromsid, there was an issue in populating the
inherited ACEs after setting the ACEs introduced by these two modes.
Fixed this by updating the ACE pointer again after the call to
populate_new_aces.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Rohith Surabattula <rohiths@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
In case of interrupted syscalls, prevent sending CLOSE commands for
compound CREATE+CLOSE requests by introducing a
CIFS_CP_CREATE_CLOSE_OP flag that tells lower layers not to
send a CLOSE command for the MIDs corresponding to the compound
CREATE+CLOSE request.
A simple reproducer:
#!/bin/bash
mount //server/share /mnt -o username=foo,password=***
tc qdisc add dev eth0 root netem delay 450ms
stat -f /mnt &>/dev/null & pid=$!
sleep 0.01
kill $pid
tc qdisc del dev eth0 root
umount /mnt
Before patch:
...
6 0.256893470 192.168.122.2 → 192.168.122.15 SMB2 402 Create Request File: ;GetInfo Request FS_INFO/FileFsFullSizeInformation;Close Request
7 0.257144491 192.168.122.15 → 192.168.122.2 SMB2 498 Create Response File: ;GetInfo Response;Close Response
9 0.260798209 192.168.122.2 → 192.168.122.15 SMB2 146 Close Request File:
10 0.260841089 192.168.122.15 → 192.168.122.2 SMB2 130 Close Response, Error: STATUS_FILE_CLOSED
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
In cifs_statfs(), if server->ops->queryfs is not NULL, then we should
use its return value rather than always returning 0. Instead, use rc
variable as it is properly set to 0 in case there is no
server->ops->queryfs.
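The shape of the fix, reduced to a standalone sketch. The struct, the function names and the example callback are invented for illustration; the real code deals with kernel types and takes more arguments.

```c
#include <stddef.h>

struct server_ops {
	int (*queryfs)(void);	/* optional; may be NULL */
};

/* Example callback that always fails, standing in for a server error. */
static int failing_queryfs(void)
{
	return -5;
}

static int do_statfs(const struct server_ops *ops)
{
	int rc = 0;	/* stays 0 when there is no queryfs */

	if (ops->queryfs != NULL)
		rc = ops->queryfs();
	return rc;	/* not "return 0": that would hide queryfs errors */
}
```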
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
A customer has reported that their dmesg was being flooded by
CIFS: VFS: \\server Cancelling wait for mid xxx cmd: a
CIFS: VFS: \\server Cancelling wait for mid yyy cmd: b
CIFS: VFS: \\server Cancelling wait for mid zzz cmd: c
because some processes that were performing statfs(2) on the share had
been interrupted due to their automount setup when certain users
logged in and out.
Change it to FYI as they should be mostly informative rather than
error messages.
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
The MIDs are mostly printed as decimal, so let's make it consistent.
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
When doing a large read or write workload we only
very gradually increase the number of credits
which can cause problems with parallelizing large i/o
(I/O ramps up more slowly than it should for large
read/write workloads) especially with multichannel
when the number of credits on the secondary channels
starts out low (e.g. less than about 130) or when
recovering after the server throttled back the number
of credits.
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
With multichannel, operations like the queries
from "ls -lR" can cause all credits to be used and
errors to be returned since max_credits was not
being set correctly on the secondary channels and
thus the client was requesting 0 credits incorrectly
in some cases (which can lead to not having
enough credits to perform any operation on that
channel).
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
CC: <stable@vger.kernel.org> # v5.8+
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Merge tag '5.12-smb3-part1' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs updates from Steve French:
- improvements to mode bit conversion, chmod and chown when using
cifsacl mount option
- two new mount options for controlling attribute caching
- improvements to crediting and reconnect, improved debugging
- reconnect fix
- add SMB3.1.1 dialect to default dialects for vers=3
* tag '5.12-smb3-part1' of git://git.samba.org/sfrench/cifs-2.6: (27 commits)
cifs: update internal version number
cifs: use discard iterator to discard unneeded network data more efficiently
cifs: introduce helper for finding referral server to improve DFS target resolution
cifs: check all path components in resolved dfs target
cifs: fix DFS failover
cifs: fix nodfs mount option
cifs: fix handling of escaped ',' in the password mount argument
cifs: Add new parameter "acregmax" for distinct file and directory metadata timeout
cifs: convert revalidate of directories to using directory metadata cache timeout
cifs: Add new mount parameter "acdirmax" to allow caching directory metadata
cifs: If a corrupted DACL is returned by the server, bail out.
cifs: minor simplification to smb2_is_network_name_deleted
TCON Reconnect during STATUS_NETWORK_NAME_DELETED
cifs: cleanup a few le16 vs. le32 uses in cifsacl.c
cifs: Change SIDs in ACEs while transferring file ownership.
cifs: Retain old ACEs when converting between mode bits and ACL.
cifs: Fix cifsacl ACE mask for group and others.
cifs: clarify hostname vs ip address in /proc/fs/cifs/DebugData
cifs: change confusing field serverName (to ip_addr)
cifs: Fix inconsistent IS_ERR and PTR_ERR
...
The iterator, ITER_DISCARD, that can only be used in READ mode and
just discards any data copied to it, was added to allow a network
filesystem to discard any unwanted data sent by a server.
Convert cifs_discard_from_socket() to use this.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Some servers seem to mistakenly report different values for
capabilities and share flags, so we can't always rely on those values
to decide whether the resolved target can handle any new DFS
referrals.
Add a new helper is_referral_server() to check if all resolved targets
can handle new DFS referrals by directly looking at the
GET_DFS_REFERRAL.ReferralHeaderFlags value as specified in MS-DFSC
2.2.4 RESP_GET_DFS_REFERRAL in addition to is_tcon_dfs().
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Cc: stable@vger.kernel.org # 5.11
Signed-off-by: Steve French <stfrench@microsoft.com>
Handle the case where a resolved target share is like
//server/users/dir, and the user "foo" has no read permission to
access the parent folder "users" but has access to the final path
component "dir".
is_path_remote() already implements that, so call it directly.
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Cc: stable@vger.kernel.org # 5.11
Signed-off-by: Steve French <stfrench@microsoft.com>
In do_dfs_failover(), the mount_get_conns() function requires the full
fs context in order to get new connection to server, so clone the
original context and change it accordingly when retrying the DFS
targets in the referral.
If failover was successful, then update original context with the new
UNC, prefix path and ip address.
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Cc: stable@vger.kernel.org # 5.11
Signed-off-by: Steve French <stfrench@microsoft.com>
Skip DFS resolving when mounting with 'nodfs' even if
CONFIG_CIFS_DFS_UPCALL is enabled.
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Cc: stable@vger.kernel.org # 5.11
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Passwords can contain ',' which are also used as the separator between
mount options. Mount.cifs will escape all ',' characters as the string ",,".
Update parsing of the mount options to detect ",," and treat it as a single
',' character.
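The escaping rule described above can be sketched as a small tokenizer. This is an illustration only: the function name is hypothetical, and for simplicity it applies the doubled-comma rule to every option rather than just the password.

```c
#include <stddef.h>

/* Extract the next option from *cursor into out (size outlen).
 * A single ',' ends the option; ",," is copied as one literal ','.
 * Advances *cursor past the option and its separator.
 * Returns 1 if an option was produced, 0 at end of input, -1 if out
 * is too small. */
static int next_option(const char **cursor, char *out, size_t outlen)
{
	const char *s = *cursor;
	size_t o = 0;

	if (*s == '\0')
		return 0;
	while (*s != '\0') {
		if (*s == ',') {
			if (s[1] != ',') {	/* lone comma: separator */
				s++;
				break;
			}
			s++;			/* doubled comma: skip one */
		}
		if (o + 1 >= outlen)
			return -1;
		out[o++] = *s++;
	}
	if (outlen == 0)
		return -1;
	out[o] = '\0';
	*cursor = s;
	return 1;
}
```

Given "user=foo,password=a,,b,vers=3.0", successive calls produce "user=foo", "password=a,b" and "vers=3.0".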
Fixes: 24e0a1eff9 ("cifs: switch to new mount api")
Cc: stable@vger.kernel.org # 5.11
Reported-by: Simon Taylor <simon@simon-taylor.me.uk>
Tested-by: Simon Taylor <simon@simon-taylor.me.uk>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
The new optional mount parameter "acregmax" allows a different
timeout for file metadata ("acdirmax" now allows controlling timeout
for directory metadata). Setting "actimeo" still works as before,
and changes timeout for both files and directories, but
specifying "acregmax" or "acdirmax" allows overriding the
default more granularly which can be a big performance benefit
on some workloads. "acregmax" is already used by NFS as a mount
parameter (albeit with a larger default and thus looser caching).
Suggested-by: Tom Talpey <tom@talpey.com>
Reviewed-By: Tom Talpey <tom@talpey.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
The new optional mount parm, "acdirmax" allows caching the metadata
for a directory longer than file metadata, which can be very helpful
for performance. Convert cifs_inode_needs_reval to check acdirmax
for revalidating directory metadata.
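A simplified model of that check, using seconds instead of jiffies. The struct and function names are hypothetical and the real cifs_inode_needs_reval() considers more conditions than the age of the cached attributes.

```c
#include <stdbool.h>

struct cache_timeouts {
	unsigned long acregmax;	/* seconds, regular files */
	unsigned long acdirmax;	/* seconds, directories */
};

/* Return true if attributes cached at time cached_at are too old at
 * time now (both in seconds). Directories and regular files use
 * separate limits. */
static bool needs_revalidation(const struct cache_timeouts *t, bool is_dir,
			       unsigned long now, unsigned long cached_at)
{
	unsigned long limit = is_dir ? t->acdirmax : t->acregmax;

	return now - cached_at >= limit;
}
```

With acregmax=1 and acdirmax=60, a five-second-old entry is revalidated for a file but still trusted for a directory.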
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-By: Tom Talpey <tom@talpey.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
nfs and cifs on Linux currently have a mount parameter "actimeo" to control
metadata (attribute) caching but cifs does not have additional mount
parameters to allow distinguishing between caching directory metadata
(e.g. needed to revalidate paths) and that for files.
Add new mount parameter "acdirmax" to allow caching metadata for
directories more loosely than file data. NFS adjusts metadata
caching from acdirmin to acdirmax (and another two mount parms
for files) but to reduce complexity, it is safer to just introduce
the one mount parm to allow caching directories longer. The
defaults for acdirmax and actimeo (for cifs.ko) are conservative,
1 second (NFS defaults acdirmax to 60 seconds). For many workloads,
setting acdirmax to a higher value is safe and will improve
performance. This patch leaves unchanged the default values
for caching metadata for files and directories but gives the
user more flexibility in adjusting them safely for their workload
via the new mount parm.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-By: Tom Talpey <tom@talpey.com>
Static code analysis reported a possible null pointer dereference
in my last commit:
cifs: Retain old ACEs when converting between mode bits and ACL.
This could happen if the DACL returned by the server is corrupted.
We were trying to continue by assuming that the file has empty DACL.
We should bail out with an error instead.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reported-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Rohith Surabattula <rohiths@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Merge tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull idmapped mounts from Christian Brauner:
"This introduces idmapped mounts which has been in the making for some
time. Simply put, different mounts can expose the same file or
directory with different ownership. This initial implementation comes
with ports for fat, ext4 and with Christoph's port for xfs with more
filesystems being actively worked on by independent people and
maintainers.
Idmapped mounts handle a wide range of long standing use-cases. Here
are just a few:
- Idmapped mounts make it possible to easily share files between
multiple users or multiple machines especially in complex
scenarios. For example, idmapped mounts will be used in the
implementation of portable home directories in
systemd-homed.service(8) where they allow users to move their home
directory to an external storage device and use it on multiple
computers where they are assigned different uids and gids. This
effectively makes it possible to assign random uids and gids at
login time.
- It is possible to share files from the host with unprivileged
containers without having to change ownership permanently through
chown(2).
- It is possible to idmap a container's rootfs without having to
mangle every file. For example, Chromebooks use it to share the
user's Download folder with their unprivileged containers in their
Linux subsystem.
- It is possible to share files between containers with
non-overlapping idmappings.
- Filesystems that lack a proper concept of ownership such as fat can
use idmapped mounts to implement discretionary access (DAC)
permission checking.
- They allow users to efficiently change ownership on a per-mount
basis without having to (recursively) chown(2) all files. In
contrast to chown(2), changing ownership of large sets of files is
instantaneous with idmapped mounts. This is especially useful when
ownership of a whole root filesystem of a virtual machine or
container is changed. With idmapped mounts a single
mount_setattr syscall will be sufficient to change the ownership of
all files.
- Idmapped mounts always take the current ownership into account as
idmappings specify what a given uid or gid is supposed to be mapped
to. This contrasts with the chown(2) syscall which cannot by itself
take the current ownership of the files it changes into account. It
simply changes the ownership to the specified uid and gid. This is
especially problematic when recursively chown(2)ing a large set of
files which is common with the aforementioned portable home
directory and container and vm scenario.
- Idmapped mounts allow to change ownership locally, restricting it
to specific mounts, and temporarily as the ownership changes only
apply as long as the mount exists.
Several userspace projects have either already put up patches and
pull-requests for this feature or will do so should you decide to pull
this:
- systemd: In a wide variety of scenarios but especially right away
in their implementation of portable home directories.
https://systemd.io/HOME_DIRECTORY/
- container runtimes: containerd, runC, LXD: to share data between
host and unprivileged containers, unprivileged and privileged
containers, etc. The pull request for idmapped mounts support in
containerd, the default Kubernetes runtime, has been up for quite
a while now: https://github.com/containerd/containerd/pull/4734
- The virtio-fs developers and several users have expressed interest
in using this feature with virtual machines once virtio-fs is
ported.
- ChromeOS: Sharing host-directories with unprivileged containers.
I've tightly synced with all those projects and all of those listed
here have also expressed their need/desire for this feature on the
mailing list. For more info on how people use this there's a bunch of
talks about this too. Here's just two recent ones:
https://www.cncf.io/wp-content/uploads/2020/12/Rootless-Containers-in-Gitpod.pdf
https://fosdem.org/2021/schedule/event/containers_idmap/
This comes with an extensive xfstests suite covering both ext4 and
xfs:
https://git.kernel.org/brauner/xfstests-dev/h/idmapped_mounts
It covers truncation, creation, opening, xattrs, vfscaps, setid
execution, setgid inheritance and more both with idmapped and
non-idmapped mounts. It already helped to discover an unrelated xfs
setgid inheritance bug which has since been fixed in mainline. It will
be sent for inclusion with the xfstests project should you decide to
merge this.
In order to support per-mount idmappings vfsmounts are marked with
user namespaces. The idmapping of the user namespace will be used to
map the ids of vfs objects when they are accessed through that mount.
By default all vfsmounts are marked with the initial user namespace.
The initial user namespace is used to indicate that a mount is not
idmapped. All operations behave as before and this is verified in the
testsuite.
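To give a feel for what "mapping the ids" means, here is a toy model of a single contiguous idmap range, in the spirit of a uid_map entry. The struct and function are illustrative assumptions and not the kernel's helpers; which direction is applied at which point in the VFS is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* One contiguous idmap range: ids [from, from + count) map to
 * [to, to + count). */
struct id_range {
	uint32_t from;
	uint32_t to;
	uint32_t count;
};

/* Translate id through the range. Returns false if id is not covered,
 * in which case *out is left untouched. */
static bool map_id(const struct id_range *r, uint32_t id, uint32_t *out)
{
	if (id < r->from || id - r->from >= r->count)
		return false;
	*out = r->to + (id - r->from);
	return true;
}
```

An identity range (from == to) models a mount that is not idmapped: every covered id maps to itself.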
Based on prior discussions we want to attach the whole user namespace
and not just a dedicated idmapping struct. This allows us to reuse all
the helpers that already exist for dealing with idmappings instead of
introducing a whole new range of helpers. In addition, if we decide in
the future that we are confident enough to enable unprivileged users
to setup idmapped mounts the permission checking can take into account
whether the caller is privileged in the user namespace the mount is
currently marked with.
The user namespace the mount will be marked with can be specified by
passing a file descriptor referring to the user namespace as an
argument to the new mount_setattr() syscall together with the new
MOUNT_ATTR_IDMAP flag. The system call follows the openat2() pattern
of extensibility.
The following conditions must be met in order to create an idmapped
mount:
- The caller must currently have the CAP_SYS_ADMIN capability in the
user namespace the underlying filesystem has been mounted in.
- The underlying filesystem must support idmapped mounts.
- The mount must not already be idmapped. This also implies that the
idmapping of a mount cannot be altered once it has been idmapped.
- The mount must be a detached/anonymous mount, i.e. it must have
been created by calling open_tree() with the OPEN_TREE_CLONE flag
and it must not already have been visible in the filesystem.
The last two points guarantee easier semantics for userspace and the
kernel and make the implementation significantly simpler.
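A minimal userspace sketch of the sequence implied by these conditions:
clone a detached mount with open_tree(), mark it with mount_setattr()
and MOUNT_ATTR_IDMAP, then attach it with move_mount(). Raw syscall()
is used because libc wrappers may not be available. The helper name,
the SK_ prefixed constants and struct mount_attr_sketch are local to
this sketch (they mirror the UAPI values as far as I know), and the
fallback syscall numbers assume the generic numbering shared by most
architectures. The caller needs the privileges listed above.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallbacks for old headers (assumed generic syscall numbers). */
#ifndef __NR_open_tree
#define __NR_open_tree 428
#endif
#ifndef __NR_move_mount
#define __NR_move_mount 429
#endif
#ifndef __NR_mount_setattr
#define __NR_mount_setattr 442
#endif

/* Local copies of UAPI constants, prefixed to avoid header clashes. */
#define SK_OPEN_TREE_CLONE         1u
#define SK_OPEN_TREE_CLOEXEC       ((unsigned int)O_CLOEXEC)
#define SK_AT_EMPTY_PATH           0x1000u
#define SK_MOVE_MOUNT_F_EMPTY_PATH 0x00000004u
#define SK_MOUNT_ATTR_IDMAP        0x00100000u

/* Local mirror of the mount_setattr() argument struct. */
struct mount_attr_sketch {
	uint64_t attr_set;
	uint64_t attr_clr;
	uint64_t propagation;
	uint64_t userns_fd;
};

/* The size is passed to the kernel, following the openat2() pattern. */
static_assert(sizeof(struct mount_attr_sketch) == 32, "unexpected size");

/* Hypothetical helper: returns 0 on success, -1 with errno set on error. */
int idmap_mount_sketch(const char *source, const char *target, int userns_fd)
{
	struct mount_attr_sketch attr = { .attr_set = SK_MOUNT_ATTR_IDMAP };
	long ret;
	int tree_fd, saved_errno;

	if (userns_fd < 0) {
		errno = EBADF;
		return -1;
	}
	attr.userns_fd = (uint64_t)userns_fd;

	/* Detached/anonymous clone of the source mount. */
	tree_fd = (int)syscall(__NR_open_tree, AT_FDCWD, source,
			       SK_OPEN_TREE_CLONE | SK_OPEN_TREE_CLOEXEC);
	if (tree_fd < 0)
		return -1;

	/* Mark the detached mount with the user namespace. */
	ret = syscall(__NR_mount_setattr, tree_fd, "", SK_AT_EMPTY_PATH,
		      &attr, sizeof(attr));
	if (ret == 0) /* Only now make the mount visible at the target. */
		ret = syscall(__NR_move_mount, tree_fd, "", AT_FDCWD, target,
			      SK_MOVE_MOUNT_F_EMPTY_PATH);

	saved_errno = errno;
	close(tree_fd);
	errno = saved_errno;
	return ret == 0 ? 0 : -1;
}
```

The userns_fd argument would typically come from opening
/proc/<pid>/ns/user of a process living in the desired user namespace.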
By default vfsmounts are marked with the initial user namespace and no
behavioral or performance changes are observed.
The manpage with a detailed description can be found here:
1d7b902e28
In order to support idmapped mounts, filesystems need to be changed
and mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The
patches to convert individual filesystems are not very large or
complicated overall as can be seen from the included fat, ext4, and
xfs ports. Patches for other filesystems are actively worked on and
will be sent out separately. The xfstests suite can be used to verify
that a port has been done correctly.
The mount_setattr() syscall is motivated independently of the idmapped
mounts patches and it's been around since July 2019. One of the most
valuable features of the new mount api is the ability to perform
mounts based on file descriptors only.
Together with the lookup restrictions available in the openat2()
RESOLVE_* flag namespace which we added in v5.6 this is the first time
we are close to hardened and race-free (e.g. symlinks) mounting and
path resolution.
While userspace has started porting to the new mount api to mount
proper filesystems and create new bind-mounts it is currently not
possible to change mount options of an already existing bind mount in
the new mount api since the mount_setattr() syscall is missing.
With the addition of the mount_setattr() syscall we remove this last
restriction and userspace can now fully port to the new mount api,
covering every use-case the old mount api could. We also add the
crucial ability to recursively change mount options for a whole mount
tree, both removing and adding mount options at the same time. This
syscall has been requested multiple times by various people and
projects.
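How a single request can remove and add mount options at the same time,
for a whole tree, can be modelled as below. This is a userspace
illustration only: the MODEL_ prefixed constants mirror the
MOUNT_ATTR_* bit values as far as I know, the struct and function names
are hypothetical, and the mount tree is flattened into an array. The
real syscall takes the set and clear masks in its attribute struct and
recurses when AT_RECURSIVE is passed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of mount attribute bits, prefixed for this model. */
#define MODEL_ATTR_RDONLY 0x00000001u
#define MODEL_ATTR_NOSUID 0x00000002u
#define MODEL_ATTR_NODEV  0x00000004u
#define MODEL_ATTR_NOEXEC 0x00000008u

/* Hypothetical stand-in for the set/clear pair of a request. */
struct attr_change_model {
	uint64_t attr_set; /* options to raise */
	uint64_t attr_clr; /* options to drop */
};

/* Clear first, then set: one request both removes and adds options. */
static uint64_t apply_attr_change(uint64_t current,
				  const struct attr_change_model *c)
{
	assert(c != NULL);
	return (current & ~c->attr_clr) | c->attr_set;
}

/* Apply the same change to every mount of a flattened mount tree. */
static void apply_attr_change_tree(uint64_t *mount_flags, size_t nr_mounts,
				   const struct attr_change_model *c)
{
	for (size_t i = 0; i < nr_mounts; i++)
		mount_flags[i] = apply_attr_change(mount_flags[i], c);
}
```

For example, a change with attr_set = RDONLY | NOSUID and attr_clr =
NOEXEC turns a mount that was only noexec into one that is read-only and
nosuid, and does the same for every other mount in the array.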
There is a simple tool available at
https://github.com/brauner/mount-idmapped
that allows creating idmapped mounts so people can play with this
patch series. I'll add support for the regular mount binary should you
decide to pull this in the following weeks.
Here's an example of a simple idmapped mount of another user's home
directory:
u1001@f2-vm:/$ sudo ./mount --idmap both:1000:1001:1 /home/ubuntu/ /mnt
u1001@f2-vm:/$ ls -al /home/ubuntu/
total 28
drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 .
drwxr-xr-x 4 root root 4096 Oct 28 04:00 ..
-rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history
-rw-r--r-- 1 ubuntu ubuntu 220 Feb 25 2020 .bash_logout
-rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25 2020 .bashrc
-rw-r--r-- 1 ubuntu ubuntu 807 Feb 25 2020 .profile
-rw-r--r-- 1 ubuntu ubuntu 0 Oct 16 16:11 .sudo_as_admin_successful
-rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo
u1001@f2-vm:/$ ls -al /mnt/
total 28
drwxr-xr-x 2 u1001 u1001 4096 Oct 28 22:07 .
drwxr-xr-x 29 root root 4096 Oct 28 22:01 ..
-rw------- 1 u1001 u1001 3154 Oct 28 22:12 .bash_history
-rw-r--r-- 1 u1001 u1001 220 Feb 25 2020 .bash_logout
-rw-r--r-- 1 u1001 u1001 3771 Feb 25 2020 .bashrc
-rw-r--r-- 1 u1001 u1001 807 Feb 25 2020 .profile
-rw-r--r-- 1 u1001 u1001 0 Oct 16 16:11 .sudo_as_admin_successful
-rw------- 1 u1001 u1001 1144 Oct 28 00:43 .viminfo
u1001@f2-vm:/$ touch /mnt/my-file
u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file
u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file
u1001@f2-vm:/$ ls -al /mnt/my-file
-rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file
u1001@f2-vm:/$ ls -al /home/ubuntu/my-file
-rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file
u1001@f2-vm:/$ getfacl /mnt/my-file
getfacl: Removing leading '/' from absolute path names
# file: mnt/my-file
# owner: u1001
# group: u1001
user::rw-
user:u1001:rwx
group::rw-
mask::rwx
other::r--
u1001@f2-vm:/$ getfacl /home/ubuntu/my-file
getfacl: Removing leading '/' from absolute path names
# file: home/ubuntu/my-file
# owner: ubuntu
# group: ubuntu
user::rw-
user:ubuntu:rwx
group::rw-
mask::rwx
other::r--"
* tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: (41 commits)
xfs: remove the possibly unused mp variable in xfs_file_compat_ioctl
xfs: support idmapped mounts
ext4: support idmapped mounts
fat: handle idmapped mounts
tests: add mount_setattr() selftests
fs: introduce MOUNT_ATTR_IDMAP
fs: add mount_setattr()
fs: add attr_flags_to_mnt_flags helper
fs: split out functions to hold writers
namespace: only take read lock in do_reconfigure_mnt()
mount: make {lock,unlock}_mount_hash() static
namespace: take lock_mount_hash() directly when changing flags
nfs: do not export idmapped mounts
overlayfs: do not mount on top of idmapped mounts
ecryptfs: do not mount on top of idmapped mounts
ima: handle idmapped mounts
apparmor: handle idmapped mounts
fs: make helpers idmap mount aware
exec: handle idmapped mounts
would_dump: handle idmapped mounts
...
Trivial change to clarify code in smb2_is_network_name_deleted
Suggested-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>