Fix build warning by removing unused variable 'server':
fs/cifs/inode.c:1089:26: warning:
variable server set but not used [-Wunused-but-set-variable]
1089 | struct TCP_Server_Info *server;
| ^~~~~~
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
This reverts commit 9ffad9263b.
Upon additional testing with older servers, it was found that
the original commit introduced a regression when using the old SMB1
dialect and rsyncing over an existing file.
The patch will need to be respun to address this, likely including
a larger refactoring of the SMB1 and SMB3 rename code paths to make
it less confusing and also to address some additional rename error
cases that SMB3 may be able to workaround.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reported-by: Patrick Fernie <patrick.fernie@gmail.com>
CC: Stable <stable@vger.kernel.org>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Acked-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
When xfstest generic/035, we found the target file was deleted
if the rename return -EACESS.
In cifs_rename2, we unlink the positive target dentry if rename
failed with EACESS or EEXIST, even if the target dentry is positived
before rename. Then the existing file was deleted.
We should just delete the target file which created during the
rename.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
As the man description of the truncate, if the size changed,
then the st_ctime and st_mtime fields should be updated. But
in cifs, we doesn't do it.
It lead the xfstests generic/313 failed.
So, add the ATTR_MTIME|ATTR_CTIME flags on attrs when change
the file size
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pavel noticed that a debug message (disabled by default) in creating the security
descriptor context could be useful for new file creation owner fields
(as we already have for the mode) when using mount parm idsfromsid.
[38120.392272] CIFS: FYI: owner S-1-5-88-1-0, group S-1-5-88-2-0
[38125.792637] CIFS: FYI: owner S-1-5-88-1-1000, group S-1-5-88-2-1000
Also cleans up a typo in a comment
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Adds calls to the newer info level for query info using SMB3.1.1 posix extensions.
The remaining two places that call the older query info (non-SMB3.1.1 POSIX)
require passing in the fid and can be updated in a later patch.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Adds support for better query info on dentry revalidation (using
the SMB3.1.1 POSIX extensions level 100). Followon patch will
add support for translating the UID/GID from the SID and also
will add support for using the posix query info on lookup.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAl7aelsACgkQiiy9cAdy
T1HDmwv9Fj6OaXXx+btNvbB6xTWvCwMVKHwTPURMx+IjBYjJC65yPGkInPPkfUVo
7L9h55XCLwFohECleZJCkKOrJtnX1P8SsHtZck6QqjvUETJl/L3pAXpMMYACHLpg
x4DE/NFkcW95J38s9Jtjhphq8ZGUhuDhaT+QeEd2Iq8HzAxk5ND47ZXkomMx1EEM
ZsOrmJF+k2YQyDDpfhJeVF5iZDkbpASqA/TlLxxGH34IdAZIUB9qtGKADNLZ6YyT
qpG601CSrEdl3tVY+SlRMHqwVTRhCViPD6Q3fMw8Xha436RIiWJJ4Rvn6bSP/ZQl
PDPuSVRB2zmd70C/3ojXdku9+VfQLO52qkO3bf2IjgVJ3ARrxFxW7cb7bmYRqdyT
WI5N1+8gETrIAK7aB3QKdmkcRFDtJD3wOTfBcgctuB8WrYrDvW2MNKkPbQdY5tnN
xfp4f10Dg4d+/8knSytJrdKkDublU0kGbfLAa0oupjAzV6WB0qNt0TNGxE42L+ug
iFSJOZxi
=gCcP
-----END PGP SIGNATURE-----
Merge tag '5.8-rc-smb3-fixes-part-1' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs updates from Steve French:
"22 changesets, 2 for stable.
Includes big performance improvement for large i/o when using
multichannel, also includes DFS fixes"
* tag '5.8-rc-smb3-fixes-part-1' of git://git.samba.org/sfrench/cifs-2.6: (22 commits)
cifs: update internal module version number
cifs: multichannel: try to rebind when reconnecting a channel
cifs: multichannel: use pointer for binding channel
smb3: remove static checker warning
cifs: multichannel: move channel selection above transport layer
cifs: multichannel: always zero struct cifs_io_parms
cifs: dump Security Type info in DebugData
smb3: fix incorrect number of credits when ioctl MaxOutputResponse > 64K
smb3: default to minimum of two channels when multichannel specified
cifs: multichannel: move channel selection in function
cifs: fix minor typos in comments and log messages
smb3: minor update to compression header definitions
cifs: minor fix to two debug messages
cifs: Standardize logging output
smb3: Add new parm "nodelete"
cifs: move some variables off the stack in smb2_ioctl_query_info
cifs: reduce stack use in smb2_compound_op
cifs: get rid of unused parameter in reconn_setup_dfs_targets()
cifs: handle hostnames that resolve to same ip in failover
cifs: set up next DFS target before generic_ip_connect()
...
* Fix performance problems found in dioread_nolock now that it is the
default, caused by transaction leaks.
* Clean up fiemap handling in ext4
* Clean up and refactor multiple block allocator (mballoc) code
* Fix a problem with mballoc with a smaller file systems running out
of blocks because they couldn't properly use blocks that had been
reserved by inode preallocation.
* Fixed a race in ext4_sync_parent() versus rename()
* Simplify the error handling in the extent manipulation code
* Make sure all metadata I/O errors are felected to ext4_ext_dirty()'s and
ext4_make_inode_dirty()'s callers.
* Avoid passing an error pointer to brelse in ext4_xattr_set()
* Fix race which could result to freeing an inode on the dirty last
in data=journal mode.
* Fix refcount handling if ext4_iget() fails
* Fix a crash in generic/019 caused by a corrupted extent node
-----BEGIN PGP SIGNATURE-----
iQEyBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAl7Ze8kACgkQ8vlZVpUN
gaNChAf4xn0ytFSrweI/S2Sp05G/2L/ocZ2TZZk2ZdGeN1E+ABdSIv/zIF9zuFgZ
/pY/C+fyEZWt4E3FlNO8gJzoEedkzMCMnUhSIfI+wZbcclyTOSNMJtnrnJKAEtVH
HOvGZJmg357jy407RCGhZpJ773nwU2xhBTr5OFxvSf9mt/vzebxIOnw5D7HPlC1V
Fgm6Du8q+tRrPsyjv1Yu4pUEVXMJ7qUcvt326AXVM3kCZO1Aa5GrURX0w3J4mzW1
tc1tKmtbLcVVYTo9CwHXhk/edbxrhAydSP2iACand3tK6IJuI6j9x+bBJnxXitnr
vsxsfTYMG18+2SxrJ9LwmagqmrRq
=HMTs
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"A lot of bug fixes and cleanups for ext4, including:
- Fix performance problems found in dioread_nolock now that it is the
default, caused by transaction leaks.
- Clean up fiemap handling in ext4
- Clean up and refactor multiple block allocator (mballoc) code
- Fix a problem with mballoc with a smaller file systems running out
of blocks because they couldn't properly use blocks that had been
reserved by inode preallocation.
- Fixed a race in ext4_sync_parent() versus rename()
- Simplify the error handling in the extent manipulation code
- Make sure all metadata I/O errors are felected to
ext4_ext_dirty()'s and ext4_make_inode_dirty()'s callers.
- Avoid passing an error pointer to brelse in ext4_xattr_set()
- Fix race which could result to freeing an inode on the dirty last
in data=journal mode.
- Fix refcount handling if ext4_iget() fails
- Fix a crash in generic/019 caused by a corrupted extent node"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (58 commits)
ext4: avoid unnecessary transaction starts during writeback
ext4: don't block for O_DIRECT if IOCB_NOWAIT is set
ext4: remove the access_ok() check in ext4_ioctl_get_es_cache
fs: remove the access_ok() check in ioctl_fiemap
fs: handle FIEMAP_FLAG_SYNC in fiemap_prep
fs: move fiemap range validation into the file systems instances
iomap: fix the iomap_fiemap prototype
fs: move the fiemap definitions out of fs.h
fs: mark __generic_block_fiemap static
ext4: remove the call to fiemap_check_flags in ext4_fiemap
ext4: split _ext4_fiemap
ext4: fix fiemap size checks for bitmap files
ext4: fix EXT4_MAX_LOGICAL_BLOCK macro
add comment for ext4_dir_entry_2 file_type member
jbd2: avoid leaking transaction credits when unreserving handle
ext4: drop ext4_journal_free_reserved()
ext4: mballoc: use lock for checking free blocks while retrying
ext4: mballoc: refactor ext4_mb_good_group()
ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling
ext4: mballoc: refactor ext4_mb_discard_preallocations()
...
SMB2_read/SMB2_write check and use cifs_io_parms->server, which might
be uninitialized memory.
This change makes all callers zero-initialize the struct.
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
No need to pull the fiemap definitions into almost every file in the
kernel build.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Link: https://lore.kernel.org/r/20200523073016.2944131-5-hch@lst.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Use pr_fmt to standardize all logging for fs/cifs.
Some logging output had no CIFS: specific prefix.
Now all output has one of three prefixes:
o CIFS:
o CIFS: VFS:
o Root-CIFS:
Miscellanea:
o Convert printks to pr_<level>
o Neaten macro definitions
o Remove embedded CIFS: prefixes from formats
o Convert "illegal" to "invalid"
o Coalesce formats
o Add missing '\n' format terminations
o Consolidate multiple cifs_dbg continuations into single calls
o More consistent use of upper case first word output logging
o Multiline statement argument alignment and wrapping
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
In order to handle workloads where it is important to make sure that
a buggy app did not delete content on the drive, the new mount option
"nodelete" allows standard permission checks on the server to work,
but prevents on the client any attempts to unlink a file or delete
a directory on that mount point. This can be helpful when running
a little understood app on a network mount that contains important
content that should not be deleted.
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
As per POSIX, the correct spelling is EACCES:
include/uapi/asm-generic/errno-base.h:#define EACCES 13 /* Permission denied */
Fixes: b8f7442bc4 ("CIFS: refactor cifs_get_inode_info()")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Steve French <stfrench@microsoft.com>
Found a read performance issue when linux kernel page size is 64KB.
If linux kernel page size is 64KB and mount options cache=strict &
vers=2.1+, it does not support cifs_readpages(). Instead, it is using
cifs_readpage() and cifs_read() with maximum read IO size 16KB, which is
much slower than read IO size 1MB when negotiated SMB 2.1+. Since modern
SMB server supported SMB 2.1+ and Max Read Size can reach more than 64KB
(for example 1MB ~ 8MB), this patch check max_read instead of maxBuf to
determine whether server support readpages() and improve read performance
for page size 64KB & cache=strict & vers=2.1+, and for SMB1 it is more
cleaner to initialize server->max_read to server->maxBuf.
The client is a linux box with linux kernel 4.2.8,
page size 64KB (CONFIG_ARM64_64K_PAGES=y),
cpu arm 1.7GHz, and use mount.cifs as smb client.
The server is another linux box with linux kernel 4.2.8,
share a file '10G.img' with size 10GB,
and use samba-4.7.12 as smb server.
The client mount a share from the server with different
cache options: cache=strict and cache=none,
mount -tcifs //<server_ip>/Public /cache_strict -overs=3.0,cache=strict,username=<xxx>,password=<yyy>
mount -tcifs //<server_ip>/Public /cache_none -overs=3.0,cache=none,username=<xxx>,password=<yyy>
The client download a 10GbE file from the server across 1GbE network,
dd if=/cache_strict/10G.img of=/dev/null bs=1M count=10240
dd if=/cache_none/10G.img of=/dev/null bs=1M count=10240
Found that cache=strict (without patch) is slower read throughput and
smaller read IO size than cache=none.
cache=strict (without patch): read throughput 40MB/s, read IO size is 16KB
cache=strict (with patch): read throughput 113MB/s, read IO size is 1MB
cache=none: read throughput 109MB/s, read IO size is 1MB
Looks like if page size is 64KB,
cifs_set_ops() would use cifs_addr_ops_smallbuf instead of cifs_addr_ops,
/* check if server can support readpages */
if (cifs_sb_master_tcon(cifs_sb)->ses->server->maxBuf <
PAGE_SIZE + MAX_CIFS_HDR_SIZE)
inode->i_data.a_ops = &cifs_addr_ops_smallbuf;
else
inode->i_data.a_ops = &cifs_addr_ops;
maxBuf is came from 2 places, SMB2_negotiate() and CIFSSMBNegotiate(),
(SMB2_MAX_BUFFER_SIZE is 64KB)
SMB2_negotiate():
/* set it to the maximum buffer size value we can send with 1 credit */
server->maxBuf = min_t(unsigned int, le32_to_cpu(rsp->MaxTransactSize),
SMB2_MAX_BUFFER_SIZE);
CIFSSMBNegotiate():
server->maxBuf = le32_to_cpu(pSMBr->MaxBufferSize);
Page size 64KB and cache=strict lead to read_pages() use cifs_readpage()
instead of cifs_readpages(), and then cifs_read() using maximum read IO
size 16KB, which is much slower than maximum read IO size 1MB.
(CIFSMaxBufSize is 16KB by default)
/* FIXME: set up handlers for larger reads and/or convert to async */
rsize = min_t(unsigned int, cifs_sb->rsize, CIFSMaxBufSize);
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Jones Syue <jonessyue@qnap.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Add experimental support for allowing a swap file to be on an SMB3
mount. There are use cases where swapping over a secure network
filesystem is preferable. In some cases there are no local
block devices large enough, and network block devices can be
hard to setup and secure. And in some cases there are no
local block devices at all (e.g. with the recent addition of
remote boot over SMB3 mounts).
There are various enhancements that can be added later e.g.:
- doing a mandatory byte range lock over the swapfile (until
the Linux VFS is modified to notify the file system that an open
is for a swapfile, when the file can be opened "DENY_ALL" to prevent
others from opening it).
- pinning more buffers in the underlying transport to minimize memory
allocations in the TCP stack under the fs
- documenting how to create ACLs (on the server) to secure the
swapfile (or adding additional tools to cifs-utils to make it easier)
Signed-off-by: Steve French <stfrench@microsoft.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
See commit 349457ccf2
"Allow file systems to manually d_move() inside of ->rename()"
Lessens possibility of race conditions in rename
Signed-off-by: Steve French <stfrench@microsoft.com>
There are cases when we don't want to send the SMB2 flush operation
(e.g. when user specifies mount parm "nostrictsync") and it can be
a very expensive operation on the server. In most cases in order
to set mtime, we simply need to flush (write) the dirtry pages from
the client and send the writes to the server not also send a flush
protocol operation to the server.
Fixes: aa081859b1 ("cifs: flush before set-info if we have writeable handles")
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Check the AT_STATX_FORCE_SYNC flag and force an attribute
revalidation if requested by the caller, and if the caller
specificies AT_STATX_DONT_SYNC only revalidate cached attributes
if required. In addition do not flush writes in getattr (which
can be expensive) if size or timestamps not requested by the
caller.
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Smatch complains that "rc" could be uninitialized.
fs/cifs/inode.c:2206 cifs_getattr() error: uninitialized symbol 'rc'.
Changing it to "return 0;" improves readability as well.
Fixes: cc1baf98c8f6 ("cifs: do not ignore the SYNC flags in getattr")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
All other uses of cifs_dbg use defines so change this one.
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
To rename a file in SMB2 we open it with the DELETE access and do a
special SetInfo on it. If the handle is missing the DELETE bit the
server will fail the SetInfo with STATUS_ACCESS_DENIED.
We currently try to reuse any existing opened handle we have with
cifs_get_writable_path(). That function looks for handles with WRITE
access but doesn't check for DELETE, making rename() fail if it finds
a handle to reuse. Simple reproducer below.
To select handles with the DELETE bit, this patch adds a flag argument
to cifs_get_writable_path() and find_writable_file() and the existing
'bool fsuid_only' argument is converted to a flag.
The cifsFileInfo struct only stores the UNIX open mode but not the
original SMB access flags. Since the DELETE bit is not mapped in that
mode, this patch stores the access mask in cifs_fid on file open,
which is accessible from cifsFileInfo.
Simple reproducer:
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#define E(s) perror(s), exit(1)
int main(int argc, char *argv[])
{
int fd, ret;
if (argc != 3) {
fprintf(stderr, "Usage: %s A B\n"
"create&open A in write mode, "
"rename A to B, close A\n", argv[0]);
return 0;
}
fd = openat(AT_FDCWD, argv[1], O_WRONLY|O_CREAT|O_SYNC, 0666);
if (fd == -1) E("openat()");
ret = rename(argv[1], argv[2]);
if (ret) E("rename()");
ret = close(fd);
if (ret) E("close()");
return ret;
}
$ gcc -o bugrename bugrename.c
$ ./bugrename /mnt/a /mnt/b
rename(): Permission denied
Fixes: 8de9e86c67 ("cifs: create a helper to find a writeable handle by path name")
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
If from cifs_revalidate_dentry_attr() the SMB2/QUERY_INFO call fails with an
error, such as STATUS_SESSION_EXPIRED, causing the session to be reconnected
it is possible we will leak -EAGAIN back to the application even for
system calls such as stat() where this is not a valid error.
Fix this by re-trying the operation from within cifs_revalidate_dentry_attr()
if cifs_get_inode_info*() returns -EAGAIN.
This fixes stat() and possibly also other system calls that uses
cifs_revalidate_dentry*().
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
CC: Stable <stable@vger.kernel.org>
A number of the debug statements output file or directory mode
in hex. Change these to print using octal.
Signed-off-by: Frank Sorenson <sorenson@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAl49bNsACgkQiiy9cAdy
T1EGlQwArDJiHUV7W/WaoDZnusPPQqUT3ayqAHL0P8cDsjxLu3uNMkUISr0HdbxC
kqYahSTb+/BKQzoZhVe5wK3S8W6R8+wyaPJExRCL3brlIHVP/eC9uUjSgkT6QVDl
/vZCwxj7KmTK/S+ofji/XTl2f8f8BCw2biGVxwR2Jj5pwKI4wFIMFm7mDetTQRD4
bK0UR2Owiw4DpPXdwHlXPf9N06z0ETa1UdMXklIBgeK9B1eT1STD9q/iHJh3bLpO
klhbiq5eGRCcs9cBVTQcn6U+zGYBOcdJuhPGbAObEU+R2vNX06clydKlKy1oz1VL
4jbVVn9xuGZ9evFBC3h7Na1X7C3V28WcpfeRfFxZ157hNuQSNo5wiq0rF66EQ14U
hbmlx2S2ooyNKcnrj46SUw9zVLZ0xcx1Mw7kmoyHgI/vznW9fvV0Y2JXawJMPei5
VuQTgDLFsvnIIrUnrGBu2UXMzXghxLZ3SXJVKXuW3luvNRk82RAGHmIdty3OTgPp
DN9lhGvv
=F1qf
-----END PGP SIGNATURE-----
Merge tag '5.6-rc-smb3-plugfest-patches' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"13 cifs/smb3 patches, most from testing at the SMB3 plugfest this week:
- Important fix for multichannel and for modefromsid mounts.
- Two reconnect fixes
- Addition of SMB3 change notify support
- Backup tools fix
- A few additional minor debug improvements (tracepoints and
additional logging found useful during testing this week)"
* tag '5.6-rc-smb3-plugfest-patches' of git://git.samba.org/sfrench/cifs-2.6:
smb3: Add defines for new information level, FileIdInformation
smb3: print warning once if posix context returned on open
smb3: add one more dynamic tracepoint missing from strict fsync path
cifs: fix mode bits from dir listing when mounted with modefromsid
cifs: fix channel signing
cifs: add SMB3 change notification support
cifs: make multichannel warning more visible
cifs: fix soft mounts hanging in the reconnect code
cifs: Add tracepoints for errors on flush or fsync
cifs: log warning message (once) if out of disk space
cifs: fail i/o on soft mounts if sessionsetup errors out
smb3: fix problem with null cifs super block with previous patch
SMB3: Backup intent flag missing from some more ops
When "backup intent" is requested on the mount (e.g. backupuid or
backupgid mount options), the corresponding flag was missing from
some of the operations.
Change all operations to use the macro cifs_create_options() to
set the backup intent flag if needed.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
RHBZ 1336264
When we extend a file we must also force the size to be updated.
This fixes an issue with holetest in xfs-tests which performs the following
sequence :
1, create a new file
2, use fallocate mode==0 to populate the file
3, mmap the file
4, touch each page by reading the mmapped region.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
timestamp_truncate() is the replacement api for
timespec64_trunc. timestamp_truncate() additionally clamps
timestamps to make sure the timestamps lie within the
permitted range for the filesystem.
Truncate the timestamps in the struct cifs_attr at the
site of assignment to inode times. This
helps us use the right fs api timestamp_trucate() to
perform the truncation.
Also update the ktime_get_* api to match the one used in
current_time(). This allows for timestamps to be updated
the same way always.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: stfrench@microsoft.com
Cc: linux-cifs@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
According to the comment in the code and commit log, some apps
expect atime >= mtime; but the introduced code results in
atime==mtime. Fix the comparison to guard against atime<mtime.
Fixes: 9b9c5bea0b ("cifs: do not return atime less than mtime")
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: stfrench@microsoft.com
Cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
We accidentally messed up the indenting on this if statement.
Fixes: 16c696a6c300 ("CIFS: refactor cifs_get_inode_info()")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Make logic of cifs_get_inode() much clearer by moving code to sub
functions and adding comments.
Document the steps this function does.
cifs_get_inode_info() gets and updates a file inode metadata from its
file path.
* If caller already has raw info data from server they can pass it.
* If inode already exists (just need to update) caller can pass it.
Step 1: get raw data from server if none was passed
Step 2: parse raw data into intermediate internal cifs_fattr struct
Step 3: set fattr uniqueid which is later used for inode number. This
can sometime be done from raw data
Step 4: tweak fattr according to mount options (file_mode, acl to mode
bits, uid, gid, etc)
Step 5: update or create inode from final fattr struct
* add is_smb1_server() helper
* add is_inode_cache_good() helper
* move SMB1-backupcreds-getinfo-retry to separate func
cifs_backup_query_path_info().
* move set-uniqueid code to separate func cifs_set_fattr_ino()
* don't clobber uniqueid from backup cred retry
* fix some probable corner cases memleaks
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
cifs_setattr_nounix has two paths which miss free operations
for xid and fullpath.
Use goto cifs_setattr_exit like other paths to fix them.
CC: Stable <stable@vger.kernel.org>
Fixes: aa081859b1 ("cifs: flush before set-info if we have writeable handles")
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Currently the client indicates that a dentry is stale when inode
numbers or type types between a local inode and a remote file
don't match. If this is the case attributes is not being copied
from remote to local, so, it is already known that the local copy
has stale metadata. That's why the inode needs to be marked for
revalidation in order to tell the VFS to lookup the dentry again
before openning a file. This prevents unexpected stale errors
to be returned to the user space when openning a file.
Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
We need to populate an ACL (security descriptor open context)
on file and directory correct. This patch passes in the
mode. Followon patch will build the open context and the
security descriptor (from the mode) that goes in the open
context.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
When mounting with "modefromsid" retrieve mode bits from
special SID (S-1-5-88-3) on stat. Subsequent patch will fix
setattr (chmod) to save mode bits in S-1-5-88-3-<mode>
Note that when an ACE matching S-1-5-88-3 is not found, we
default the mode to an approximation based on the owner, group
and everyone permissions (as with the "cifsacl" mount option).
See See e.g.
https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2008-R2-and-2008/hh509017(v=ws.10)
Signed-off-by: Steve French <stfrench@microsoft.com>
Servers can defer destaging any data and updating the mtime until close().
This means that if we do a setinfo to modify the mtime while other handles
are open for write the server may overwrite our setinfo timestamps when
if flushes the file on close() of the writeable handle.
To solve this we add an explicit flush when the mtime is about to
be updated.
This fixes "cp -p" to preserve mtime when copying a file onto an SMB2 share.
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Field ia_valid is being debugged with the field name iavalid, fix this.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
SMB3 ACL support is needed for many use cases now and should not be
ifdeffed out, even for SMB1 (CIFS). Remove the CONFIG_CIFS_ACL
ifdef so ACL support is always built into cifs.ko
Signed-off-by: Steve French <stfrench@microsoft.com>
Useful for improved copy performance as well as for
applications which query allocated ranges of sparse
files.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
A path-based rename returning EBUSY will incorrectly try opening
the file with a cifs (NT Create AndX) operation on an smb2+ mount,
which causes the server to force a session close.
If the mount is smb2+, skip the fallback.
Signed-off-by: Frank Sorenson <sorenson@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
We negotiate rsize mounts (and it can be overridden by user) to
typically 4MB, so using larger default I/O sizes from userspace
(changing to 1MB default i/o size returned by stat) the
performance is much better (and not just for long latency
network connections) in most use cases for SMB3 than the default I/O
size (which ends up being 128K for cp and can be even smaller for cp).
This can be 4x slower or worse depending on network latency.
By changing inode->blocksize from 32K (which was perhaps ok
for very old SMB1/CIFS) to a larger value, 1MB (but still less than
max size negotiated with the server which is 4MB, in order to minimize
risk) it significantly increases performance for the
noncached case, and slightly increases it for the cached case.
This can be changed by the user on mount (specifying bsize=
values from 16K to 16MB) to tune better for performance
for applications that depend on blocksize.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
CC: Stable <stable@vger.kernel.org>
This patch aims to address writeback code problems related to error
paths. In particular it respects EINTR and related error codes and
stores and returns the first error occurred during writeback.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Different servers have different set of file ids.
After failover, unique IDs will be different so we can't validate
them.
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
This is not actually a bug but as Coverity points out we shouldn't
be doing an "|=" on a value which hasn't been set (although technically
it was memset to zero so isn't a bug) and so might as well change
"|=" to "=" in this line
Detected by CoverityScan, CID#728535 ("Unitialized scalar variable")
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Trivial fix to a spelling mistake of the error access name EACCESS,
rename to EACCES
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
In cases where queryinfo fails, we have cases in cifs (vers=1.0)
where with backupuid mounts we retry the query info with findfirst.
This doesn't work to some NetApp servers which don't support
WindowsXP (and later) infolevel 261 (SMB_FIND_FILE_ID_FULL_DIR_INFO)
so in this case use other info levels (in this case it will usually
be level 257, SMB_FIND_FILE_DIRECTORY_INFO).
(Also fixes some indentation)
See kernel bugzilla 201435
Signed-off-by: Steve French <stfrench@microsoft.com>
If backupuid mount option is sent, we can incorrectly retry
(on access denied on query info) with a cifs (FindFirst) operation
on an smb3 mount which causes the server to force the session close.
We set backup intent on open so no need for this fallback.
See kernel bugzilla 201435
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
In network file system it is fairly easy for server and client
atime vs. mtime to get confused (and atime updated less frequently)
which we noticed broke some apps which expect atime >= mtime
Also ignore relatime mount option (rather than error on it) since
relatime is basically what some network server fs are doing
(relatime).
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
When "backup intent" is requested on the mount (e.g. backupuid or
backupgid mount options), the corresponding flag needs to be set
on opens of directories (and files) but was missing in some
places causing access denied trying to enumerate and backup
servers.
Fixes kernel bugzilla #200953https://bugzilla.kernel.org/show_bug.cgi?id=200953
Reported-and-tested-by: <whh@rubrik.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
An earlier commit had a typo which prevented the
optimization from working:
commit 18dd8e1a65 ("Do not send SMB3 SET_INFO request if nothing is changing")
Thank you to Metze for noticing this. Also clear a
reserved field in the FILE_BASIC_INFO struct we send
that should be zero (all the other fields in that
struct were set or cleared explicitly already in
cifs_set_file_info).
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org> # 4.9.x+
Reported-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
We really, really want to be encouraging use of secure dialects,
and SMB3.1.1 offers useful security features, and will soon
be the recommended dialect for many use cases. Simplify the code
by removing the CONFIG_CIFS_SMB311 ifdef so users don't disable
it in the build, and create compatibility and/or security issues
with modern servers - many of which have been supporting this
dialect for multiple years.
Also clarify some of the Kconfig text for cifs.ko about
SMB3.1.1 and current supported features in the module.
Signed-off-by: Steve French <stfrench@microsoft.com>
Acked-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
In cifs, the timestamps are stored in memory in the cifs_fattr structure,
which uses the deprecated 'timespec' structure. Now that the VFS code
has moved on to 'timespec64', the next step is to change over the fattr
as well.
This also makes 32-bit and 64-bit systems behave the same way, and
no longer overflow the 32-bit time_t in year 2038.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
This is a late set of changes from Deepa Dinamani doing an automated
treewide conversion of the inode and iattr structures from 'timespec'
to 'timespec64', to push the conversion from the VFS layer into the
individual file systems.
There were no conflicts between this and the contents of linux-next
until just before the merge window, when we saw multiple problems:
- A minor conflict with my own y2038 fixes, which I could address
by adding another patch on top here.
- One semantic conflict with late changes to the NFS tree. I addressed
this by merging Deepa's original branch on top of the changes that
now got merged into mainline and making sure the merge commit includes
the necessary changes as produced by coccinelle.
- A trivial conflict against the removal of staging/lustre.
- Multiple conflicts against the VFS changes in the overlayfs tree.
These are still part of linux-next, but apparently this is no longer
intended for 4.18 [1], so I am ignoring that part.
As Deepa writes:
The series aims to switch vfs timestamps to use struct timespec64.
Currently vfs uses struct timespec, which is not y2038 safe.
The series involves the following:
1. Add vfs helper functions for supporting struct timepec64 timestamps.
2. Cast prints of vfs timestamps to avoid warnings after the switch.
3. Simplify code using vfs timestamps so that the actual
replacement becomes easy.
4. Convert vfs timestamps to use struct timespec64 using a script.
This is a flag day patch.
Next steps:
1. Convert APIs that can handle timespec64, instead of converting
timestamps at the boundaries.
2. Update internal data structures to avoid timestamp conversions.
Thomas Gleixner adds:
I think there is no point to drag that out for the next merge window.
The whole thing needs to be done in one go for the core changes which
means that you're going to play that catchup game forever. Let's get
over with it towards the end of the merge window.
[1] https://www.spinics.net/lists/linux-fsdevel/msg128294.html
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJbInZAAAoJEGCrR//JCVInReoQAIlVIIMt5ZX6wmaKbrjy9Itf
MfgbFihQ/djLnuSPVQ3nztcxF0d66BKHZ9puVjz6+mIHqfDvJTRwZs9nU+sOF/T1
g78fRkM1cxq6ZCkGYAbzyjyo5aC4PnSMP/NQLmwqvi0MXqqrbDoq5ZdP9DHJw39h
L9lD8FM/P7T29Fgp9tq/pT5l9X8VU8+s5KQG1uhB5hii4VL6pD6JyLElDita7rg+
Z7/V7jkxIGEUWF7vGaiR1QTFzEtpUA/exDf9cnsf51OGtK/LJfQ0oiZPPuq3oA/E
LSbt8YQQObc+dvfnGxwgxEg1k5WP5ekj/Wdibv/+rQKgGyLOTz6Q4xK6r8F2ahxs
nyZQBdXqHhJYyKr1H1reUH3mrSgQbE5U5R1i3My0xV2dSn+vtK5vgF21v2Ku3A1G
wJratdtF/kVBzSEQUhsYTw14Un+xhBLRWzcq0cELonqxaKvRQK9r92KHLIWNE7/v
c0TmhFbkZA+zR8HdsaL3iYf1+0W/eYy8PcvepyldKNeW2pVk3CyvdTfY2Z87G2XK
tIkK+BUWbG3drEGG3hxZ3757Ln3a9qWyC5ruD3mBVkuug/wekbI8PykYJS7Mx4s/
WNXl0dAL0Eeu1M8uEJejRAe1Q3eXoMWZbvCYZc+wAm92pATfHVcKwPOh8P7NHlfy
A3HkjIBrKW5AgQDxfgvm
=CZX2
-----END PGP SIGNATURE-----
Merge tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground
Pull inode timestamps conversion to timespec64 from Arnd Bergmann:
"This is a late set of changes from Deepa Dinamani doing an automated
treewide conversion of the inode and iattr structures from 'timespec'
to 'timespec64', to push the conversion from the VFS layer into the
individual file systems.
As Deepa writes:
'The series aims to switch vfs timestamps to use struct timespec64.
Currently vfs uses struct timespec, which is not y2038 safe.
The series involves the following:
1. Add vfs helper functions for supporting struct timepec64
timestamps.
2. Cast prints of vfs timestamps to avoid warnings after the switch.
3. Simplify code using vfs timestamps so that the actual replacement
becomes easy.
4. Convert vfs timestamps to use struct timespec64 using a script.
This is a flag day patch.
Next steps:
1. Convert APIs that can handle timespec64, instead of converting
timestamps at the boundaries.
2. Update internal data structures to avoid timestamp conversions'
Thomas Gleixner adds:
'I think there is no point to drag that out for the next merge
window. The whole thing needs to be done in one go for the core
changes which means that you're going to play that catchup game
forever. Let's get over with it towards the end of the merge window'"
* tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground:
pstore: Remove bogus format string definition
vfs: change inode times to use struct timespec64
pstore: Convert internal records to timespec64
udf: Simplify calls to udf_disk_stamp_to_time
fs: nfs: get rid of memcpys for inode times
ceph: make inode time prints to be long long
lustre: Use long long type to print inode time
fs: add timespec64_truncate()
RHBZ: 1566345
When truncating a file we always do this synchronously to the server.
Thus we need to make sure that the cached inode metadata is
marked as stale so that on next getattr we will refresh the metadata.
In this particular bug we want to ensure that both ctime and mtime
are updated and become visible to the application after a truncate.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reported-by: Xiaoli Feng <xifeng@redhat.com>
On 32-bit (e.g. with m68k-linux-gnu-gcc-4.1):
fs/cifs/inode.c: In function ‘simple_hashstr’:
fs/cifs/inode.c:713: warning: integer constant is too large for ‘long’ type
Fixes: 7ea884c77e ("smb3: Fix root directory when server returns inode number of zero")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Some servers return inode number zero for the root directory, which
causes ls to display incorrect data (missing "." and "..").
If the server returns zero for the inode number of the root directory,
fake an inode number for it.
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Modify end of cifs_root_iget function in fs/cifs/inode.c to call
free_xid(xid) instead of _free_xid(xid), thereby allowing debug
notification of this action when enabled.
Signed-off-by: Phillip Potter <phil@philpotter.co.uk>
Signed-off-by: Steve French <smfrench@gmail.com>
* Remove ses->ipc_tid.
* Make IPC$ regular tcon.
* Add a direct pointer to it in ses->tcon_ipc.
* Distinguish PIPE tcon from IPC tcon by adding a tcon->pipe flag. All
IPC tcons are pipes but not all pipes are IPC.
* All TreeConnect functions now cannot take a NULL tcon object.
The IPC tcon has the same lifetime as the session it belongs to. It is
created when the session is created and destroyed when the session is
destroyed.
Since no mounts directly refer to the IPC tcon, its refcount should
always be set to initialisation value (1). Thus we make sure
cifs_put_tcon() skips it.
If the mount request resulting in a new session being created requires
encryption, try to require it too for IPC.
* set SERVER_NAME_LENGTH to serverName actual size
The maximum length of an ipv6 string representation is defined in
INET6_ADDRSTRLEN as 45+1 for null but lets keep what we know works.
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
This is a pure automated search-and-replace of the internal kernel
superblock flags.
The s_flags are now called SB_*, with the names and the values for the
moment mirroring the MS_* flags that they're equivalent to.
Note how the MS_xyz flags are the ones passed to the mount system call,
while the SB_xyz flags are what we then use in sb->s_flags.
The script to do this was:
# places to look in; re security/*: it generally should *not* be
# touched (that stuff parses mount(2) arguments directly), but
# there are two places where we really deal with superblock flags.
FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
include/linux/fs.h include/uapi/linux/bfs_fs.h \
security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
# the list of MS_... constants
SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
ACTIVE NOUSER"
SED_PROG=
for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done
# we want files that contain at least one of MS_...,
# with fs/namespace.c and fs/pnode.c excluded.
L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')
for f in $L; do sed -i $f $SED_PROG; done
Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We weren't returning the creation time or the two easily supported
attributes (ENCRYPTED or COMPRESSED) for the getattr call to
allow statx to return these fields.
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>\
Acked-by: Jeff Layton <jlayton@poochiereds.net>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
The wait_bit*() types and APIs are mixed into wait.h, but they
are a pretty orthogonal extension of wait-queues.
Furthermore, only about 50 kernel files use these APIs, while
over 1000 use the regular wait-queue functionality.
So clean up the main wait.h by moving the wait-bit functionality
out of it, into a separate .h and .c file:
include/linux/wait_bit.h for types and APIs
kernel/sched/wait_bit.c for the implementation
Update all header dependencies.
This reduces the size of wait.h rather significantly, by about 30%.
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Some minor cleanup of cifs query xattr functions (will also make
SMB3 xattr implementation cleaner as well).
Signed-off-by: Steve French <steve.french@primarydata.com>
CURRENT_TIME macro is not y2038 safe on 32 bit systems.
The patch replaces all the uses of CURRENT_TIME by current_time() for
filesystem times, and ktime_get_* functions for authentication
timestamps and timezone calculations.
This is also in preparation for the patch that transitions vfs
timestamps to use 64 bit time and hence make them y2038 safe.
CURRENT_TIME macro will be deleted before merging the aforementioned
change.
The inode timestamps read from the server are assumed to have correct
granularity and range.
The patch also assumes that the difference between server and client
times lie in the range INT_MIN..INT_MAX. This is valid because this is
the difference between current times between server and client, and the
largest timezone difference is in the range of one day.
All cifs timestamps currently use timespec representation internally.
Authentication and timezone timestamps can also be transitioned into
using timespec64 when all other timestamps for cifs is transitioned to
use timespec64.
Link: http://lkml.kernel.org/r/1491613030-11599-4-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: Steve French <sfrench@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull vfs 'statx()' update from Al Viro.
This adds the new extended stat() interface that internally subsumes our
previous stat interfaces, and allows user mode to specify in more detail
what kind of information it wants.
It also allows for some explicit synchronization information to be
passed to the filesystem, which can be relevant for network filesystems:
is the cached value ok, or do you need open/close consistency, or what?
From David Howells.
Andreas Dilger points out that the first version of the extended statx
interface was posted June 29, 2010:
https://www.spinics.net/lists/linux-fsdevel/msg33831.html
* 'rebased-statx' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
statx: Add a system call to make enhanced file info available
Add a system call to make extended file information available, including
file creation and some attribute flags where available through the
underlying filesystem.
The getattr inode operation is altered to take two additional arguments: a
u32 request_mask and an unsigned int flags that indicate the
synchronisation mode. This change is propagated to the vfs_getattr*()
function.
Functions like vfs_stat() are now inline wrappers around new functions
vfs_statx() and vfs_statx_fd() to reduce stack usage.
========
OVERVIEW
========
The idea was initially proposed as a set of xattrs that could be retrieved
with getxattr(), but the general preference proved to be for a new syscall
with an extended stat structure.
A number of requests were gathered for features to be included. The
following have been included:
(1) Make the fields a consistent size on all arches and make them large.
(2) Spare space, request flags and information flags are provided for
future expansion.
(3) Better support for the y2038 problem [Arnd Bergmann] (tv_sec is an
__s64).
(4) Creation time: The SMB protocol carries the creation time, which could
be exported by Samba, which will in turn help CIFS make use of
FS-Cache as that can be used for coherency data (stx_btime).
This is also specified in NFSv4 as a recommended attribute and could
be exported by NFSD [Steve French].
(5) Lightweight stat: Ask for just those details of interest, and allow a
netfs (such as NFS) to approximate anything not of interest, possibly
without going to the server [Trond Myklebust, Ulrich Drepper, Andreas
Dilger] (AT_STATX_DONT_SYNC).
(6) Heavyweight stat: Force a netfs to go to the server, even if it thinks
its cached attributes are up to date [Trond Myklebust]
(AT_STATX_FORCE_SYNC).
And the following have been left out for future extension:
(7) Data version number: Could be used by userspace NFS servers [Aneesh
Kumar].
Can also be used to modify fill_post_wcc() in NFSD which retrieves
i_version directly, but has just called vfs_getattr(). It could get
it from the kstat struct if it used vfs_xgetattr() instead.
(There's disagreement on the exact semantics of a single field, since
not all filesystems do this the same way).
(8) BSD stat compatibility: Including more fields from the BSD stat such
as creation time (st_btime) and inode generation number (st_gen)
[Jeremy Allison, Bernd Schubert].
(9) Inode generation number: Useful for FUSE and userspace NFS servers
[Bernd Schubert].
(This was asked for but later deemed unnecessary with the
open-by-handle capability available and caused disagreement as to
whether it's a security hole or not).
(10) Extra coherency data may be useful in making backups [Andreas Dilger].
(No particular data were offered, but things like last backup
timestamp, the data version number and the DOS archive bit would come
into this category).
(11) Allow the filesystem to indicate what it can/cannot provide: A
filesystem can now say it doesn't support a standard stat feature if
that isn't available, so if, for instance, inode numbers or UIDs don't
exist or are fabricated locally...
(This requires a separate system call - I have an fsinfo() call idea
for this).
(12) Store a 16-byte volume ID in the superblock that can be returned in
struct xstat [Steve French].
(Deferred to fsinfo).
(13) Include granularity fields in the time data to indicate the
granularity of each of the times (NFSv4 time_delta) [Steve French].
(Deferred to fsinfo).
(14) FS_IOC_GETFLAGS value. These could be translated to BSD's st_flags.
Note that the Linux IOC flags are a mess and filesystems such as Ext4
define flags that aren't in linux/fs.h, so translation in the kernel
may be a necessity (or, possibly, we provide the filesystem type too).
(Some attributes are made available in stx_attributes, but the general
feeling was that the IOC flags were to ext[234]-specific and shouldn't
be exposed through statx this way).
(15) Mask of features available on file (eg: ACLs, seclabel) [Brad Boyer,
Michael Kerrisk].
(Deferred, probably to fsinfo. Finding out if there's an ACL or
seclabal might require extra filesystem operations).
(16) Femtosecond-resolution timestamps [Dave Chinner].
(A __reserved field has been left in the statx_timestamp struct for
this - if there proves to be a need).
(17) A set multiple attributes syscall to go with this.
===============
NEW SYSTEM CALL
===============
The new system call is:
int ret = statx(int dfd,
const char *filename,
unsigned int flags,
unsigned int mask,
struct statx *buffer);
The dfd, filename and flags parameters indicate the file to query, in a
similar way to fstatat(). There is no equivalent of lstat() as that can be
emulated with statx() by passing AT_SYMLINK_NOFOLLOW in flags. There is
also no equivalent of fstat() as that can be emulated by passing a NULL
filename to statx() with the fd of interest in dfd.
Whether or not statx() synchronises the attributes with the backing store
can be controlled by OR'ing a value into the flags argument (this typically
only affects network filesystems):
(1) AT_STATX_SYNC_AS_STAT tells statx() to behave as stat() does in this
respect.
(2) AT_STATX_FORCE_SYNC will require a network filesystem to synchronise
its attributes with the server - which might require data writeback to
occur to get the timestamps correct.
(3) AT_STATX_DONT_SYNC will suppress synchronisation with the server in a
network filesystem. The resulting values should be considered
approximate.
mask is a bitmask indicating the fields in struct statx that are of
interest to the caller. The user should set this to STATX_BASIC_STATS to
get the basic set returned by stat(). It should be noted that asking for
more information may entail extra I/O operations.
buffer points to the destination for the data. This must be 256 bytes in
size.
======================
MAIN ATTRIBUTES RECORD
======================
The following structures are defined in which to return the main attribute
set:
struct statx_timestamp {
__s64 tv_sec;
__s32 tv_nsec;
__s32 __reserved;
};
struct statx {
__u32 stx_mask;
__u32 stx_blksize;
__u64 stx_attributes;
__u32 stx_nlink;
__u32 stx_uid;
__u32 stx_gid;
__u16 stx_mode;
__u16 __spare0[1];
__u64 stx_ino;
__u64 stx_size;
__u64 stx_blocks;
__u64 __spare1[1];
struct statx_timestamp stx_atime;
struct statx_timestamp stx_btime;
struct statx_timestamp stx_ctime;
struct statx_timestamp stx_mtime;
__u32 stx_rdev_major;
__u32 stx_rdev_minor;
__u32 stx_dev_major;
__u32 stx_dev_minor;
__u64 __spare2[14];
};
The defined bits in request_mask and stx_mask are:
STATX_TYPE Want/got stx_mode & S_IFMT
STATX_MODE Want/got stx_mode & ~S_IFMT
STATX_NLINK Want/got stx_nlink
STATX_UID Want/got stx_uid
STATX_GID Want/got stx_gid
STATX_ATIME Want/got stx_atime{,_ns}
STATX_MTIME Want/got stx_mtime{,_ns}
STATX_CTIME Want/got stx_ctime{,_ns}
STATX_INO Want/got stx_ino
STATX_SIZE Want/got stx_size
STATX_BLOCKS Want/got stx_blocks
STATX_BASIC_STATS [The stuff in the normal stat struct]
STATX_BTIME Want/got stx_btime{,_ns}
STATX_ALL [All currently available stuff]
stx_btime is the file creation time, stx_mask is a bitmask indicating the
data provided and __spares*[] are where as-yet undefined fields can be
placed.
Time fields are structures with separate seconds and nanoseconds fields
plus a reserved field in case we want to add even finer resolution. Note
that times will be negative if before 1970; in such a case, the nanosecond
fields will also be negative if not zero.
The bits defined in the stx_attributes field convey information about a
file, how it is accessed, where it is and what it does. The following
attributes map to FS_*_FL flags and are the same numerical value:
STATX_ATTR_COMPRESSED File is compressed by the fs
STATX_ATTR_IMMUTABLE File is marked immutable
STATX_ATTR_APPEND File is append-only
STATX_ATTR_NODUMP File is not to be dumped
STATX_ATTR_ENCRYPTED File requires key to decrypt in fs
Within the kernel, the supported flags are listed by:
KSTAT_ATTR_FS_IOC_FLAGS
[Are any other IOC flags of sufficient general interest to be exposed
through this interface?]
New flags include:
STATX_ATTR_AUTOMOUNT Object is an automount trigger
These are for the use of GUI tools that might want to mark files specially,
depending on what they are.
Fields in struct statx come in a number of classes:
(0) stx_dev_*, stx_blksize.
These are local system information and are always available.
(1) stx_mode, stx_nlinks, stx_uid, stx_gid, stx_[amc]time, stx_ino,
stx_size, stx_blocks.
These will be returned whether the caller asks for them or not. The
corresponding bits in stx_mask will be set to indicate whether they
actually have valid values.
If the caller didn't ask for them, then they may be approximated. For
example, NFS won't waste any time updating them from the server,
unless as a byproduct of updating something requested.
If the values don't actually exist for the underlying object (such as
UID or GID on a DOS file), then the bit won't be set in the stx_mask,
even if the caller asked for the value. In such a case, the returned
value will be a fabrication.
Note that there are instances where the type might not be valid, for
instance Windows reparse points.
(2) stx_rdev_*.
This will be set only if stx_mode indicates we're looking at a
blockdev or a chardev, otherwise will be 0.
(3) stx_btime.
Similar to (1), except this will be set to 0 if it doesn't exist.
=======
TESTING
=======
The following test program can be used to test the statx system call:
samples/statx/test-statx.c
Just compile and run, passing it paths to the files you want to examine.
The file is built automatically if CONFIG_SAMPLES is enabled.
Here's some example output. Firstly, an NFS directory that crosses to
another FSID. Note that the AUTOMOUNT attribute is set because transiting
this directory will cause d_automount to be invoked by the VFS.
[root@andromeda ~]# /tmp/test-statx -A /warthog/data
statx(/warthog/data) = 0
results=7ff
Size: 4096 Blocks: 8 IO Block: 1048576 directory
Device: 00:26 Inode: 1703937 Links: 125
Access: (3777/drwxrwxrwx) Uid: 0 Gid: 4041
Access: 2016-11-24 09:02:12.219699527+0000
Modify: 2016-11-17 10:44:36.225653653+0000
Change: 2016-11-17 10:44:36.225653653+0000
Attributes: 0000000000001000 (-------- -------- -------- -------- -------- -------- ---m---- --------)
Secondly, the result of automounting on that directory.
[root@andromeda ~]# /tmp/test-statx /warthog/data
statx(/warthog/data) = 0
results=7ff
Size: 4096 Blocks: 8 IO Block: 1048576 directory
Device: 00:27 Inode: 2 Links: 125
Access: (3777/drwxrwxrwx) Uid: 0 Gid: 4041
Access: 2016-11-24 09:02:12.219699527+0000
Modify: 2016-11-17 10:44:36.225653653+0000
Change: 2016-11-17 10:44:36.225653653+0000
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Fix up affected files that include this signal functionality via sched.h.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
inode_change_ok() will be resposible for clearing capabilities and IMA
extended attributes and as such will need dentry. Give it as an argument
to inode_change_ok() instead of an inode. Also rename inode_change_ok()
to setattr_prepare() to better relect that it does also some
modifications in addition to checks.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Use d_fsdata instead, which is the same size. Introduce helpers to hide
the typecasts.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: Steve French <sfrench@samba.org>
if, when mounting //HOST/share/sub/dir/foo we can query /sub/dir/foo but
not any of the path components above:
- store the /sub/dir/foo prefix in the cifs super_block info
- in the superblock, set root dentry to the subpath dentry (instead of
the share root)
- set a flag in the superblock to remember it
- use prefixpath when building path from a dentry
fixes bso#8950
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.
There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 7196ac113a ("Fix to check Unique id and FileType when client
refer file directly.") checks whether the uniqueid of an inode has
changed when getting the inode info, but only when using the UNIX
extensions. Add a similar check for SMB2+, since this can be done
without an extra network roundtrip.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Jan Stancek reported that I wrecked things for him by fixing things for
Vladimir :/
His report was due to an UNINTERRUPTIBLE wait getting -EINTR, which
should not be possible, however my previous patch made this possible by
unconditionally checking signal_pending().
We cannot use current->state as was done previously, because the
instruction after the store to that variable it can be changed. We must
instead pass the initial state along and use that.
Fixes: 68985633bc ("sched/wait: Fix signal handling in bit wait helpers")
Reported-by: Jan Stancek <jstancek@redhat.com>
Reported-by: Chris Mason <clm@fb.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Tested-by: Chris Mason <clm@fb.com>
Reviewed-by: Paul Turner <pjt@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: tglx@linutronix.de
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: hpa@zytor.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The error paths in set_file_size for cifs and smb3 are incorrect.
In the unlikely event that a server did not support set file info
of the file size, the code incorrectly falls back to trying SMBWriteX
(note that only the original core SMB Write, used for example by DOS,
can set the file size this way - this actually does not work for the more
recent SMBWriteX). The idea was since the old DOS SMB Write could set
the file size if you write zero bytes at that offset then use that if
server rejects the normal set file info call.
Fortunately the SMBWriteX will never be sent on the wire (except when
file size is zero) since the length and offset fields were reversed
in the two places in this function that call SMBWriteX causing
the fall back path to return an error. It is also important to never call
an SMB request from an SMB2/sMB3 session (which theoretically would
be possible, and can cause a brief session drop, although the client
recovers) so this should be fixed. In practice this path does not happen
with modern servers but the error fall back to SMBWriteX is clearly wrong.
Removing the calls to SMBWriteX in the error paths in cifs_set_file_size
Pointed out by PaX/grsecurity team
Signed-off-by: Steve French <steve.french@primarydata.com>
Reported-by: PaX Team <pageexec@freemail.hu>
CC: Emese Revfy <re.emese@gmail.com>
CC: Brad Spengler <spender@grsecurity.net>
CC: Stable <stable@vger.kernel.org>
When you refer file directly on cifs client,
(e.g. ls -li <filename>, cd <dir>, stat <filename>)
the function return old inode number and filetype from old inode cache,
though server has different inode number or filetype.
When server is Windows, cifs client has same problem.
When Server is Windows
, This patch fixes bug in different filetype,
but does not fix bug in different inode number.
Because QUERY_PATH_INFO response by Windows does not include inode number(Index Number) .
BUG INFO
https://bugzilla.kernel.org/show_bug.cgi?id=90021https://bugzilla.kernel.org/show_bug.cgi?id=90031
Reported-by: Nakajima Akira <nakajima.akira@nttcom.co.jp>
Signed-off-by: Nakajima Akira <nakajima.akira@nttcom.co.jp>
Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Steve French <smfrench@gmail.com>
that's the bulk of filesystem drivers dealing with inodes of their own
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Coverity reports a warning due to unitialized attr structure in one
code path.
Reported by Coverity (CID 728535)
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Jeff Layton <jlayton@samba.org>
Now that we never use the backing_dev_info pointer in struct address_space
we can simply remove it and save 4 to 8 bytes in every inode.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Reviewed-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mac server returns that they support CIFS Unix Extensions but
doesn't actually support QUERY_FILE_UNIX_BASIC so mount fails.
Workaround this problem by disabling use of Unix CIFS protocol
extensions if server returns an EOPNOTSUPP error on
QUERY_FILE_UNIX_BASIC during mount.
Signed-off-by: Steve French <smfrench@gmail.com>
This is a bigger patch, but its size is mostly due to
a single change for how we check for remapping illegal characters
in file names - a lot of repeated, small changes to
the way callers request converting file names.
The final patch in the series does the following:
1) changes default behavior for cifs to be more intuitive.
Currently we do not map by default to seven reserved characters,
ie those valid in POSIX but not in NTFS/CIFS/SMB3/Windows,
unless a mount option (mapchars) is specified. Change this
to by default always map and map using the SFM maping
(like the Mac uses) unless the server negotiates the CIFS Unix
Extensions (like Samba does when mounting with the cifs protocol)
when the remapping of the characters is unnecessary. This should
help SMB3 mounts in particular since Samba will likely be
able to implement this mapping with its new "vfs_fruit" module
as it will be doing for the Mac.
2) if the user specifies the existing "mapchars" mount option then
use the "SFU" (Microsoft Services for Unix, SUA) style mapping of
the seven characters instead.
3) if the user specifies "nomapposix" then disable SFM/MAC style mapping
(so no character remapping would be used unless the user specifies
"mapchars" on mount as well, as above).
4) change all the places in the code that check for the superblock
flag on the mount which is set by mapchars and passed in on all
path based operation and change it to use a small function call
instead to set the mapping type properly (and check for the
mapping type in the cifs unicode functions)
Signed-off-by: Steve French <smfrench@gmail.com>
The "sfu" mount option did not work on SMB2/SMB3 mounts.
With these changes when the "sfu" mount option is passed in
on an smb2/smb2.1/smb3 mount the client can emulate (and
recognize) fifo and device (character and device files).
In addition the "sfu" mount option should not conflict
with "mfsymlinks" (symlink emulation) as we will never
create "sfu" style symlinks, but using "sfu" mount option
will allow us to recognize existing symlinks, created with
Microsoft "Services for Unix" (SFU and SUA).
To enable the "sfu" mount option for SMB2/SMB3 the calling
syntax of the generic cifs/smb2/smb3 sync_read and sync_write
protocol dependent function needed to be changed (we
don't have a file struct in all cases), but this actually
ended up simplifying the code a little.
Signed-off-by: Steve French <smfrench@gmail.com>
CIFS servers process nlink counts differently for files and directories.
In cifs_rename() if we the request fails on the existing target, we
try to remove it through cifs_unlink() but this is not what we want
to do for directories. As the result the following sequence of commands
mkdir {1,2}; mv -T 1 2; rmdir {1,2}; mkdir {1,2}; echo foo > 2/bar
and XFS test generic/023 fail with -ENOENT error. That's why the second
mkdir reuses the existing inode (target inode of the mv -T command) with
S_DEAD flag.
Fix this by checking whether the target is directory or not and
calling cifs_rmdir() rather than cifs_unlink() for directories.
Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Pull cifs fixes from Steve French:
"Most important fixes in this set include three SMB3 fixes for stable
(including fix for possible kernel oops), and a workaround to allow
writes to Mac servers (only cifs dialect, not more current SMB2.1,
worked to Mac servers). Also fallocate support added, and lease fix
from Jeff"
* 'for-linus' of git://git.samba.org/sfrench/cifs-2.6:
[SMB3] Enable fallocate -z support for SMB3 mounts
enable fallocate punch hole ("fallocate -p") for SMB3
Incorrect error returned on setting file compressed on SMB2
CIFS: Fix wrong directory attributes after rename
CIFS: Fix SMB2 readdir error handling
[CIFS] Possible null ptr deref in SMB2_tcon
[CIFS] Workaround MacOS server problem with SMB2.1 write response
cifs: handle lease F_UNLCK requests properly
Cleanup sparse file support by creating worker function for it
Add sparse file support to SMB2/SMB3 mounts
Add missing definitions for CIFS File System Attributes
cifs: remove unused function cifs_oplock_break_wait
When we requests rename we also need to update attributes
of both source and target parent directories. Not doing it
causes generic/309 xfstest to fail on SMB2 mounts. Fix this
by marking these directories for force revalidating.
Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Pull vfs updates from Al Viro:
"Stuff in here:
- acct.c fixes and general rework of mnt_pin mechanism. That allows
to go for delayed-mntput stuff, which will permit mntput() on deep
stack without worrying about stack overflows - fs shutdown will
happen on shallow stack. IOW, we can do Eric's umount-on-rmdir
series without introducing tons of stack overflows on new mntput()
call chains it introduces.
- Bruce's d_splice_alias() patches
- more Miklos' rename() stuff.
- a couple of regression fixes (stable fodder, in the end of branch)
and a fix for API idiocy in iov_iter.c.
There definitely will be another pile, maybe even two. I'd like to
get Eric's series in this time, but even if we miss it, it'll go right
in the beginning of for-next in the next cycle - the tricky part of
prereqs is in this pile"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (40 commits)
fix copy_tree() regression
__generic_file_write_iter(): fix handling of sync error after DIO
switch iov_iter_get_pages() to passing maximal number of pages
fs: mark __d_obtain_alias static
dcache: d_splice_alias should detect loops
exportfs: update Exporting documentation
dcache: d_find_alias needn't recheck IS_ROOT && DCACHE_DISCONNECTED
dcache: remove unused d_find_alias parameter
dcache: d_obtain_alias callers don't all want DISCONNECTED
dcache: d_splice_alias should ignore DCACHE_DISCONNECTED
dcache: d_splice_alias mustn't create directory aliases
dcache: close d_move race in d_splice_alias
dcache: move d_splice_alias
namei: trivial fix to vfs_rename_dir comment
VFS: allow ->d_manage() to declare -EISDIR in rcu_walk mode.
cifs: support RENAME_NOREPLACE
hostfs: support rename flags
shmem: support RENAME_EXCHANGE
shmem: support RENAME_NOREPLACE
btrfs: add RENAME_NOREPLACE
...
This flag gives CIFS the ability to support its native rename semantics.
Implementation is simple: just bail out before trying to hack around the
noreplace semantics.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Steve French <smfrench@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
It is currently not possible for various wait_on_bit functions
to implement a timeout.
While the "action" function that is called to do the waiting
could certainly use schedule_timeout(), there is no way to carry
forward the remaining timeout after a false wake-up.
As false-wakeups a clearly possible at least due to possible
hash collisions in bit_waitqueue(), this is a real problem.
The 'action' function is currently passed a pointer to the word
containing the bit being waited on. No current action functions
use this pointer. So changing it to something else will be a
little noisy but will have no immediate effect.
This patch changes the 'action' function to take a pointer to
the "struct wait_bit_key", which contains a pointer to the word
containing the bit so nothing is really lost.
It also adds a 'private' field to "struct wait_bit_key", which
is initialized to zero.
An action function can now implement a timeout with something
like
static int timed_out_waiter(struct wait_bit_key *key)
{
unsigned long waited;
if (key->private == 0) {
key->private = jiffies;
if (key->private == 0)
key->private -= 1;
}
waited = jiffies - key->private;
if (waited > 10 * HZ)
return -EAGAIN;
schedule_timeout(waited - 10 * HZ);
return 0;
}
If any other need for context in a waiter were found it would be
easy to use ->private for some other purpose, or even extend
"struct wait_bit_key".
My particular need is to support timeouts in nfs_release_page()
to avoid deadlocks with loopback mounted NFS.
While wait_on_bit_timeout() would be a cleaner interface, it
will not meet my need. I need the timeout to be sensitive to
the state of the connection with the server, which could change.
So I need to use an 'action' interface.
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steve French <sfrench@samba.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140707051604.28027.41257.stgit@notabene.brown
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The current "wait_on_bit" interface requires an 'action'
function to be provided which does the actual waiting.
There are over 20 such functions, many of them identical.
Most cases can be satisfied by one of just two functions, one
which uses io_schedule() and one which just uses schedule().
So:
Rename wait_on_bit and wait_on_bit_lock to
wait_on_bit_action and wait_on_bit_lock_action
to make it explicit that they need an action function.
Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
which are *not* given an action function but implicitly use
a standard one.
The decision to error-out if a signal is pending is now made
based on the 'mode' argument rather than being encoded in the action
function.
All instances of the old wait_on_bit and wait_on_bit_lock which
can use the new version have been changed accordingly and their
action functions have been discarded.
wait_on_bit{_lock} does not return any specific error code in the
event of a signal so the caller must check for non-zero and
interpolate their own error code as appropriate.
The wait_on_bit() call in __fscache_wait_on_invalidate() was
ambiguous as it specified TASK_UNINTERRUPTIBLE but used
fscache_wait_bit_interruptible as an action function.
David Howells confirms this should be uniformly
"uninterruptible"
The main remaining user of wait_on_bit{,_lock}_action is NFS
which needs to use a freezer-aware schedule() call.
A comment in fs/gfs2/glock.c notes that having multiple 'action'
functions is useful as they display differently in the 'wchan'
field of 'ps'. (and /proc/$PID/wchan).
As the new bit_wait{,_io} functions are tagged "__sched", they
will not show up at all, but something higher in the stack. So
the distinction will still be visible, only with different
function names (gds2_glock_wait versus gfs2_glock_dq_wait in the
gfs2/glock.c case).
Since first version of this patch (against 3.15) two new action
functions appeared, on in NFS and one in CIFS. CIFS also now
uses an action function that makes the same freezer aware
schedule call as NFS.
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: David Howells <dhowells@redhat.com> (fscache, keys)
Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2)
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steve French <sfrench@samba.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull CIFS fixes from Steve French.
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
CIFS: Fix memory leaks in SMB2_open
cifs: ensure that vol->username is not NULL before running strlen on it
Clarify SMB2/SMB3 create context and add missing ones
Do not send ClientGUID on SMB2.02 dialect
cifs: Set client guid on per connection basis
fs/cifs/netmisc.c: convert printk to pr_foo()
fs/cifs/cifs.c: replace seq_printf by seq_puts
Update cifs version number to 2.03
fs: cifs: new helper: file_inode(file)
cifs: fix potential races in cifs_revalidate_mapping
cifs: new helper function: cifs_revalidate_mapping
cifs: convert booleans in cifsInodeInfo to a flags field
cifs: fix cifs_uniqueid_to_ino_t not to ever return 0
The handling of the CIFS_INO_INVALID_MAPPING flag is racy. It's possible
for two tasks to attempt to revalidate the mapping at the same time. The
first sees that CIFS_INO_INVALID_MAPPING is set. It clears the flag and
then calls invalidate_inode_pages2 to start shooting down the pagecache.
While that's going on, another task checks the flag and sees that it's
clear. It then ends up trusting the pagecache to satisfy a read when it
shouldn't.
Fix this by adding a bitlock to ensure that the clearing of the flag is
atomic with respect to the actual cache invalidation. Also, move the
other existing users of cifs_invalidate_mapping to use a new
cifs_zap_mapping() function that just sets the INVALID_MAPPING bit and
then uses the standard codepath to handle the invalidation.
Signed-off-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: Steve French <smfrench@gmail.com>