Connect the new VFS clone_range, copy_range, and dedupe_range features
to the existing reflink capability of ocfs2. Compared to the existing
ocfs2 reflink ioctl We have to do things a little differently to support
the VFS semantics (we can clone subranges of a file but we don't clone
xattrs), but the VFS ioctls are more broadly supported.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
v2: Convert inline data files to extents files before reflinking,
and fix i_blocks so that stat(2) output is correct.
v3: Make zero-length dedupe consistent with btrfs behavior.
v4: Use VFS double-inode lock routines and remove MAX_DEDUPE_LEN.
When ocfs2 shares blocks from one file to another, it's necessary to
charge that many blocks to the quota because ocfs2 tallies block charges
according to the number of blocks mapped, not the number of physical
blocks used.
Without this patch, reflinking X blocks and then CoWing all of them
causes quota usage to *decrease* by X as seen in generic/305.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
generic/188 triggered a dmesg stack trace because the dio completion
was casting a buffer head to an on-disk inode, which is whacky.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Always unlock the inode when completing dio writes, even if an error
has occurrred. The caller already checks the inode and unlocks it
if needed, so we might as well reduce contention.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
ocfs2_dio_end_io_write eats whatever errors may happen,
which means that write errors do not propagate to userspace.
Fix that.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
When we're adding the refcount flag to an extent, we have to budget
enough space to handle a full extent btree split in addition to
whatever modifications have to be made to the refcount btree. We
don't currently do this, with the result that generic/186 crashes
when we need an extent split but not a refcount split because meta_ac
never gets allocated.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The swapfile mechanism calls bmap once to find all the swap file
mappings, which means that we cannot properly support CoW remapping.
Therefore, error out if the swap code tries to call bmap on a
refcounted file.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Replace the open-coded inode refcount flag test with a helper function
to reduce the potential for bugs.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
... and don't zero anything on short copy; just unlock
and return 0 if that has happened on non-uptodate page.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
If we had a short copy into an uptodate page, there's no reason
whatsoever to zero anything; OTOH, if that page had _not_ been
uptodate, we must have been trying to overwrite it completely
and got a short copy. In that case, overwriting the end with
zeroes, marking uptodate and sending to server is just plain
wrong. Just unlock, keep it non-uptodate and return 0.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
a) the page is uptodate - ->write_begin() would either fail (in which
case we don't reach ->write_end()), or unstuff the inode, or find the
page already uptodate, or do a successful call of stuffed_readpage(),
which would've made it uptodate
b) zeroing the tail in pagecache is wrong. kill -9 at the right time
while writing unmodified file contents to the same file should _not_
leave us in a situation when read() from the file will be reporting
it full of zeroes. Especially since that effect will be transient -
at some later point the page will be evicted and then we'll be back
to the real file contents.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
don't zero on short copies; if the page was uptodate it's just plain
wrong, and if it wasn't we'll be better off just returning 0 and
buggering off.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
What matters when deciding if we should make a page uptodate is
not how much we _wanted_ to copy, but how much we actually have
copied. As it is, on architectures that do not zero tail on
short copy we can leave uninitialized data in page marked uptodate.
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Hoist both the XFS reflink inode state and preparation code and the XFS
file blocks compare functions into the VFS so that ocfs2 can take
advantage of it for reflink and dedupe.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
A clone is a perfectly fine implementation of a file copy, so most
file systems just implement the copy that way. Instead of duplicating
this logic move it to the VFS. Currently btrfs and XFS implement copies
the same way as clones and there is no behavior change for them, cifs
only implements clones and grow support for copy_file_range with this
patch. NFS implements both, so this will allow copy_file_range to work
on servers that only implement CLONE and be lot more efficient on servers
that implements CLONE and COPY.
Signed-off-by: Christoph Hellwig <hch@lst.de>
The function path_is_under() doesn't modify the paths pointed by its
arguments but only browse them. Constifying this pointers make a cleaner
interface to be used by (future) code which may only have access to
const struct path pointers (e.g. LSM hooks).
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull overlayfs fix from Miklos Szeredi:
"This fixes a regression introduced in 4.8"
* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: fix d_real() for stacked fs
The ER records are printed without explicit log level presuming line
continuation until "\n". After the commit 4bcc595ccd (printk:
reinstate KERN_CONT for printing continuation lines), the ER records are
printed a character per line.
Adding KERN_CONT to appropriate printk statements restores the printout
behavior.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Handling of recursion in d_real() is completely broken. Recursion is only
done in the 'inode != NULL' case. But when opening the file we have
'inode == NULL' hence d_real() will return an overlay dentry. This won't
work since overlayfs doesn't define its own file operations, so all file
ops will fail.
Fix by doing the recursion first and the check against the inode second.
Bash script to reproduce the issue written by Quentin:
- 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - -
tmpdir=$(mktemp -d)
pushd ${tmpdir}
mkdir -p {upper,lower,work}
echo -n 'rocks' > lower/ksplice
mount -t overlay level_zero upper -o lowerdir=lower,upperdir=upper,workdir=work
cat upper/ksplice
tmpdir2=$(mktemp -d)
pushd ${tmpdir2}
mkdir -p {upper,work}
mount -t overlay level_one upper -o lowerdir=${tmpdir}/upper,upperdir=upper,workdir=work
ls -l upper/ksplice
cat upper/ksplice
- 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - -
Reported-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 2d902671ce ("vfs: merge .d_select_inode() into .d_real()")
Cc: <stable@vger.kernel.org> # v4.8+
Commit 2211d5ba5c ("posix_acl: xattr representation cleanups")
removes the typedefs and the zero-length a_entries array in struct
posix_acl_xattr_header, and uses bare struct posix_acl_xattr_header
and struct posix_acl_xattr_entry directly.
But it failed to iterate over posix acl slots when converting posix
acls to CIFS format, which results in several test failures in
xfstests (generic/053 generic/105) when testing against a samba v1
server, starting from v4.9-rc1 kernel. e.g.
[root@localhost xfstests]# diff -u tests/generic/105.out /root/xfstests/results//generic/105.out.bad
--- tests/generic/105.out 2016-09-19 16:33:28.577962575 +0800
+++ /root/xfstests/results//generic/105.out.bad 2016-10-22 15:41:15.201931110 +0800
@@ -1,3 +1,4 @@
QA output created by 105
-rw-r--r-- root
+setfacl: subdir: Invalid argument
-rw-r--r-- root
Fix it by introducing a new "ace" var, like what
cifs_copy_posix_acl() does, and iterating posix acl xattr entries
over it in the for loop.
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Commit 4fcd1813e6 ("Fix reconnect to not defer smb3 session reconnect
long after socket reconnect") changes the behaviour of the SMB2 echo
service and causes it to renegotiate after a socket reconnect. However
under default settings, the echo service could take up to 120 seconds to
be scheduled.
The patch forces the echo service to be called immediately resulting a
negotiate call being made immediately on reconnect.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Andy Lutromirski's new virtually mapped kernel stack allocations moves
kernel stacks the vmalloc area. This triggers the bug
kernel BUG at ./include/linux/scatterlist.h:140!
at calc_seckey()->sg_init()
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Botched calculation of number of pages. As the result,
we were dropping pieces when doing splice to pipe from
e.g. 9p.
Reported-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Stable Bugfixes:
- Hide array-bounds warning
Bugfixes:
- Keep a reference on lock states while checking
- Handle NFS4ERR_OLD_STATEID in nfs4_reclaim_open_state
- Don't call close if the open stateid has already been cleared
- Fix CLOSE rases with OPEN
- Fix a regression in DELEGRETURN
-----BEGIN PGP SIGNATURE-----
iQIcBAABCAAGBQJYNhGKAAoJENfLVL+wpUDrGgEP/0okAGQfb7yHVNYjDpMmVh7u
6T1Vh+xbIMsGmuLXPOJH3FRFDnPWCrZO77K+l1y5oMl1fW/hA5h07yt0g0wT94+u
if1wunZ6bak6KFeevo4xphpqXCjLhwpe801SbBcJPY6D6YxMckobHR8NcuzTjFab
Kc9OAjnpIzS2lJBThaeyavGGnrlhNvH+Le+zEgMv/bSBTiPSymLlpj12a88cuHRF
hx2vBao3UuR1vaTaZ5Zdp954DtNXNo7Pikye11cvVJVhesNwpZe37SszcRZ1U6P4
o4LnYf/ImkjDrcRyvFRxc6bu/Q1jLBuAYZjB4oMcx7YQW8rJqcS/UkEpGzOfER3i
3NQXFqacIAGhULfJxF8W0vPGzKM74koa0HRRI34C10qZAPe06Iy8slkdIjM4t2IX
ASJI+uyrbIqTQ/x3FObWlqvw4TCOntYFpOsHF6G8M0uj+tX+3iXjpmwDGsJDVyFE
y+egnnVn9LmGGfg1SBU2VBKL2945e/VAWfHtDGmJYgEwNDiqtutoIMDn+szESX60
yGLPJdIL3O7pTWmDXdSSpUJZ+wqa90rrU34kGmk3njydaNHeA1SEhcNTi2Ha5ALb
NcVD0omnhrZUFE5MRY0OtmHRwhsaa9CYlMyqzb5SEeb46Z3KUm1KX9qEy4I4rZHG
C4MlTY5AScHqqNXmT8Pu
=YhQv
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-4.9-4' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client bugfixes from Anna Schumaker:
"Most of these fix regressions or races, but there is one patch for
stable that Arnd sent me
Stable bugfix:
- Hide array-bounds warning
Bugfixes:
- Keep a reference on lock states while checking
- Handle NFS4ERR_OLD_STATEID in nfs4_reclaim_open_state
- Don't call close if the open stateid has already been cleared
- Fix CLOSE rases with OPEN
- Fix a regression in DELEGRETURN"
* tag 'nfs-for-4.9-4' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFSv4.x: hide array-bounds warning
NFSv4.1: Keep a reference on lock states while checking
NFSv4.1: Handle NFS4ERR_OLD_STATEID in nfs4_reclaim_open_state
NFSv4: Don't call close if the open stateid has already been cleared
NFSv4: Fix CLOSE races with OPEN
NFSv4.1: Fix a regression in DELEGRETURN
A correct bugfix introduced a harmless warning that shows up with gcc-7:
fs/nfs/callback.c: In function 'nfs_callback_up':
fs/nfs/callback.c:214:14: error: array subscript is outside array bounds [-Werror=array-bounds]
What happens here is that the 'minorversion == 0' check tells the
compiler that we assume minorversion can be something other than 0,
but when CONFIG_NFS_V4_1 is disabled that would be invalid and
result in an out-of-bounds access.
The added check for IS_ENABLED(CONFIG_NFS_V4_1) tells gcc that this
really can't happen, which makes the code slightly smaller and also
avoids the warning.
The bugfix that introduced the warning is marked for stable backports,
we want this one backported to the same releases.
Fixes: 98b0f80c23 ("NFSv4.x: Fix a refcount leak in nfs_callback_up_net")
Cc: stable@vger.kernel.org # v3.7+
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
While walking the list of lock_states, keep a reference on each
nfs4_lock_state to be checked, otherwise the lock state could be removed
while the check performs TEST_STATEID and possible FREE_STATEID.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
panic the kernel) and some fixes for CONFIG_VMAP_STACK.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAlgxCMoACgkQ8vlZVpUN
gaOX3Af/QOphB5pKrKijhDK9H40nKS6lHtL7klJpvRafUMtVxBDOP3dsRISyGMdF
w+gQQQv+eFEPefwGcYzdO4PN7FFVirAF9RS/NTFSIB/c8V6FfHzn/DeiftU7CLRW
ljTP7y8M9eo35TsU8s9D7wfbyfY55MEANiAP8vnpx4JKDb86I/8Eaa6YS91v17vp
/7TKSUt7PE6UUp7mgTRCX8vK9SxJJ8Xvg2hSzulfrO1DdsfW61RQYXwif+biR85T
uxFPnV0yvji2EU4cpeIekPqJKUb9Av0aIbSwg19QqcAE0xqxvtSRBKlYnF2IRTuv
OXoaC30d4UcQrNCkxPDAdH/0BMdcNQ==
=y+5G
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"A security fix (so a maliciously corrupted file system image won't
panic the kernel) and some fixes for CONFIG_VMAP_STACK"
* tag 'ext4_for_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: sanity check the block and cluster size at mount time
fscrypto: don't use on-stack buffer for key derivation
fscrypto: don't use on-stack buffer for filename encryption
With the new (in 4.9) option to use a virtually-mapped stack
(CONFIG_VMAP_STACK), stack buffers cannot be used as input/output for
the scatterlist crypto API because they may not be directly mappable to
struct page. get_crypt_info() was using a stack buffer to hold the
output from the encryption operation used to derive the per-file key.
Fix it by using a heap buffer.
This bug could most easily be observed in a CONFIG_DEBUG_SG kernel
because this allowed the BUG in sg_set_buf() to be triggered.
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
With the new (in 4.9) option to use a virtually-mapped stack
(CONFIG_VMAP_STACK), stack buffers cannot be used as input/output for
the scatterlist crypto API because they may not be directly mappable to
struct page. For short filenames, fname_encrypt() was encrypting a
stack buffer holding the padded filename. Fix it by encrypting the
filename in-place in the output buffer, thereby making the temporary
buffer unnecessary.
This bug could most easily be observed in a CONFIG_DEBUG_SG kernel
because this allowed the BUG in sg_set_buf() to be triggered.
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Now that we're doing TEST_STATEID in nfs4_reclaim_open_state(), we can have
a NFS4ERR_OLD_STATEID returned from nfs41_open_expired() . Instead of
marking state recovery as failed, mark the state for recovery again.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Ensure we test to see if the open stateid is actually set, before we
send a CLOSE.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
If the reply to a successful CLOSE call races with an OPEN to the same
file, we can end up scribbling over the stateid that represents the
new open state.
The race looks like:
Client Server
====== ======
CLOSE stateid A on file "foo"
CLOSE stateid A, return stateid C
OPEN file "foo"
OPEN "foo", return stateid B
Receive reply to OPEN
Reset open state for "foo"
Associate stateid B to "foo"
Receive CLOSE for A
Reset open state for "foo"
Replace stateid B with C
The fix is to examine the argument of the CLOSE, and check for a match
with the current stateid "other" field. If the two do not match, then
the above race occurred, and we should just ignore the CLOSE.
Reported-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
We don't want to call nfs4_free_revoked_stateid() in the case where
the delegreturn was successful.
Reported-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Pull vfs fixes from Al Viro:
"A couple of regression fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix iov_iter_advance() for ITER_PIPE
xattr: Fix setting security xattrs on sockfs
Without ".owner = THIS_MODULE" it is possible to crash the kernel
by unloading the Orangefs module while someone is reading debugfs
files.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJYLfvEAAoJEM9EDqnrzg2+sHEQAJo4jn/sAQvO04ujaMrViLmy
5+V93F7jwGFeLvAwjMvPAeBb+UmlgqjVi0VT85RzEe6eNOKN9qlj9ZNDutOfnbhr
H6qu8AQsbO0znSTQuJA1M2Hca9h66EnN0pT8xW4wat1cCdAf6X6HcFcr1lZIRKZd
E17EygXi+IW0c0evIq4UBsD0DfTZgtC4ONrR9N7+zprlg2PVX35So6Lr0ODceJQs
StWHrZW9hDZ6KR8WocupuHPR8brOe+P5PU14fPzR1+EH7BsTf8uxWK7CfTE5ov0C
UNkNeh81BOkwIQDFoPCJ5asaipdi5RRNTIQekhhQ2GnaaCdmCKln8OLjqDZZOmDj
KRGB4mdPcCb3XlvMH3SaXNmyhmjt2cTS0/TQPexrTqjSNmbXmnzJOCguweoTIJ5w
CgEnsrNp8GwlZo12Z8JkFGxC39ifjH4F+KFetU+eUNjw9Tce+zHwgEvsAMqDhWw8
FJQWy+snG7m8ooytRObWPepchnd2XHkrJv4yu8uw3GirM+YTlxvuWnB54hVH17FQ
0vKYhdAXBUmeeyyNKApBSGQezPWD9hfAY5Di7JGJlaTiai3pVxgXd8YY4DGXHj3t
ebPpxEnlWrRLC5Cazd0yC9CoR8azQp9zvRgfPuPEM4wJSjUFVfmasmFg7s99h3Zq
vnTqfV/uQwLm9f+3CfNB
=s21f
-----END PGP SIGNATURE-----
Merge tag 'for-linus-4.9-rc5-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux
Pull orangefs fix from Mike Marshall:
"orangefs: add .owner to debugfs file_operations
Without ".owner = THIS_MODULE" it is possible to crash the kernel by
unloading the Orangefs module while someone is reading debugfs files"
* tag 'for-linus-4.9-rc5-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
orangefs: add .owner to debugfs file_operations