build_sit_info() allocate all bitmaps for each segment one by one,
it's quite low efficiency, this pach changes to allocate large
continuous memory at a time, and divide it and assign for each bitmaps
of segment. For large size image, it can expect improving its mount
speed.
Signed-off-by: Chen Gong <gongchen4@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Support two generic fs ioctls FS_IOC_{GET,SET}FSLABEL, letting
f2fs pass generic/492 testcase.
Fixes were made by Eric where:
- f2fs: fix buffer overruns in FS_IOC_{GET, SET}FSLABEL
utf16s_to_utf8s() and utf8s_to_utf16s() take the number of characters,
not the number of bytes.
- f2fs: fix copying too many bytes in FS_IOC_SETFSLABEL
Userspace provides a null-terminated string, so don't assume that the
full FSLABEL_MAX bytes can always be copied.
- f2fs: add missing authorization check in FS_IOC_SETFSLABEL
FS_IOC_SETFSLABEL modifies the filesystem superblock, so it shouldn't be
allowed to regular users. Require CAP_SYS_ADMIN, like xfs and btrfs do.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
There is one case can cause data corruption.
- write 4k to fileA
- fsync fileA, 4k data is writebacked to lbaA
- write 4k to fileA
- kworker flushs 4k to lbaB; dnode contain lbaB didn't be persisted yet
- write 4k to fileB
- kworker flush 4k to lbaA due to SSR
- SPOR -> dnode with lbaA will be recovered, however lbaA contains fileB's
data
One solution is tracking all fsynced file's block history, and disallow
SSR overwrite on newly invalidated block on that file.
However, during recovery, no matter the dnode is flushed or fsynced, all
previous dnodes until last fsynced one in node chain can be recovered,
that means we need to record all block change in flushed dnode, which
will cause heavy cost, so let's just use simple fix by forbidding SSR
overwrite directly.
Fixes: 5b6c6be2d8 ("f2fs: use SSR for warm node as well")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If CONFIG_F2FS_FS=y but CONFIG_NLS=m, building fails:
fs/f2fs/file.o: In function `f2fs_ioctl':
file.c:(.text+0xb86f): undefined reference to `utf16s_to_utf8s'
file.c:(.text+0xe651): undefined reference to `utf8s_to_utf16s'
Select CONFIG_NLS to fix this.
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 61a3da4d5ef8 ("f2fs: support FS_IOC_{GET,SET}FSLABEL")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As Pavel Machek reported:
"We normally use -EUCLEAN to signal filesystem corruption. Plus, it is
good idea to report it to the syslog and mark filesystem as "needing
fsck" if filesystem can do that."
Still we need improve the original patch with:
- use unlikely keyword
- add message print
- return EUCLEAN
However, after rethink this patch, I don't think we should add such
condition check here as below reasons:
- We have already checked the field in f2fs_sanity_check_ckpt(),
- If there is fs corrupt or security vulnerability, there is nothing
to guarantee the field is integrated after the check, unless we do
the check before each of its use, however no filesystem does that.
- We only have similar check for bitmap, which was added due to there
is bitmap corruption happened on f2fs' runtime in product.
- There are so many key fields in SB/CP/NAT did have such check
after f2fs_sanity_check_{sb,cp,..}.
So I propose to revert this unneeded check.
This reverts commit 56f3ce6751.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
We do not need to set the SBI_NEED_FSCK flag in the error paths, if we
return error here, we will not update the checkpoint flag, so the code
is useless, just remove it.
Signed-off-by: Lihong Kou <koulihong@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In mkfs, we have counted quota file's node number in cp.valid_node_count,
so we have to avoid wrong substraction of quota node number in
.available_nid/.avail_node_count calculation.
f2fs_write_check_point_pack()
{
..
set_cp(valid_node_count, 1 + c.quota_inum + c.lpf_inum);
Fixes: 292c196a36 ("f2fs: reserve nid resource for quota sysfile")
Fixes: 7b63f72f73 ("f2fs: fix to do sanity check on valid node/block count")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
We will do the same check in generic_write_checks.
if (iocb->ki_flags & IOCB_NOWAIT) && !(iocb->ki_flags & IOCB_DIRECT)
return -EINVAL;
just remove the same check in f2fs_file_write_iter.
Signed-off-by: Lihong Kou <koulihong@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
During defragment, we missed to trigger fragmented blocks migration
for below condition:
In defragment region:
- total number of valid blocks is smaller than 512;
- the tail part of the region are all holes;
In addtion, return zero to user via range->len if there is no
fragmented blocks.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
EOPNOTSUPP is widely used as error number indicating operation is
not supported in syscall, and ENOTSUPP was defined and only used
for NFSv3 protocol, so use EOPNOTSUPP instead.
Fixes: 0a2aa8fbb9 ("f2fs: refactor __exchange_data_block for speed up")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Modeled after commit b886ee3e77 ("ext4: Support case-insensitive file
name lookups")
"""
This patch implements the actual support for case-insensitive file name
lookups in f2fs, based on the feature bit and the encoding stored in the
superblock.
A filesystem that has the casefold feature set is able to configure
directories with the +F (F2FS_CASEFOLD_FL) attribute, enabling lookups
to succeed in that directory in a case-insensitive fashion, i.e: match
a directory entry even if the name used by userspace is not a byte per
byte match with the disk name, but is an equivalent case-insensitive
version of the Unicode string. This operation is called a
case-insensitive file name lookup.
The feature is configured as an inode attribute applied to directories
and inherited by its children. This attribute can only be enabled on
empty directories for filesystems that support the encoding feature,
thus preventing collision of file names that only differ by case.
* dcache handling:
For a +F directory, F2Fs only stores the first equivalent name dentry
used in the dcache. This is done to prevent unintentional duplication of
dentries in the dcache, while also allowing the VFS code to quickly find
the right entry in the cache despite which equivalent string was used in
a previous lookup, without having to resort to ->lookup().
d_hash() of casefolded directories is implemented as the hash of the
casefolded string, such that we always have a well-known bucket for all
the equivalencies of the same string. d_compare() uses the
utf8_strncasecmp() infrastructure, which handles the comparison of
equivalent, same case, names as well.
For now, negative lookups are not inserted in the dcache, since they
would need to be invalidated anyway, because we can't trust missing file
dentries. This is bad for performance but requires some leveraging of
the vfs layer to fix. We can live without that for now, and so does
everyone else.
* on-disk data:
Despite using a specific version of the name as the internal
representation within the dcache, the name stored and fetched from the
disk is a byte-per-byte match with what the user requested, making this
implementation 'name-preserving'. i.e. no actual information is lost
when writing to storage.
DX is supported by modifying the hashes used in +F directories to make
them case/encoding-aware. The new disk hashes are calculated as the
hash of the full casefolded string, instead of the string directly.
This allows us to efficiently search for file names in the htree without
requiring the user to provide an exact name.
* Dealing with invalid sequences:
By default, when a invalid UTF-8 sequence is identified, ext4 will treat
it as an opaque byte sequence, ignoring the encoding and reverting to
the old behavior for that unique file. This means that case-insensitive
file name lookup will not work only for that file. An optional bit can
be set in the superblock telling the filesystem code and userspace tools
to enforce the encoding. When that optional bit is set, any attempt to
create a file name using an invalid UTF-8 sequence will fail and return
an error to userspace.
* Normalization algorithm:
The UTF-8 algorithms used to compare strings in f2fs is implemented
in fs/unicode, and is based on a previous version developed by
SGI. It implements the Canonical decomposition (NFD) algorithm
described by the Unicode specification 12.1, or higher, combined with
the elimination of ignorable code points (NFDi) and full
case-folding (CF) as documented in fs/unicode/utf8_norm.c.
NFD seems to be the best normalization method for F2FS because:
- It has a lower cost than NFC/NFKC (which requires
decomposing to NFD as an intermediary step)
- It doesn't eliminate important semantic meaning like
compatibility decompositions.
Although:
- This implementation is not completely linguistic accurate, because
different languages have conflicting rules, which would require the
specialization of the filesystem to a given locale, which brings all
sorts of problems for removable media and for users who use more than
one language.
"""
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Add charset encoding to f2fs to support casefolding. It is modeled after
the same feature introduced in commit c83ad55eaa ("ext4: include charset
encoding information in the superblock")
Currently this is not compatible with encryption, similar to the current
ext4 imlpementation. This will change in the future.
>From the ext4 patch:
"""
The s_encoding field stores a magic number indicating the encoding
format and version used globally by file and directory names in the
filesystem. The s_encoding_flags defines policies for using the charset
encoding, like how to handle invalid sequences. The magic number is
mapped to the exact charset table, but the mapping is specific to ext4.
Since we don't have any commitment to support old encodings, the only
encoding I am supporting right now is utf8-12.1.0.
The current implementation prevents the user from enabling encoding and
per-directory encryption on the same filesystem at the same time. The
incompatibility between these features lies in how we do efficient
directory searches when we cannot be sure the encryption of the user
provided fname will match the actual hash stored in the disk without
decrypting every directory entry, because of normalization cases. My
quickest solution is to simply block the concurrent use of these
features for now, and enable it later, once we have a better solution.
"""
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
vfree() don't wish to be called from interrupt context, move it
out of spin_lock_irqsave() coverage.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In fill_super() and put_super(), f2fs_destroy_stats() is called
in prior to f2fs_destroy_segment_manager(), so if current
sbi can still be visited in global stat list, SM_I(sbi) should be
released yet.
For this reason, SM_I(sbi) does not need to be checked in
update_general_status().
Thank Chao Yu for advice.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Atomic write needs page cache to cache data of transaction,
direct IO should never be allowed in atomic write, detect
and deny it when open atomic write file.
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
With quota_ino feature on, generic/232 reports an inconsistence issue
on the image.
The root cause is that the testcase tries to:
- use quotactl to shutdown journalled quota based on sysfile;
- and then use quotactl to enable/turn on quota based on specific file
(aquota.user or aquota.group).
Eventually, quota sysfile will be out-of-update due to following specific
file creation.
Change as below to fix this issue:
- deny enabling quota based on specific file if quota sysfile exists.
- set SBI_QUOTA_NEED_REPAIR once sysfile based quota shutdowns via
ioctl.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
It needs to return -EIO if filesystem has been shutdown, fix the
miss case in f2fs_setxattr().
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
We missed to call f2fs_is_checkpoint_ready() in several places, it may
allow space allocation even when free space was exhausted during
checkpoint is disabled, fix to add them.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Adjust f2fs_fiemap() to support fiemap() on directory inode.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
=============================================================================
BUG discard_cmd (Tainted: G B OE ): Objects remaining in discard_cmd on __kmem_cache_shutdown()
-----------------------------------------------------------------------------
INFO: Slab 0xffffe1ac481d22c0 objects=36 used=2 fp=0xffff936b4748bf50 flags=0x2ffff0000000100
Call Trace:
dump_stack+0x63/0x87
slab_err+0xa1/0xb0
__kmem_cache_shutdown+0x183/0x390
shutdown_cache+0x14/0x110
kmem_cache_destroy+0x195/0x1c0
f2fs_destroy_segment_manager_caches+0x21/0x40 [f2fs]
exit_f2fs_fs+0x35/0x641 [f2fs]
SyS_delete_module+0x155/0x230
? vtime_user_exit+0x29/0x70
do_syscall_64+0x6e/0x160
entry_SYSCALL64_slow_path+0x25/0x25
INFO: Object 0xffff936b4748b000 @offset=0
INFO: Object 0xffff936b4748b070 @offset=112
kmem_cache_destroy discard_cmd: Slab cache still has objects
Call Trace:
dump_stack+0x63/0x87
kmem_cache_destroy+0x1b4/0x1c0
f2fs_destroy_segment_manager_caches+0x21/0x40 [f2fs]
exit_f2fs_fs+0x35/0x641 [f2fs]
SyS_delete_module+0x155/0x230
do_syscall_64+0x6e/0x160
entry_SYSCALL64_slow_path+0x25/0x25
Recovery can cache discard commands, so in error path of fill_super(),
we need give a chance to handle them, otherwise it will lead to leak
of discard_cmd slab cache.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
On a quota disabled image, with fault injection, SBI_QUOTA_NEED_REPAIR
will be set incorrectly in error path of f2fs_evict_inode(), fix it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=204193
A null pointer dereference bug is triggered in f2fs under kernel-5.1.3.
kasan_report.cold+0x5/0x32
f2fs_write_end_io+0x215/0x650
bio_endio+0x26e/0x320
blk_update_request+0x209/0x5d0
blk_mq_end_request+0x2e/0x230
lo_complete_rq+0x12c/0x190
blk_done_softirq+0x14a/0x1a0
__do_softirq+0x119/0x3e5
irq_exit+0x94/0xe0
call_function_single_interrupt+0xf/0x20
During umount, we will access NULL sbi->node_inode pointer in
f2fs_write_end_io():
f2fs_bug_on(sbi, page->mapping == NODE_MAPPING(sbi) &&
page->index != nid_of_node(page));
The reason is if disable_checkpoint mount option is on, meta dirty
pages can remain during umount, and then be flushed by iput() of
meta_inode, however node_inode has been iput()ed before
meta_inode's iput().
Since checkpoint is disabled, all meta/node datas are useless and
should be dropped in next mount, so in umount, let's adjust
drop_inode() to give a hint to iput_final() to drop all those dirty
datas correctly.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If IO alignment feature is turned on after remount, we didn't
initialize mempool of it, it turns out we will encounter panic
during IO submission due to access NULL mempool pointer.
This feature should be set only at mount time, so simply deny
configuring during remount.
This fixes bug reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=204135
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Since 07173c3ec2 ("block: enable multipage bvecs"), one bio vector
can store multi pages, so that we can not calculate max IO size of
bio as PAGE_SIZE * bio->bi_max_vecs. However IO alignment feature of
f2fs always has that assumption, so finally, it may cause panic during
IO submission as below stack.
kernel BUG at fs/f2fs/data.c:317!
RIP: 0010:__submit_merged_bio+0x8b0/0x8c0
Call Trace:
f2fs_submit_page_write+0x3cd/0xdd0
do_write_page+0x15d/0x360
f2fs_outplace_write_data+0xd7/0x210
f2fs_do_write_data_page+0x43b/0xf30
__write_data_page+0xcf6/0x1140
f2fs_write_cache_pages+0x3ba/0xb40
f2fs_write_data_pages+0x3dd/0x8b0
do_writepages+0xbb/0x1e0
__writeback_single_inode+0xb6/0x800
writeback_sb_inodes+0x441/0x910
wb_writeback+0x261/0x650
wb_workfn+0x1f9/0x7a0
process_one_work+0x503/0x970
worker_thread+0x7d/0x820
kthread+0x1ad/0x210
ret_from_fork+0x35/0x40
This patch adds one extra condition to check left space in bio while
trying merging page to bio, to avoid panic.
This bug was reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=204043
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Wrap merge condition into function for readability, no logic change.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
- Fix crashes when the attr fork isn't present due to errors but inode
inactivation tries to zap the attr data anyway.
- Convert more directory corruption debugging asserts to actual
EFSCORRUPTED returns instead of blowing up later on.
- Don't fail writeback just because we ran out of memory allocating
metadata log data.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl1RlXoACgkQ+H93GTRK
tOtc7A/+JIidhI/MHQLs7Ab9GW+PsHHBMSbVTV4Ge+SlfZtPNI38zrC1MC5LWvvV
bndOpRjLm4nOJcB7fsoEWufTs1dKOIUjk2yQi8x47ZvE+B/RcA4b6IDhwpbAI8GW
kt1RLNec9kpzhxCFFPzsXT9MwjEvvOvTeXfxaXTmiuB2kbJkR5dTlCUS2nUDnqsG
FGdmOUDjy1uVfFcSrp75KT/iYaqW08cG+uY/eUHRm+YMUKI8hF1t+n8cDnSg96VX
IN2DT1d3dTWiiF+JUZnMhVwJvPgV95DOf+yYy/F7qOcJUEmQ9tD6+0Ml/cI/AeLG
zERxHXM9A9Jy8S+2xkvf0J/+HStwfviWNToK3pbMIM1ZsoMTi9q8VgbB3AaFiijf
C4Q4T3W0jC44om8X/Ta/c+G/64Tj8yenzLDeTHvtQkoq77QPBam/aYjBc79oYvHi
r+R61kHNto+YjJsRbkwgF/S+bzru1qY9Ccr0LJZrUkSzh4d6p94fbQc+NX4L2sv7
WzAc+kOR/7qgVgy4gVr3ju0d89kP/Xn/0e0Ma0V8CSZlX5yg1dMLew5TJq693UYX
xjLGD2ltOoFEN8e7/WXI0/ktvvSCAQalmz+sPgJvTlosUhpGXky85ced1PSrKiEV
l0tREpmawDo9WVvC/06yBj97Op6PDdb4CovDcyLT6Yt3v1aBZT0=
=ivN3
-----END PGP SIGNATURE-----
Merge tag 'xfs-5.3-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
- Fix crashes when the attr fork isn't present due to errors but inode
inactivation tries to zap the attr data anyway.
- Convert more directory corruption debugging asserts to actual
EFSCORRUPTED returns instead of blowing up later on.
- Don't fail writeback just because we ran out of memory allocating
metadata log data.
* tag 'xfs-5.3-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: don't crash on null attr fork xfs_bmapi_read
xfs: remove more ondisk directory corruption asserts
fs: xfs: xfs_log: Don't use KM_MAYFAIL at xfs_log_reserve().
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAl1UELcACgkQ+7dXa6fL
C2viWw//eDLvElBjaQsabfpMOVGkf02t+zCzNfKSC0KdM1+GUZ1FyXQ0UAtgwICY
Sp01h/RZa0V07sfYP7R5kL4/KIMdODmhrP0iiHDpoMjKCL7qR9tFbJDAcHtH8xz2
52UV2dmdDBI/wdw/i5dn6M02SoYAQMl1XT49SkzhFSELVchkpraGsf1vf4yITeVe
eI1TaOxI+TUaeH5f6+KWp6c8K8q70p3KfrR2VmCWkBrD7PNg9lp19pVnz8tdofYu
xURHQbJulSqM+mY7pcNBOi2iWy3dCLjBTkVJIwIhZcZqLThACY38SSaPtmdhgif4
wcyyZUtd8EGPzPPqbfCx7ycTIIDtL/r98XtGyiTJBKrCK+flZONdu0g/oIzvJ/Wu
hV4+ButxCuMakbLOe+Hew3lhHFOy7m9XZtOURzxzZSm9uazHDMxnw4ocxIOs24F1
qus1sG0+rlVDcMYjo2tKEAzOl/ZejJ/NUTd60ANIWKTHply2/2/5dH94B0yLwDnp
tfifBrBkyqFB4XUKGvqvvJczl0d7+zsEScs4VQLVO/WhATjj6jNnrYKgwvBS5pCM
890qUzj3TRW7ciZLi0THMEHBlEfbEWhNCaggAqieIvbKv7t4Kh2cUBaIsxo4IYqU
PBZZhFXRul5ocTJrV9pScl4RbzxE5V0j9cwSiiWnzZL1sQucIgQ=
=zivP
-----END PGP SIGNATURE-----
Merge tag 'afs-fixes-20190814' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull afs fixes from David Howells:
- Fix the CB.ProbeUuid handler to generate its reply correctly.
- Fix a mix up in indices when parsing a Volume Location entry record.
- Fix a potential NULL-pointer deref when cleaning up a read request.
- Fix the expected data version of the destination directory in
afs_rename().
- Fix afs_d_revalidate() to only update d_fsdata if it's not the same
as the directory data version to reduce the likelihood of overwriting
the result of a competing operation. (d_fsdata carries the directory
DV or the least-significant word thereof).
- Fix the tracking of the data-version on a directory and make sure
that dentry objects get properly initialised, updated and
revalidated.
Also fix rename to update d_fsdata to match the new directory's DV if
the dentry gets moved over and unhash the dentry to stop
afs_d_revalidate() from interfering.
* tag 'afs-fixes-20190814' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Fix missing dentry data version updating
afs: Only update d_fsdata if different in afs_d_revalidate()
afs: Fix off-by-one in afs_rename() expected data version calculation
fs: afs: Fix a possible null-pointer dereference in afs_put_read()
afs: Fix loop index mixup in afs_deliver_vl_get_entry_by_name_u()
afs: Fix the CB.ProbeUuid service handler to reply correctly
If you use lseek or similar (e.g. pread) to access a location in a
seq_file file that is within a record, rather than at a record boundary,
then the first read will return the remainder of the record, and the
second read will return the whole of that same record (instead of the
next record). When seeking to a record boundary, the next record is
correctly returned.
This bug was introduced by a recent patch (identified below). Before
that patch, seq_read() would increment m->index when the last of the
buffer was returned (m->count == 0). After that patch, we rely on
->next to increment m->index after filling the buffer - but there was
one place where that didn't happen.
Link: https://lkml.kernel.org/lkml/877e7xl029.fsf@notabene.neil.brown.name/
Fixes: 1f4aace60b ("fs/seq_file.c: simplify seq_file iteration code and interface")
Signed-off-by: NeilBrown <neilb@suse.com>
Reported-by: Sergei Turchanov <turchanov@farpost.com>
Tested-by: Sergei Turchanov <turchanov@farpost.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Markus Elfring <Markus.Elfring@web.de>
Cc: <stable@vger.kernel.org> [4.19+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Zorro Lang reported a crash in generic/475 if we try to inactivate a
corrupt inode with a NULL attr fork (stack trace shortened somewhat):
RIP: 0010:xfs_bmapi_read+0x311/0xb00 [xfs]
RSP: 0018:ffff888047f9ed68 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: ffff888047f9f038 RCX: 1ffffffff5f99f51
RDX: 0000000000000002 RSI: 0000000000000008 RDI: 0000000000000012
RBP: ffff888002a41f00 R08: ffffed10005483f0 R09: ffffed10005483ef
R10: ffffed10005483ef R11: ffff888002a41f7f R12: 0000000000000004
R13: ffffe8fff53b5768 R14: 0000000000000005 R15: 0000000000000001
FS: 00007f11d44b5b80(0000) GS:ffff888114200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000ef6000 CR3: 000000002e176003 CR4: 00000000001606e0
Call Trace:
xfs_dabuf_map.constprop.18+0x696/0xe50 [xfs]
xfs_da_read_buf+0xf5/0x2c0 [xfs]
xfs_da3_node_read+0x1d/0x230 [xfs]
xfs_attr_inactive+0x3cc/0x5e0 [xfs]
xfs_inactive+0x4c8/0x5b0 [xfs]
xfs_fs_destroy_inode+0x31b/0x8e0 [xfs]
destroy_inode+0xbc/0x190
xfs_bulkstat_one_int+0xa8c/0x1200 [xfs]
xfs_bulkstat_one+0x16/0x20 [xfs]
xfs_bulkstat+0x6fa/0xf20 [xfs]
xfs_ioc_bulkstat+0x182/0x2b0 [xfs]
xfs_file_ioctl+0xee0/0x12a0 [xfs]
do_vfs_ioctl+0x193/0x1000
ksys_ioctl+0x60/0x90
__x64_sys_ioctl+0x6f/0xb0
do_syscall_64+0x9f/0x4d0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f11d39a3e5b
The "obvious" cause is that the attr ifork is null despite the inode
claiming an attr fork having at least one extent, but it's not so
obvious why we ended up with an inode in that state.
Reported-by: Zorro Lang <zlang@redhat.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204031
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Continue our game of replacing ASSERTs for corrupt ondisk metadata with
EFSCORRUPTED returns.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
- Fix dax_layout_busy_page() to not discard private cow pages of fs/dax
private mappings.
- Update the memremap_pages core to properly cleanup on behalf of
internal reference-count users like device-dax.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJdUGLxAAoJEB7SkWpmfYgC62IP/3aHwBbdedlXves4NQ0QhN5z
3jooOqpayfgkdPZp4U2XnNolsbKDME6h7m7Mn+GzzRSCfNGsdcnLHDWwXt4tTWE8
rvDNPat+22oWSVqnZOTb5GfnZKGmAbk5eC2HI9HT2VPf/BWMIoU6/QhvIzeLCEf7
g72XTEouitGgRk2Cn8Wi3+y+fvbMdur/0qofBH9rxfQEgTWiDtCJvxZ5KyH18hlt
qbvJR0CG2mxIxEbM1qx/1/HysXgs3UeTJVzHioF5SLdGcyQP14Djp0MCuqTGiB6l
aEgMYSJcca7nilJNMcc2gNEsNNuga6UDWaF52FJuAoy+3vs837iexP6L+VbthqTT
70vAvEOnDyzgKe/jll8INjzSc+RDCbMXoFdSmGXTt9KHVMbZ+taGOeEIaY+UkYKk
g1BAWiZZEedJZXZmltGPDOXuPdzmK1uMR13gjz/FS298ffmznsqfEAKkSuzxWsKH
vheQnQjbEQoRNk4uI/mjxz/XYCEgnVqXX/9OlQ3WrJHWdOtBLEfykAem2RqeDGc7
QF3EZ+IGCD4xPRJWbWXtJdUK7EtGTPrzyiKHEPnXsj8xYDo6oAYK+erychiSI7y5
9htSyCWNZ5ccE8hqewKD5qN1XNqX6XmmmxgMX4i2w9fSdWFgzAkWPw/xTvw6BkfC
Bn5b/9LcFIfTX9/oAlwh
=Se2X
-----END PGP SIGNATURE-----
Merge tag 'dax-fixes-5.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull dax fixes from Dan Williams:
"A filesystem-dax and device-dax fix for v5.3.
The filesystem-dax fix is tagged for stable as the implementation has
been mistakenly throwing away all cow pages on any truncate or hole
punch operation as part of the solution to coordinate device-dma vs
truncate to dax pages.
The device-dax change fixes up a regression this cycle from the
introduction of a common 'internal per-cpu-ref' implementation.
Summary:
- Fix dax_layout_busy_page() to not discard private cow pages of
fs/dax private mappings.
- Update the memremap_pages core to properly cleanup on behalf of
internal reference-count users like device-dax"
* tag 'dax-fixes-5.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
mm/memremap: Fix reuse of pgmap instances with internal references
dax: dax_layout_busy_page() should not unmap cow pages
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl1NmNIQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpouEEADA1e2HH9AO8QGWBVSeFi5ZdRnEXsJUfNGD
d576M9empGI/hkbui0yY2tuodnMwmhUj641GTRC2dzl7RRyGcTMmqtGYPcyczlpU
+6Yil3+XVNYTXRpUsKWs32H9aubNZl/L3rCcJImGvMgLW5YtEjAZJFIFIzXWWwDJ
aZpTtnOH1+D3HH6HT35xk+aytSYZ7LsZ7X3LI9ZumKOUd2HJZGUkfWLXxgSuuUh/
/WaBEQ9xzDNARmfx9Qb/2wSAE7XOInupPr86fI9dnmXHZ8rwhsvHRIZEvNIIqF5Z
KzbF+rGJZ2bizKFpVFdlrIIyfBleFlQGFzYnGrs/+47zAl/CGicqmuAhGcOdGQXd
wyj6G66iBcBIQC9hsYhPtglyQSk95tzsPZLZa7/1TlhEKl3mpor4w30NrLz/P5oy
gdIivDhKP7aRFzBWw/2O4TOn3HhGxnZWVni6icOHE/pBQLBW12Ulc1nT0SiJ8RAt
PYhMCFwz0Xc7pYuEfGZKwNSCIrx4i7f+spWEAt0Fget/KaB993zm+K4OzGbfBv6r
FrZ5g/2MTVBJITmJVN2LDysF4EVEOzTULX+bCQOOWC7wmE/poCyNWj8gCpunIZsY
V5DAB+cbEw/OLfpdiogXvCcxrbukKHV8AVdMdLYB5yEAuPN0JMqcunMjECxpNAha
1wsv86ljvw==
=Au30
-----END PGP SIGNATURE-----
Merge tag 'for-linus-20190809' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- Revert of a bcache patch that caused an oops for some (Coly)
- ata rb532 unused warning fix (Gustavo)
- AoE kernel crash fix (He)
- Error handling fixup for blkdev_get() (Jan)
- libata read/write translation and SFF PIO fix (me)
- Use after free and error handling fix for O_DIRECT fragments. There's
still a nowait + sync oddity in there, we'll nail that start next
week. If all else fails, I'll queue a revert of the NOWAIT change.
(me)
- Loop GFP_KERNEL -> GFP_NOIO deadlock fix (Mikulas)
- Two BFQ regression fixes that caused crashes (Paolo)
* tag 'for-linus-20190809' of git://git.kernel.dk/linux-block:
bcache: Revert "bcache: use sysfs_match_string() instead of __sysfs_match_string()"
loop: set PF_MEMALLOC_NOIO for the worker thread
bdev: Fixup error handling in blkdev_get()
block, bfq: handle NULL return value by bfq_init_rq()
block, bfq: move update of waker and woken list to queue freeing
block, bfq: reset last_completed_rq_bfqq if the pointed queue is freed
block: aoe: Fix kernel crash due to atomic sleep when exiting
libata: add SG safety checks in SFF pio transfers
libata: have ata_scsi_rw_xlat() fail invalid passthrough requests
block: fix O_DIRECT error handling for bio fragments
ata: rb532_cf: Fix unused variable warning in rb532_pata_driver_probe
It turns out that the current version of gfs2_metadata_walker suffers
from multiple problems that can cause gfs2_hole_size to report an
incorrect size. This will confuse fiemap as well as lseek with the
SEEK_DATA flag.
Fix that by changing gfs2_hole_walker to compute the metapath to the
first data block after the hole (if any), and compute the hole size
based on that.
Fixes xfstest generic/490.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
Cc: stable@vger.kernel.org # v4.18+
Highlights include:
Stable fixes:
- NFSv4: Ensure we check the return value of update_open_stateid() so we
correctly track active open state.
- NFSv4: Fix for delegation state recovery to ensure we recover all open
modes that are active.
- NFSv4: Fix an Oops in nfs4_do_setattr
Bugfixes:
- NFS: Fix regression whereby fscache errors are appearing on 'nofsc' mounts
- NFSv4: Fix a potential sleep while atomic in nfs4_do_reclaim()
- NFSv4: Fix a credential refcount leak in nfs41_check_delegation_stateid
- pNFS: Report errors from the call to nfs4_select_rw_stateid()
- NFSv4: Various other delegation and open stateid recovery fixes
- NFSv4: Fix state recovery behaviour when server connection times out
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAl1LKhEACgkQZwvnipYK
APIAuw/9HnKwnJYKvkAv/Pg2eBQZAgwjchc/uPsfteSPr8PMFS889rsqvDoGrAI4
VjZRh7Jsp/FPAlLzZKCnLF/fxKE83qxgS3MP14of9IoRv2gznsW7jexy48AhU/5t
Ae4Wgu3GJJ0IIrr8hbrkJRkBUoYUMLguCNNaZC7LDLzEVQ0wNDAVpdsZ+gdnCcrw
zhnFnz72p2h95tfL5QkJ+OYrAu4ikdlSjx2oOdLsUGFEAnTehpUPd3DPDiCQbctx
zPHSGukj+8tsPJ+EUVuj7ouDJqTDyMFVe1eKRJWMIq22bUAM1GBtVVMw8uFXJi5i
9WFUJIezHhkh3Hdx82ptUmt3u1hRuSolaKDICeIR2Kob0gUArqk0KR7upgVMS3Fn
INm/c4Zsqsa1ABevQTLWqz+nVPUPRFGmEZfjvBwkmYlkKnqbjWxXQRkROt8UJS3O
3vfK1hUEIUyt4uI2yHusru5nIQ3pv/h1WAwpfuSQFw+nEvC6YcssECz8uOhKEnEr
UWnUxRP66YVL4L+VAsajzAArBDQ8cUU3bv6q0x2IEWA8CHHy0BH1MGM6cumVW8YQ
rwj0KqR4+aH0u5g8EpOkODAJfuxo10oJi6MNlr2+OSn60tdmi/P1K8eg4ERODqya
vGqgftjTfmViYfoaWpt2NDqAbKOu4wXovb6jWqbP2lUsh+ttojg=
=IXqc
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-5.3-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client fixes from Trond Myklebust:
"Highlights include:
Stable fixes:
- NFSv4: Ensure we check the return value of update_open_stateid() so
we correctly track active open state.
- NFSv4: Fix for delegation state recovery to ensure we recover all
open modes that are active.
- NFSv4: Fix an Oops in nfs4_do_setattr
Fixes:
- NFS: Fix regression whereby fscache errors are appearing on 'nofsc'
mounts
- NFSv4: Fix a potential sleep while atomic in nfs4_do_reclaim()
- NFSv4: Fix a credential refcount leak in nfs41_check_delegation_stateid
- pNFS: Report errors from the call to nfs4_select_rw_stateid()
- NFSv4: Various other delegation and open stateid recovery fixes
- NFSv4: Fix state recovery behaviour when server connection times
out"
* tag 'nfs-for-5.3-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFSv4: Ensure state recovery handles ETIMEDOUT correctly
NFS: Fix regression whereby fscache errors are appearing on 'nofsc' mounts
NFSv4: Fix an Oops in nfs4_do_setattr
NFSv4: Fix a potential sleep while atomic in nfs4_do_reclaim()
NFSv4: Check the return value of update_open_stateid()
NFSv4.1: Only reap expired delegations
NFSv4.1: Fix open stateid recovery
NFSv4: Report the error from nfs4_select_rw_stateid()
NFSv4: When recovering state fails with EAGAIN, retry the same recovery
NFSv4: Print an error in the syslog when state is marked as irrecoverable
NFSv4: Fix delegation state recovery
NFSv4: Fix a credential refcount leak in nfs41_check_delegation_stateid
Pull cifs fixes from Steve French:
"Six small SMB3 fixes, two for stable"
* tag '5.3-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
SMB3: Kernel oops mounting a encryptData share with CONFIG_DEBUG_VIRTUAL
smb3: update TODO list of missing features
smb3: send CAP_DFS capability during session setup
SMB3: Fix potential memory leak when processing compound chain
SMB3: Fix deadlock in validate negotiate hits reconnect
cifs: fix rmmod regression in cifs.ko caused by force_sig changes
Commit 89e524c04f ("loop: Fix mount(2) failure due to race with
LOOP_SET_FD") converted blkdev_get() to use the new helpers for
finishing claiming of a block device. However the conversion botched the
error handling in blkdev_get() and thus the bdev has been marked as held
even in case __blkdev_get() returned error. This led to occasional
warnings with block/001 test from blktests like:
kernel: WARNING: CPU: 5 PID: 907 at fs/block_dev.c:1899 __blkdev_put+0x396/0x3a0
Correct the error handling.
CC: stable@vger.kernel.org
Fixes: 89e524c04f ("loop: Fix mount(2) failure due to race with LOOP_SET_FD")
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
0eb6ddfb86 tried to fix this up, but introduced a use-after-free
of dio. Additionally, we still had an issue with error handling,
as reported by Darrick:
"I noticed a regression in xfs/747 (an unreleased xfstest for the
xfs_scrub media scanning feature) on 5.3-rc3. I'll condense that down
to a simpler reproducer:
error-test: 0 209 linear 8:48 0
error-test: 209 1 error
error-test: 210 6446894 linear 8:48 210
Basically we have a ~3G /dev/sdd and we set up device mapper to fail IO
for sector 209 and to pass the io to the scsi device everywhere else.
On 5.3-rc3, performing a directio pread of this range with a < 1M buffer
(in other words, a request for fewer than MAX_BIO_PAGES bytes) yields
EIO like you'd expect:
pread64(3, 0x7f880e1c7000, 1048576, 0) = -1 EIO (Input/output error)
pread: Input/output error
+++ exited with 0 +++
But doing it with a larger buffer succeeds(!):
pread64(3, "XFSB\0\0\20\0\0\0\0\0\0\fL\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1146880, 0) = 1146880
read 1146880/1146880 bytes at offset 0
1 MiB, 1 ops; 0.0009 sec (1.124 GiB/sec and 1052.6316 ops/sec)
+++ exited with 0 +++
(Note that the part of the buffer corresponding to the dm-error area is
uninitialized)
On 5.3-rc2, both commands would fail with EIO like you'd expect. The
only change between rc2 and rc3 is commit 0eb6ddfb86 ("block: Fix
__blkdev_direct_IO() for bio fragments").
AFAICT we end up in __blkdev_direct_IO with a 1120K buffer, which gets
split into two bios: one for the first BIO_MAX_PAGES worth of data (1MB)
and a second one for the 96k after that."
Fix this by noting that it's always safe to dereference dio if we get
BLK_QC_T_EAGAIN returned, as end_io hasn't been run for that case. So
we can safely increment the dio size before calling submit_bio(), and
then decrement it on failure (not that it really matters, as the bio
and dio are going away).
For error handling, return to the original method of just using 'ret'
for tracking the error, and the size tracking in dio->size.
Fixes: 0eb6ddfb86 ("block: Fix __blkdev_direct_IO() for bio fragments")
Fixes: 6a43074e2f ("block: properly handle IOCB_NOWAIT for async O_DIRECT IO")
Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ensure that the state recovery code handles ETIMEDOUT correctly,
and also that we set RPC_TASK_TIMEOUT when recovering open state.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Pull networking fixes from David Miller:
"Yeah I should have sent a pull request last week, so there is a lot
more here than usual:
1) Fix memory leak in ebtables compat code, from Wenwen Wang.
2) Several kTLS bug fixes from Jakub Kicinski (circular close on
disconnect etc.)
3) Force slave speed check on link state recovery in bonding 802.3ad
mode, from Thomas Falcon.
4) Clear RX descriptor bits before assigning buffers to them in
stmmac, from Jose Abreu.
5) Several missing of_node_put() calls, mostly wrt. for_each_*() OF
loops, from Nishka Dasgupta.
6) Double kfree_skb() in peak_usb can driver, from Stephane Grosjean.
7) Need to hold sock across skb->destructor invocation, from Cong
Wang.
8) IP header length needs to be validated in ipip tunnel xmit, from
Haishuang Yan.
9) Use after free in ip6 tunnel driver, also from Haishuang Yan.
10) Do not use MSI interrupts on r8169 chips before RTL8168d, from
Heiner Kallweit.
11) Upon bridge device init failure, we need to delete the local fdb.
From Nikolay Aleksandrov.
12) Handle erros from of_get_mac_address() properly in stmmac, from
Martin Blumenstingl.
13) Handle concurrent rename vs. dump in netfilter ipset, from Jozsef
Kadlecsik.
14) Setting NETIF_F_LLTX on mac80211 causes complete breakage with
some devices, so revert. From Johannes Berg.
15) Fix deadlock in rxrpc, from David Howells.
16) Fix Kconfig deps of enetc driver, we must have PHYLIB. From Yue
Haibing.
17) Fix mvpp2 crash on module removal, from Matteo Croce.
18) Fix race in genphy_update_link, from Heiner Kallweit.
19) bpf_xdp_adjust_head() stopped working with generic XDP when we
fixes generic XDP to support stacked devices properly, fix from
Jesper Dangaard Brouer.
20) Unbalanced RCU locking in rt6_update_exception_stamp_rt(), from
David Ahern.
21) Several memory leaks in new sja1105 driver, from Vladimir Oltean"
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (214 commits)
net: dsa: sja1105: Fix memory leak on meta state machine error path
net: dsa: sja1105: Fix memory leak on meta state machine normal path
net: dsa: sja1105: Really fix panic on unregistering PTP clock
net: dsa: sja1105: Use the LOCKEDS bit for SJA1105 E/T as well
net: dsa: sja1105: Fix broken learning with vlan_filtering disabled
net: dsa: qca8k: Add of_node_put() in qca8k_setup_mdio_bus()
net: sched: sample: allow accessing psample_group with rtnl
net: sched: police: allow accessing police->params with rtnl
net: hisilicon: Fix dma_map_single failed on arm64
net: hisilicon: fix hip04-xmit never return TX_BUSY
net: hisilicon: make hip04_tx_reclaim non-reentrant
tc-testing: updated vlan action tests with batch create/delete
net sched: update vlan action for batched events operations
net: stmmac: tc: Do not return a fragment entry
net: stmmac: Fix issues when number of Queues >= 4
net: stmmac: xgmac: Fix XGMAC selftests
be2net: disable bh with spin_lock in be_process_mcc
net: cxgb3_main: Fix a resource leak in a error path in 'init_one()'
net: ethernet: sun4i-emac: Support phy-handle property for finding PHYs
net: bridge: move default pvid init/deinit to NETDEV_REGISTER/UNREGISTER
...
Fix kernel oops when mounting a encryptData CIFS share with
CONFIG_DEBUG_VIRTUAL
Signed-off-by: Sebastien Tisserant <stisserant@wallix.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
We had a report of a server which did not do a DFS referral
because the session setup Capabilities field was set to 0
(unlike negotiate protocol where we set CAP_DFS). Better to
send it session setup in the capabilities as well (this also
more closely matches Windows client behavior).
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org>
When a reconnect happens in the middle of processing a compound chain
the code leaks a buffer from the memory pool. Fix this by properly
checking for a return code and freeing buffers in case of error.
Also maintain a buf variable to be equal to either smallbuf or bigbuf
depending on a response buffer size while parsing a chain and when
returning to the caller.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Currently we skip SMB2_TREE_CONNECT command when checking during
reconnect because Tree Connect happens when establishing
an SMB session. For SMB 3.0 protocol version the code also calls
validate negotiate which results in SMB2_IOCL command being sent
over the wire. This may deadlock on trying to acquire a mutex when
checking for reconnect. Fix this by skipping SMB2_IOCL command
when doing the reconnect check.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
CC: Stable <stable@vger.kernel.org>
Vivek:
"As of now dax_layout_busy_page() calls unmap_mapping_range() with last
argument as 1, which says even unmap cow pages. I am wondering who needs
to get rid of cow pages as well.
I noticed one interesting side affect of this. I mount xfs with -o dax and
mmaped a file with MAP_PRIVATE and wrote some data to a page which created
cow page. Then I called fallocate() on that file to zero a page of file.
fallocate() called dax_layout_busy_page() which unmapped cow pages as well
and then I tried to read back the data I wrote and what I get is old
data from persistent memory. I lost the data I had written. This
read basically resulted in new fault and read back the data from
persistent memory.
This sounds wrong. Are there any users which need to unmap cow pages
as well? If not, I am proposing changing it to not unmap cow pages.
I noticed this while while writing virtio_fs code where when I tried
to reclaim a memory range and that corrupted the executable and I
was running from virtio-fs and program got segment violation."
Dan:
"In fact the unmap_mapping_range() in this path is only to synchronize
against get_user_pages_fast() and force it to call back into the
filesystem to re-establish the mapping. COW pages should be left
untouched by dax_layout_busy_page()."
Cc: <stable@vger.kernel.org>
Fixes: 5fac7408d8 ("mm, fs, dax: handle layout changes to pinned dax mappings")
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Link: https://lore.kernel.org/r/20190802192956.GA3032@redhat.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Fixes: 72abe3bcf0 ("signal/cifs: Fix cifs_put_tcp_session to call send_sig instead of force_sig")
The global change from force_sig caused module unloading of cifs.ko
to fail (since the cifsd process could not be killed, "rmmod cifs"
now would always fail)
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
CC: Eric W. Biederman <ebiederm@xmission.com>
People are reporing seeing fscache errors being reported concerning
duplicate cookies even in cases where they are not setting up fscache
at all. The rule needs to be that if fscache is not enabled, then it
should have no side effects at all.
To ensure this is the case, we disable fscache completely on all superblocks
for which the 'fsc' mount option was not set. In order to avoid issues
with '-oremount', we also disable the ability to turn fscache on via
remount.
Fixes: f1fe29b4a0 ("NFS: Use i_writecount to control whether...")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200145
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Steve Dickson <steved@redhat.com>
Cc: David Howells <dhowells@redhat.com>