Commit Graph

61950 Commits

Author SHA1 Message Date
Al Viro
ab88dca311 nfs: get rid of mount_info ->fill_super()
The only possible values are nfs_fill_super and nfs_clone_super.  The
latter is used only when crossing into a submount and it is almost
identical to the former; the only differences are
	* ->s_time_gran unconditionally set to 1 (even for v2 mounts).
Regression dating back to 2012, actually.
	* ->s_blocksize/->s_blocksize_bits set to that of parent.

Rather than messing with the method, stash ->s_blocksize_bits in
mount_info in submount case and after the (now unconditional)
call of nfs_fill_super() override ->s_blocksize/->s_blocksize_bits
if that has been set.

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro
0c38f2131d nfs: don't pass nfs_subversion to ->create_server()
pick it from mount_info

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro
1bc3a2cbf2 nfs: unexport nfs_fs_mount_common()
Make it static, even.  And remove a stale extern of (long-gone)
nfs_xdev_mount_common() from internal.h, while we are at it.

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro
82eaed2bee nfs: merge xdev and remote file_system_type
they are identical now...

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro
a55d3297be nfs: don't bother passing nfs_subversion to ->try_mount() and nfs_fs_mount_common()
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro
6a3f7a399e nfs: stash nfs_subversion reference into nfs_mount_info
That will allow to get rid of passing those references around in
quite a few places.  Moreover, that will allow to merge xdev and
remote file_system_type.

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro
250d69f6a4 nfs: lift setting mount_info from nfs_xdev_mount()
Do it in nfs_do_submount() instead.  As a side benefit, nfs_clone_data
doesn't need ->fh and ->fattr anymore.

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro
4e357761bd nfs4: fold nfs_do_root_mount/nfs_follow_remote_path
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro
6654f8e246 nfs: don't bother setting/restoring export_path around do_nfs_root_mount()
nothing in it will be looking at that thing anyway

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro
15a9c4eff6 nfs: fold nfs4_remote_fs_type and nfs4_remote_referral_fs_type
They are identical now.

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro
7643c12e95 nfs: lift setting mount_info from nfs4_remote{,_referral}_mount
Do that (fhandle allocation, setting struct server up) in
nfs4_referral_mount() and nfs4_try_mount() resp. and pass the
server and pointer to mount_info into nfs_do_root_mount() so that
nfs4_remote_referral_mount()/nfs_remote_mount() could be merged.

Since we are moving stuff from ->mount() instances to the points
prior to vfs_kern_mount() that would trigger those, we need to
make sure that do_nfs_root_mount() will do the corresponding
cleanup itself if it doesn't trigger those ->mount() instances.

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro
d0b779d47c nfs: stash server into struct nfs_mount_info
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Al Viro
444a52960c saner calling conventions for nfs_fs_mount_common()
Allow it to take ERR_PTR() for server and return ERR_CAST() of it in
such case.  All callers used to open-code that...

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Linus Torvalds
e033e7d4a8 Merge branch 'dhowells' (patches from DavidH)
Merge misc fixes from David Howells.

Two afs fixes and a key refcounting fix.

* dhowells:
  afs: Fix afs_lookup() to not clobber the version on a new dentry
  afs: Fix use-after-loss-of-ref
  keys: Fix request_key() cache
2020-01-14 09:56:31 -08:00
David Howells
f52b83b0b1 afs: Fix afs_lookup() to not clobber the version on a new dentry
Fix afs_lookup() to not clobber the version set on a new dentry by
afs_do_lookup() - especially as it's using the wrong version of the
version (we need to use the one given to us by whatever op the dir
contents correspond to rather than what's in the afs_vnode).

Fixes: 9dd0b82ef5 ("afs: Fix missing dentry data version updating")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-14 09:40:06 -08:00
David Howells
40a708bd62 afs: Fix use-after-loss-of-ref
afs_lookup() has a tracepoint to indicate the outcome of
d_splice_alias(), passing it the inode to retrieve the fid from.
However, the function gave up its ref on that inode when it called
d_splice_alias(), which may have failed and dropped the inode.

Fix this by caching the fid.

Fixes: 80548b0399 ("afs: Add more tracepoints")
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-14 09:40:06 -08:00
Linus Torvalds
9fb7007de8 Char/Misc patch for 5.5-rc6
Here is a single fix, for the chrdev core, for 5.5-rc6
 
 There's been a long-standing race condition triggered by syzbot, and
 occasionally real people, in the chrdev open() path.  Will finally took
 the time to track it down and fix it for real before the holidays.
 
 Here's that one patch, it's been in linux-next for a while with no
 reported issues and it does fix the reported problem.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXhjcRA8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykIyQCfcrNOyyFktEj7/qiVJrMLbzVWoWYAoMHtNQcG
 3IYmNNJ+eXXJEiOgeZ4J
 =J0bS
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-5.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc fix from Greg KH:
 "Here is a single fix, for the chrdev core, for 5.5-rc6

  There's been a long-standing race condition triggered by syzbot, and
  occasionally real people, in the chrdev open() path. Will finally took
  the time to track it down and fix it for real before the holidays.

  Here's that one patch, it's been in linux-next for a while with no
  reported issues and it does fix the reported problem"

* tag 'char-misc-5.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  chardev: Avoid potential use-after-free in 'chrdev_open()'
2020-01-10 13:25:24 -08:00
Linus Torvalds
4e4cd21c64 block-5.5-2020-01-10
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl4YvdoQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpoOuD/0eZkvtim/ZyEzj081PZUkslWwjNvEZV9+o
 iYWMp0PBDyYgR79ca86EVTevcMiVxPnKpQl5DT+p9L1JzZ7dFc8U7fTpygjwbzx0
 FdHlPQt+oN4TsGl3hpTGGnw2ArbCHnqqj31ahgo87zo0a01xv33C3QeGFyXEIYoR
 F0QQ5E7EAyT2umKKflX9PWnrbOQZ91p2P3m+AE0TtOXMUgTi2KJKHUFu+G5OOwZB
 dM41GvyZDY9WA7bUlFeOp0mRZsiGkfEsI59VP6AR8ZkxwsOeHLrVB5iBEGPiTDL2
 dUwLwbGrLYFtwLEh4yd0aKt9++H2RZjJwi4ssyaDkkWCMQHECQXwd34DBmrV/qia
 hgh/4DV0E1X3MZFYOk44zp8kwjgpmU9MCH3dFU0bWnzm9WrvtS9uBDjgEDkn6zty
 xONSQeyHWVFQFwIjG260YEbuTplOTFP5rNWEf2CHWMHuk9kp8kfATWt9wlazYhtz
 OUELfWmkrGk8nqMN4Ee+ty582I8gxk48IGwiJHOYh1gMHHgFnJbgr97Pe2NCLLee
 9elkJnUQSdXuF314uznrAf7XLiEC0hfHGnPCTD8pAMf2DYOGKNeivh2wtdhd98cu
 AvHew9qnI2C/oahY+DaE4wunP4VlNZ/ZeNAl0h8KG7uNBCA4uFtyBMNimnKVNblw
 7KDsQ3vDIg==
 =1bsG
 -----END PGP SIGNATURE-----

Merge tag 'block-5.5-2020-01-10' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "A few fixes that should go into this round.

  This pull request contains two NVMe fixes via Keith, removal of a dead
  function, and a fix for the bio op for read truncates (Ming)"

* tag 'block-5.5-2020-01-10' of git://git.kernel.dk/linux-block:
  nvmet: fix per feat data len for get_feature
  nvme: Translate more status codes to blk_status_t
  fs: move guard_bio_eod() after bio_set_op_attrs
  block: remove unused mp_bvec_last_segment
2020-01-10 12:05:26 -08:00
Linus Torvalds
30b6487d15 io_uring-5.5-2020-01-10
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl4YvQsQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpkf2D/9nY2CFUfVtYKv2nNpFTXSnTIT2r9WZEXfI
 vD75OF/w3RxRUJHyJNKw8YIP8fRDmpqzvryaPjb33o1nXwu2ioKe1Yrd63vbXd41
 vr6X+jAp7iieO1PoG/QFb7p7ZBuhnFFKor/1scoD+4SQEOTtmwroeL6ECw1LdQGA
 A8F5nQzHVzrfhXNconPlXuXKVpoErP4AQ4ls5YmQI3AugK4XEYeRpRuGWuecdjJM
 In3TJ47heNPOCU1Eo7MO7lkFBR+CPi4uqsdFgsYJl7F1KiuwHL0e7qsWvmUMxz+9
 Olq0hAVFAXLY6J8uEz+dBKN7Xu0zdsG2ILfOJrDUjbGQ+/aPb++08W162wxg1HlP
 +WKjpZtQpEQ3lUwk/e2fplSTC9bCA/90h/tmwjL3Olbl9+H5mxnovTevcxDX+L6x
 sBfHRlDPo8r1r71jXOjCKSYYLwEBHGmAA1DTXQt11386oIwP4RFXaregbkxRxvAu
 qqaChgnm9wb6wyApxYnU8r14qtRwN7lpK0STJB+krqoMywKpeJPirjoTDgMeovgf
 1VYDDHz/PYkSWo+nRQWtXPK0k6sk2K2YnM5t4O0LZC2Pskc6C7V5PT7ZUyTw/Hfi
 7IbWu5FCzsgCwC4UEhCcd9kRtFFQceZGW/SA1+VU98EyVUKrewXmq0NpmSiNfscV
 0cdAHuVHhw==
 =TzX1
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.5-2020-01-10' of git://git.kernel.dk/linux-block

Pull io_uring fix from Jens Axboe:
 "Single fix for this series, fixing a regression with the short read
  handling.

  This just removes it, as it cannot safely be done for all cases"

* tag 'io_uring-5.5-2020-01-10' of git://git.kernel.dk/linux-block:
  io_uring: remove punt of short reads to async context
2020-01-10 12:03:12 -08:00
Linus Torvalds
bef1d88263 pstore fix for rare error path
- Fix label allocation lifetime/visibility to avoid further mistakes
 -----BEGIN PGP SIGNATURE-----
 Comment: Kees Cook <kees@outflux.net>
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl4YAM8WHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJl9mD/9il1pBZmLTQFVzfLeVnDemQ1t1
 zhPywuR/FzE6KXX4kqaRJZ2ttLsHFaGqttbegQ5Jh8g5SdHZgMWRIz9y9Mb/re4g
 5ny19VH4dvJHY6CYQxl3SlL+uSx29dNQOC4XGRtVwLrs6XeCIoWhXw1KI6ahiuVW
 AK1lelPlLIdVOLrvDn3qHy6iQBXgEA0PSosxUEjxC43ekoIRwKwsMKCeT3awHrOY
 uTamIU8vXwqCl+eFsgoQBoq8dgBvNI121DbKB2rToC2Pb3cE84Ez2Ul7hrqDQqzo
 0wLPYJhvzMDqCaRwc5a42PqYFmRxSZrNHVbAnA9n/gEFPluJTklsq8fOEkXYml16
 BKiC96PcKYfJZdEePbNx3tQVtXYj/mlncLKb8FxIlybyC+P92XGlHWDkrXyva8uY
 6XyTyGpkEN1zwZnA+R6otvaOX8+I5XuX4vsRRlTaDd+1Y6SY7HxvMjxFuTko0yfY
 y/X0rfrhzvtrXUCoC+bkaA73EwaSXpie+8goFWQz9Erjpl7o0lH4lCIqm7J0hynM
 0kmlusSEfqNQ8HBnbP25BVjW4Ws/ASrr5yGDNTigsaYeCdHkwYy+Ses50I9gMWuB
 oNz7jUblC0rPlO1geBQTFpec2u+9sS7Y+xhYhh+h8GLwCX09zOB8gUTE+GBgVhsY
 ptjh+yD7ybBiXrX8jQ==
 =biFc
 -----END PGP SIGNATURE-----

Merge tag 'pstore-v5.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull pstore fix from Kees Cook:
 "Cengiz Can forwarded a Coverity report about more problems with a rare
  pstore initialization error path, so the allocation lifetime was
  rearranged to avoid needing to share the kfree() responsibilities
  between caller and callee"

* tag 'pstore-v5.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  pstore/ram: Regularize prz label allocation lifetime
2020-01-09 21:42:05 -08:00
Ming Lei
83c9c54716 fs: move guard_bio_eod() after bio_set_op_attrs
Commit 85a8ce62c2 ("block: add bio_truncate to fix guard_bio_eod")
adds bio_truncate() for handling bio EOD. However, bio_truncate()
doesn't use the passed 'op' parameter from guard_bio_eod's callers.

So bio_trunacate() may retrieve wrong 'op', and zering pages may
not be done for READ bio.

Fixes this issue by moving guard_bio_eod() after bio_set_op_attrs()
in submit_bh_wbc() so that bio_truncate() can always retrieve correct
op info.

Meantime remove the 'op' parameter from guard_bio_eod() because it isn't
used any more.

Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
Fixes: 85a8ce62c2 ("block: add bio_truncate to fix guard_bio_eod")
Signed-off-by: Ming Lei <ming.lei@redhat.com>

Fold in kerneldoc and bio_op() change.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-09 08:16:12 -07:00
Kees Cook
e163fdb3f7 pstore/ram: Regularize prz label allocation lifetime
In my attempt to fix a memory leak, I introduced a double-free in the
pstore error path. Instead of trying to manage the allocation lifetime
between persistent_ram_new() and its callers, adjust the logic so
persistent_ram_new() always takes a kstrdup() copy, and leaves the
caller's allocation lifetime up to the caller. Therefore callers are
_always_ responsible for freeing their label. Before, it only needed
freeing when the prz itself failed to allocate, and not in any of the
other prz failure cases, which callers would have no visibility into,
which is the root design problem that lead to both the leak and now
double-free bugs.

Reported-by: Cengiz Can <cengiz@kernel.wtf>
Link: https://lore.kernel.org/lkml/d4ec59002ede4aaf9928c7f7526da87c@kernel.wtf
Fixes: 8df955a32a ("pstore/ram: Fix error-path memory leak in persistent_ram_new() callers")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-01-08 17:05:45 -08:00
Jens Axboe
eacc6dfaea io_uring: remove punt of short reads to async context
We currently punt any short read on a regular file to async context,
but this fails if the short read is due to running into EOF. This is
especially problematic since we only do the single prep for commands
now, as we don't reset kiocb->ki_pos. This can result in a 4k read on
a 1k file returning zero, as we detect the short read and then retry
from async context. At the time of retry, the position is now 1k, and
we end up reading nothing, and hence return 0.

Instead of trying to patch around the fact that short reads can be
legitimate and won't succeed in case of retry, remove the logic to punt
a short read to async context. Simply return it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-07 13:08:56 -07:00
Will Deacon
68faa679b8 chardev: Avoid potential use-after-free in 'chrdev_open()'
'chrdev_open()' calls 'cdev_get()' to obtain a reference to the
'struct cdev *' stashed in the 'i_cdev' field of the target inode
structure. If the pointer is NULL, then it is initialised lazily by
looking up the kobject in the 'cdev_map' and so the whole procedure is
protected by the 'cdev_lock' spinlock to serialise initialisation of
the shared pointer.

Unfortunately, it is possible for the initialising thread to fail *after*
installing the new pointer, for example if the subsequent '->open()' call
on the file fails. In this case, 'cdev_put()' is called, the reference
count on the kobject is dropped and, if nobody else has taken a reference,
the release function is called which finally clears 'inode->i_cdev' from
'cdev_purge()' before potentially freeing the object. The problem here
is that a racing thread can happily take the 'cdev_lock' and see the
non-NULL pointer in the inode, which can result in a refcount increment
from zero and a warning:

  |  ------------[ cut here ]------------
  |  refcount_t: addition on 0; use-after-free.
  |  WARNING: CPU: 2 PID: 6385 at lib/refcount.c:25 refcount_warn_saturate+0x6d/0xf0
  |  Modules linked in:
  |  CPU: 2 PID: 6385 Comm: repro Not tainted 5.5.0-rc2+ #22
  |  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
  |  RIP: 0010:refcount_warn_saturate+0x6d/0xf0
  |  Code: 05 55 9a 15 01 01 e8 9d aa c8 ff 0f 0b c3 80 3d 45 9a 15 01 00 75 ce 48 c7 c7 00 9c 62 b3 c6 08
  |  RSP: 0018:ffffb524c1b9bc70 EFLAGS: 00010282
  |  RAX: 0000000000000000 RBX: ffff9e9da1f71390 RCX: 0000000000000000
  |  RDX: ffff9e9dbbd27618 RSI: ffff9e9dbbd18798 RDI: ffff9e9dbbd18798
  |  RBP: 0000000000000000 R08: 000000000000095f R09: 0000000000000039
  |  R10: 0000000000000000 R11: ffffb524c1b9bb20 R12: ffff9e9da1e8c700
  |  R13: ffffffffb25ee8b0 R14: 0000000000000000 R15: ffff9e9da1e8c700
  |  FS:  00007f3b87d26700(0000) GS:ffff9e9dbbd00000(0000) knlGS:0000000000000000
  |  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  |  CR2: 00007fc16909c000 CR3: 000000012df9c000 CR4: 00000000000006e0
  |  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  |  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  |  Call Trace:
  |   kobject_get+0x5c/0x60
  |   cdev_get+0x2b/0x60
  |   chrdev_open+0x55/0x220
  |   ? cdev_put.part.3+0x20/0x20
  |   do_dentry_open+0x13a/0x390
  |   path_openat+0x2c8/0x1470
  |   do_filp_open+0x93/0x100
  |   ? selinux_file_ioctl+0x17f/0x220
  |   do_sys_open+0x186/0x220
  |   do_syscall_64+0x48/0x150
  |   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  |  RIP: 0033:0x7f3b87efcd0e
  |  Code: 89 54 24 08 e8 a3 f4 ff ff 8b 74 24 0c 48 8b 3c 24 41 89 c0 44 8b 54 24 08 b8 01 01 00 00 89 f4
  |  RSP: 002b:00007f3b87d259f0 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
  |  RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3b87efcd0e
  |  RDX: 0000000000000000 RSI: 00007f3b87d25a80 RDI: 00000000ffffff9c
  |  RBP: 00007f3b87d25e90 R08: 0000000000000000 R09: 0000000000000000
  |  R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffe188f504e
  |  R13: 00007ffe188f504f R14: 00007f3b87d26700 R15: 0000000000000000
  |  ---[ end trace 24f53ca58db8180a ]---

Since 'cdev_get()' can already fail to obtain a reference, simply move
it over to use 'kobject_get_unless_zero()' instead of 'kobject_get()',
which will cause the racing thread to return -ENXIO if the initialising
thread fails unexpectedly.

Cc: Hillf Danton <hdanton@sina.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: syzbot+82defefbbd8527e1c2cb@syzkaller.appspotmail.com
Signed-off-by: Will Deacon <will@kernel.org>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20191219120203.32691-1-will@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-06 20:10:26 +01:00
Gang He
b73eba2a86 ocfs2: fix the crash due to call ocfs2_get_dlm_debug once less
Because ocfs2_get_dlm_debug() function is called once less here, ocfs2
file system will trigger the system crash, usually after ocfs2 file
system is unmounted.

This system crash is caused by a generic memory corruption, these crash
backtraces are not always the same, for exapmle,

    ocfs2: Unmounting device (253,16) on (node 172167785)
    general protection fault: 0000 [#1] SMP PTI
    CPU: 3 PID: 14107 Comm: fence_legacy Kdump:
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
    RIP: 0010:__kmalloc+0xa5/0x2a0
    Code: 00 00 4d 8b 07 65 4d 8b
    RSP: 0018:ffffaa1fc094bbe8 EFLAGS: 00010286
    RAX: 0000000000000000 RBX: d310a8800d7a3faf RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000dc0 RDI: ffff96e68fc036c0
    RBP: d310a8800d7a3faf R08: ffff96e6ffdb10a0 R09: 00000000752e7079
    R10: 000000000001c513 R11: 0000000004091041 R12: 0000000000000dc0
    R13: 0000000000000039 R14: ffff96e68fc036c0 R15: ffff96e68fc036c0
    FS:  00007f699dfba540(0000) GS:ffff96e6ffd80000(0000) knlGS:00000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 000055f3a9d9b768 CR3: 000000002cd1c000 CR4: 00000000000006e0
    Call Trace:
     ext4_htree_store_dirent+0x35/0x100 [ext4]
     htree_dirblock_to_tree+0xea/0x290 [ext4]
     ext4_htree_fill_tree+0x1c1/0x2d0 [ext4]
     ext4_readdir+0x67c/0x9d0 [ext4]
     iterate_dir+0x8d/0x1a0
     __x64_sys_getdents+0xab/0x130
     do_syscall_64+0x60/0x1f0
     entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x7f699d33a9fb

This regression problem was introduced by commit e581595ea2 ("ocfs: no
need to check return value of debugfs_create functions").

Link: http://lkml.kernel.org/r/20191225061501.13587-1-ghe@suse.com
Fixes: e581595ea2 ("ocfs: no need to check return value of debugfs_create functions")
Signed-off-by: Gang He <ghe@suse.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>	[5.3+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-04 13:55:09 -08:00
Kai Li
397eac17f8 ocfs2: call journal flush to mark journal as empty after journal recovery when mount
If journal is dirty when mount, it will be replayed but jbd2 sb log tail
cannot be updated to mark a new start because journal->j_flag has
already been set with JBD2_ABORT first in journal_init_common.

When a new transaction is committed, it will be recored in block 1
first(journal->j_tail is set to 1 in journal_reset).  If emergency
restart happens again before journal super block is updated
unfortunately, the new recorded trans will not be replayed in the next
mount.

The following steps describe this procedure in detail.
1. mount and touch some files
2. these transactions are committed to journal area but not checkpointed
3. emergency restart
4. mount again and its journals are replayed
5. journal super block's first s_start is 1, but its s_seq is not updated
6. touch a new file and its trans is committed but not checkpointed
7. emergency restart again
8. mount and journal is dirty, but trans committed in 6 will not be
replayed.

This exception happens easily when this lun is used by only one node.
If it is used by multi-nodes, other node will replay its journal and its
journal super block will be updated after recovery like what this patch
does.

ocfs2_recover_node->ocfs2_replay_journal.

The following jbd2 journal can be generated by touching a new file after
journal is replayed, and seq 15 is the first valid commit, but first seq
is 13 in journal super block.

logdump:
  Block 0: Journal Superblock
  Seq: 0   Type: 4 (JBD2_SUPERBLOCK_V2)
  Blocksize: 4096   Total Blocks: 32768   First Block: 1
  First Commit ID: 13   Start Log Blknum: 1
  Error: 0
  Feature Compat: 0
  Feature Incompat: 2 block64
  Feature RO compat: 0
  Journal UUID: 4ED3822C54294467A4F8E87D2BA4BC36
  FS Share Cnt: 1   Dynamic Superblk Blknum: 0
  Per Txn Block Limit    Journal: 0    Data: 0

  Block 1: Journal Commit Block
  Seq: 14   Type: 2 (JBD2_COMMIT_BLOCK)

  Block 2: Journal Descriptor
  Seq: 15   Type: 1 (JBD2_DESCRIPTOR_BLOCK)
  No. Blocknum        Flags
   0. 587             none
  UUID: 00000000000000000000000000000000
   1. 8257792         JBD2_FLAG_SAME_UUID
   2. 619             JBD2_FLAG_SAME_UUID
   3. 24772864        JBD2_FLAG_SAME_UUID
   4. 8257802         JBD2_FLAG_SAME_UUID
   5. 513             JBD2_FLAG_SAME_UUID JBD2_FLAG_LAST_TAG
  ...
  Block 7: Inode
  Inode: 8257802   Mode: 0640   Generation: 57157641 (0x3682809)
  FS Generation: 2839773110 (0xa9437fb6)
  CRC32: 00000000   ECC: 0000
  Type: Regular   Attr: 0x0   Flags: Valid
  Dynamic Features: (0x1) InlineData
  User: 0 (root)   Group: 0 (root)   Size: 7
  Links: 1   Clusters: 0
  ctime: 0x5de5d870 0x11104c61 -- Tue Dec  3 11:37:20.286280801 2019
  atime: 0x5de5d870 0x113181a1 -- Tue Dec  3 11:37:20.288457121 2019
  mtime: 0x5de5d870 0x11104c61 -- Tue Dec  3 11:37:20.286280801 2019
  dtime: 0x0 -- Thu Jan  1 08:00:00 1970
  ...
  Block 9: Journal Commit Block
  Seq: 15   Type: 2 (JBD2_COMMIT_BLOCK)

The following is journal recovery log when recovering the upper jbd2
journal when mount again.

syslog:
  ocfs2: File system on device (252,1) was not unmounted cleanly, recovering it.
  fs/jbd2/recovery.c:(do_one_pass, 449): Starting recovery pass 0
  fs/jbd2/recovery.c:(do_one_pass, 449): Starting recovery pass 1
  fs/jbd2/recovery.c:(do_one_pass, 449): Starting recovery pass 2
  fs/jbd2/recovery.c:(jbd2_journal_recover, 278): JBD2: recovery, exit status 0, recovered transactions 13 to 13

Due to first commit seq 13 recorded in journal super is not consistent
with the value recorded in block 1(seq is 14), journal recovery will be
terminated before seq 15 even though it is an unbroken commit, inode
8257802 is a new file and it will be lost.

Link: http://lkml.kernel.org/r/20191217020140.2197-1-li.kai4@h3c.com
Signed-off-by: Kai Li <li.kai4@h3c.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Changwei Ge <gechangwei@live.cn>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-04 13:55:09 -08:00
Randy Dunlap
e39e773ad1 fs/posix_acl.c: fix kernel-doc warnings
Fix kernel-doc warnings in fs/posix_acl.c.
Also fix one typo (setgit -> setgid).

  fs/posix_acl.c:647: warning: Function parameter or member 'inode' not described in 'posix_acl_update_mode'
  fs/posix_acl.c:647: warning: Function parameter or member 'mode_p' not described in 'posix_acl_update_mode'
  fs/posix_acl.c:647: warning: Function parameter or member 'acl' not described in 'posix_acl_update_mode'

Link: http://lkml.kernel.org/r/29b0dc46-1f28-a4e5-b1d0-ba2b65629779@infradead.org
Fixes: 073931017b ("posix_acl: Clear SGID bit when setting file permissions")

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-04 13:55:09 -08:00
Eric Biggers
213921f967 fs/namespace.c: make to_mnt_ns() static
Make to_mnt_ns() static to address the following 'sparse' warning:

    fs/namespace.c:1731:22: warning: symbol 'to_mnt_ns' was not declared. Should it be static?

Link: http://lkml.kernel.org/r/20191209234830.156260-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-04 13:55:09 -08:00
Eric Biggers
7bebd69ecf fs/nsfs.c: include headers for missing declarations
Include linux/proc_fs.h and fs/internal.h to address the following
'sparse' warnings:

    fs/nsfs.c:41:32: warning: symbol 'ns_dentry_operations' was not declared. Should it be static?
    fs/nsfs.c:145:5: warning: symbol 'open_related_ns' was not declared. Should it be static?

Link: http://lkml.kernel.org/r/20191209234822.156179-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-04 13:55:09 -08:00
Eric Biggers
b16155a0b0 fs/direct-io.c: include fs/internal.h for missing prototype
Include fs/internal.h to address the following 'sparse' warning:

    fs/direct-io.c:591:5: warning: symbol 'sb_init_dio_done_wq' was not declared. Should it be static?

Link: http://lkml.kernel.org/r/20191209234544.128302-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-04 13:55:09 -08:00
Linus Torvalds
3a562aee72 for-5.5-rc4-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAl4PenMACgkQxWXV+ddt
 WDtsyg/6A86Da1zpLQKHiQtfhH629oeL/lFI0Cz+52xcgKSAUX8pHrWzKawSVpTT
 OTgksskHREPDjyDQytxAUKJnuplFEF7sCFNhzhsjQ7cvhC0lLDmW0VQr1Wob1/c4
 miW18YXN2DrYFQqdCeypvq1FItglxuBmsa9eITYzytRZcATY+3b506FhT/d8NIiH
 f3nnChUiNuyfAiez1SC4FavDpVma1O6SpPMk29wUN/+/Dnd4aBrfvhuygy7xyIOK
 rzotRR5xagQDpei+99lT2hys4Pv0yEOajoGmgbgpaZNP0vgQcBLaF5UTeMv/Ib2j
 i+muu1rWi4R6lS3hh30kRSifKmPF7I9JB1dbd5gPfZiERlDQk+qZWrPFv9l0lf7M
 R3jn04EaPVNr0dtftRFF2VvpIlUQYgIvwZHx4Td6Oy9XO0X4n5vlKxYg9aHmybJ2
 Vni13ad2oWNLo2Dd1eUYrEJ/QwGc1pq7JU9HiOST7yEJv1FN++AF4qTYTyvc7Yl4
 HrN349/2k2J9vU88BIlXIxYG73AL9lIpGF2ROiEw+xn4oOevDP/5HCP8ZvELoNVM
 NqhJGoURu+AhZ7Rm3RYAYySShjQPF+qw/Dnz8sowQsEFUQ990E1wUxWZwTOP6+JQ
 cdwzYiXYIExTghfQdUDdwNlpCXUbSuJxFOGjKmhBuzA7HHwGf1Q=
 =jGvW
 -----END PGP SIGNATURE-----

Merge tag 'for-5.5-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "A few fixes for btrfs:

   - blkcg accounting problem with compression that could stall writes

   - setting up blkcg bio for compression crashes due to NULL bdev
     pointer

   - fix possible infinite loop in writeback for nocow files (here
     possible means almost impossible, 13 things that need to happen to
     trigger it)"

* tag 'for-5.5-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  Btrfs: fix infinite loop during nocow writeback due to race
  btrfs: fix compressed write bio blkcg attribution
  btrfs: punt all bios created in btrfs_submit_compressed_write()
2020-01-03 12:20:21 -08:00
Linus Torvalds
b6b4aafc99 block-5.5-20200103
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl4PauoQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpkqMD/4zcRGP8LhkVG3I8QMqmhhJNyZb5WP5O/mV
 ekWYYzxaPhn/WQjM17Y6MBuvPTqUk1R/FiZdjW+OPUOrXUPsasSSKmLvcpiHIRHY
 YTMhzNNRpainakM7q+BwhfeNm/T3d0JXMfN8oufIfrdFHy7+bcZupctv0aQgnLdz
 Z8f40nR8t/uYfRrMzCjEM9mY2P9qelwP3ylnloS2XC0r+O/j/tBZEtnormhFwnfa
 NGx732WyHBK4Qt9bRK8FzmuUmUmhxU+MWHcR5erCCzeFEY4DnzzthFuKGvZ3Ai0f
 qWMjYJBrWHvNJL6M4Dm3zE/tVWIFg3//zo7buuZD29Ms3GoqGj88mbrS/BFKNp1+
 U5cShEMqd/7y5RUB0nOzRnfGVVHz/2WRRKvhc0/cFHulJb0q28u5pHhmlrduvxr/
 VogoSiimo7qLVGiBeXgfubuyB3nvzGvDICgiqt3rIVgpgMytoiBz53HgnNLB8QtO
 CtYDJOm8sCTtnkEPshij9Ly0dfIYUVKz7QIx65qSfcv+vxag/WmWuG5woRtqtB6N
 AoFqf8PpHXHtVrokbqAG4j4R7QrLSrYSTh/vDa6muTaiu7fe5Gb2gXKWPbIGU5n+
 b+2oKqlpVN1ZqOvhUv/PXW0U/nMx2j5Untank1ebHd+416nfNqM5lN6jsfZ3HnaR
 tpST+JtLDg==
 =ll2G
 -----END PGP SIGNATURE-----

Merge tag 'block-5.5-20200103' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "Three fixes in here:

   - Fix for a missing split on default memory boundary mask (4G) (Ming)

   - Fix for multi-page read bio truncate (Ming)

   - Fix for null_blk zone close request handling (Damien)"

* tag 'block-5.5-20200103' of git://git.kernel.dk/linux-block:
  null_blk: Fix REQ_OP_ZONE_CLOSE handling
  block: fix splitting segments on boundary masks
  block: add bio_truncate to fix guard_bio_eod
2020-01-03 12:11:30 -08:00
Jan Stancek
15f0ec941f mm/hugetlbfs: fix for_each_hstate() loop in init_hugetlbfs_fs()
LTP memfd_create04 started failing for some huge page sizes
after v5.4-10135-gc3bfc5dd73c6.

The problem is the check introduced to for_each_hstate() loop that
should skip default_hstate_idx.  Since it doesn't update 'i' counter,
all subsequent huge page sizes are skipped as well.

Fixes: 8fc312b32b ("mm/hugetlbfs: fix error handling when setting up mounts")
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-03 10:39:08 -08:00
Linus Torvalds
278b14eb92 pstore bug fixes
- always reset circular buffer state when writing new dump (Aleksandr Yashkin)
 - fix rare error-path memory leak (Kees Cook)
 -----BEGIN PGP SIGNATURE-----
 Comment: Kees Cook <kees@outflux.net>
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl4OWFsWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJg3FD/0Vp/Qf0zfjGlV657kNhbYMKYSM
 h/ItNhFovnRnxobY2CzjZyYoU9ZCsv2bEhplILlJxG84l+nb3nUubHGhUVcoQ6iE
 aDJAdz632qB+R8l/Ouk0xUp/UnVvoheA68RCaeY6R7G/vAHqceoGD72Ji5kFe1T8
 QxPqVAJAIn+hdBHtZVhuueqINQ+3nCUUwE4Yu0veqEG5wcf+awGDjDrha2ZsMyfM
 Zl54co7l5Z5MwLLTxXO8RRFXlGtmMXnLQcKUwdPIxi+ZBZRan2pdj2zX5Sr0eGA2
 HtZchn9Bc5bbwERRobbwWzQeqbwfoRHMOWFq7mUHyaho/5Pbv67A0lOib2cuXG+U
 WjpXV9nLiAbVqIH0wph4Vppx5acDI7o0JRS0oct/f7trZE58zXfHLJHLqPrblV8J
 3pr989yGBEYI8YCf2dJXIKQlT0+s24Iyby0RmkCwQC/DkRWf9+a6BBYLHvGS96Y2
 gZ4wKGJRXbDP3m4NuojYRv3DwF71jtSw5bR1kLAdjEygbtxG064mTcACH+DsScKK
 6zBbOA2qBf1ReFdz4KVN9TAOnPIsTgeBG5ttmzGgkKZUUAe9+65AJH8jS22NSD2S
 mn4S2hXio5hV3bMUJtItEzMjYwMkdeyac8EubPlLH4DoNdZ01ER6L07x83m5cPjl
 wZ845V9VZ14i4bkq8w==
 =yG+c
 -----END PGP SIGNATURE-----

Merge tag 'pstore-v5.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull pstore bug fixes from Kees Cook:

 - always reset circular buffer state when writing new dump (Aleksandr
   Yashkin)

 - fix rare error-path memory leak (Kees Cook)

* tag 'pstore-v5.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  pstore/ram: Write new dumps to start of recycled zones
  pstore/ram: Fix error-path memory leak in persistent_ram_new() callers
2020-01-02 16:39:51 -08:00
Dominik Brodowski
74f1a29910 Revert "fs: remove ksys_dup()"
This reverts commit 8243186f0c ("fs: remove ksys_dup()") and the
subsequent fix for it in commit 2d3145f8d2 ("early init: fix error
handling when opening /dev/console").

Trying to use filp_open() and f_dupfd() instead of pseudo-syscalls
caused more trouble than what is worth it: it requires accessing vfs
internals and it turns out there were other bugs in it too.

In particular, the file reference counting was wrong - because unlike
the original "open+2*dup" sequence it used "filp_open+3*f_dupfd" and
thus had an extra leaked file reference.

That in turn then caused odd problems with Androidx86 long after boot
becaue of how the extra reference to the console kept the session active
even after all file descriptors had been closed.

Reported-by: youling 257 <youling257@gmail.com>
Cc: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-02 16:15:33 -08:00
Aleksandr Yashkin
9e5f1c1980 pstore/ram: Write new dumps to start of recycled zones
The ram_core.c routines treat przs as circular buffers. When writing a
new crash dump, the old buffer needs to be cleared so that the new dump
doesn't end up in the wrong place (i.e. at the end).

The solution to this problem is to reset the circular buffer state before
writing a new Oops dump.

Signed-off-by: Aleksandr Yashkin <a.yashkin@inango-systems.com>
Signed-off-by: Nikolay Merinov <n.merinov@inango-systems.com>
Signed-off-by: Ariel Gilman <a.gilman@inango-systems.com>
Link: https://lore.kernel.org/r/20191223133816.28155-1-n.merinov@inango-systems.com
Fixes: 896fc1f0c4 ("pstore/ram: Switch to persistent_ram routines")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-01-02 12:30:50 -08:00
Kees Cook
8df955a32a pstore/ram: Fix error-path memory leak in persistent_ram_new() callers
For callers that allocated a label for persistent_ram_new(), if the call
fails, they must clean up the allocation.

Suggested-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Fixes: 1227daa43b ("pstore/ram: Clarify resource reservation labels")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/lkml/20191211191353.14385-1-navid.emamdoost@gmail.com
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-01-02 12:30:39 -08:00
Filipe Manana
de7999afed Btrfs: fix infinite loop during nocow writeback due to race
When starting writeback for a range that covers part of a preallocated
extent, due to a race with writeback for another range that also covers
another part of the same preallocated extent, we can end up in an infinite
loop.

Consider the following example where for inode 280 we have two dirty
ranges:

  range A, from 294912 to 303103, 8192 bytes
  range B, from 348160 to 438271, 90112 bytes

and we have the following file extent item layout for our inode:

  leaf 38895616 gen 24544 total ptrs 29 free space 13820 owner 5
      (...)
      item 27 key (280 108 200704) itemoff 14598 itemsize 53
          extent data disk bytenr 0 nr 0 type 1 (regular)
          extent data offset 0 nr 94208 ram 94208
      item 28 key (280 108 294912) itemoff 14545 itemsize 53
          extent data disk bytenr 10433052672 nr 81920 type 2 (prealloc)
          extent data offset 0 nr 81920 ram 81920

Then the following happens:

1) Writeback starts for range B (from 348160 to 438271), execution of
   run_delalloc_nocow() starts;

2) The first iteration of run_delalloc_nocow()'s whil loop leaves us at
   the extent item at slot 28, pointing to the prealloc extent item
   covering the range from 294912 to 376831. This extent covers part of
   our range;

3) An ordered extent is created against that extent, covering the file
   range from 348160 to 376831 (28672 bytes);

4) We adjust 'cur_offset' to 376832 and move on to the next iteration of
   the while loop;

5) The call to btrfs_lookup_file_extent() leaves us at the same leaf,
   pointing to slot 29, 1 slot after the last item (the extent item
   we processed in the previous iteration);

6) Because we are a slot beyond the last item, we call btrfs_next_leaf(),
   which releases the search path before doing a another search for the
   last key of the leaf (280 108 294912);

7) Right after btrfs_next_leaf() released the path, and before it did
   another search for the last key of the leaf, writeback for the range
   A (from 294912 to 303103) completes (it was previously started at
   some point);

8) Upon completion of the ordered extent for range A, the prealloc extent
   we previously found got split into two extent items, one covering the
   range from 294912 to 303103 (8192 bytes), with a type of regular extent
   (and no longer prealloc) and another covering the range from 303104 to
   376831 (73728 bytes), with a type of prealloc and an offset of 8192
   bytes. So our leaf now has the following layout:

     leaf 38895616 gen 24544 total ptrs 31 free space 13664 owner 5
         (...)
         item 27 key (280 108 200704) itemoff 14598 itemsize 53
             extent data disk bytenr 0 nr 0 type 1
             extent data offset 0 nr 8192 ram 94208
         item 28 key (280 108 208896) itemoff 14545 itemsize 53
             extent data disk bytenr 10433142784 nr 86016 type 1
             extent data offset 0 nr 86016 ram 86016
         item 29 key (280 108 294912) itemoff 14492 itemsize 53
             extent data disk bytenr 10433052672 nr 81920 type 1
             extent data offset 0 nr 8192 ram 81920
         item 30 key (280 108 303104) itemoff 14439 itemsize 53
             extent data disk bytenr 10433052672 nr 81920 type 2
             extent data offset 8192 nr 73728 ram 81920

9) After btrfs_next_leaf() returns, we have our path pointing to that same
   leaf and at slot 30, since it has a key we didn't have before and it's
   the first key greater then the key that was previously the last key of
   the leaf (key (280 108 294912));

10) The extent item at slot 30 covers the range from 303104 to 376831
    which is in our target range, so we process it, despite having already
    created an ordered extent against this extent for the file range from
    348160 to 376831. This is because we skip to the next extent item only
    if its end is less than or equals to the start of our delalloc range,
    and not less than or equals to the current offset ('cur_offset');

11) As a result we compute 'num_bytes' as:

    num_bytes = min(end + 1, extent_end) - cur_offset;
              = min(438271 + 1, 376832) - 376832 = 0

12) We then call create_io_em() for a 0 bytes range starting at offset
    376832;

13) Then create_io_em() enters an infinite loop because its calls to
    btrfs_drop_extent_cache() do nothing due to the 0 length range
    passed to it. So no existing extent maps that cover the offset
    376832 get removed, and therefore calls to add_extent_mapping()
    return -EEXIST, resulting in an infinite loop. This loop from
    create_io_em() is the following:

    do {
        btrfs_drop_extent_cache(BTRFS_I(inode), em->start,
                                em->start + em->len - 1, 0);
        write_lock(&em_tree->lock);
        ret = add_extent_mapping(em_tree, em, 1);
        write_unlock(&em_tree->lock);
        /*
         * The caller has taken lock_extent(), who could race with us
         * to add em?
         */
    } while (ret == -EEXIST);

Also, each call to btrfs_drop_extent_cache() triggers a warning because
the start offset passed to it (376832) is smaller then the end offset
(376832 - 1) passed to it by -1, due to the 0 length:

  [258532.052621] ------------[ cut here ]------------
  [258532.052643] WARNING: CPU: 0 PID: 9987 at fs/btrfs/file.c:602 btrfs_drop_extent_cache+0x3f4/0x590 [btrfs]
  (...)
  [258532.052672] CPU: 0 PID: 9987 Comm: fsx Tainted: G        W         5.4.0-rc7-btrfs-next-64 #1
  [258532.052673] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
  [258532.052691] RIP: 0010:btrfs_drop_extent_cache+0x3f4/0x590 [btrfs]
  (...)
  [258532.052695] RSP: 0018:ffffb4be0153f860 EFLAGS: 00010287
  [258532.052700] RAX: ffff975b445ee360 RBX: ffff975b44eb3e08 RCX: 0000000000000000
  [258532.052700] RDX: 0000000000038fff RSI: 0000000000039000 RDI: ffff975b445ee308
  [258532.052700] RBP: 0000000000038fff R08: 0000000000000000 R09: 0000000000000001
  [258532.052701] R10: ffff975b513c5c10 R11: 00000000e3c0cfa9 R12: 0000000000039000
  [258532.052703] R13: ffff975b445ee360 R14: 00000000ffffffef R15: ffff975b445ee308
  [258532.052705] FS:  00007f86a821de80(0000) GS:ffff975b76a00000(0000) knlGS:0000000000000000
  [258532.052707] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [258532.052708] CR2: 00007fdacf0f3ab4 CR3: 00000001f9d26002 CR4: 00000000003606f0
  [258532.052712] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [258532.052717] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [258532.052717] Call Trace:
  [258532.052718]  ? preempt_schedule_common+0x32/0x70
  [258532.052722]  ? ___preempt_schedule+0x16/0x20
  [258532.052741]  create_io_em+0xff/0x180 [btrfs]
  [258532.052767]  run_delalloc_nocow+0x942/0xb10 [btrfs]
  [258532.052791]  btrfs_run_delalloc_range+0x30b/0x520 [btrfs]
  [258532.052812]  ? find_lock_delalloc_range+0x221/0x250 [btrfs]
  [258532.052834]  writepage_delalloc+0xe4/0x140 [btrfs]
  [258532.052855]  __extent_writepage+0x110/0x4e0 [btrfs]
  [258532.052876]  extent_write_cache_pages+0x21c/0x480 [btrfs]
  [258532.052906]  extent_writepages+0x52/0xb0 [btrfs]
  [258532.052911]  do_writepages+0x23/0x80
  [258532.052915]  __filemap_fdatawrite_range+0xd2/0x110
  [258532.052938]  btrfs_fdatawrite_range+0x1b/0x50 [btrfs]
  [258532.052954]  start_ordered_ops+0x57/0xa0 [btrfs]
  [258532.052973]  ? btrfs_sync_file+0x225/0x490 [btrfs]
  [258532.052988]  btrfs_sync_file+0x225/0x490 [btrfs]
  [258532.052997]  __x64_sys_msync+0x199/0x200
  [258532.053004]  do_syscall_64+0x5c/0x250
  [258532.053007]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
  [258532.053010] RIP: 0033:0x7f86a7dfd760
  (...)
  [258532.053014] RSP: 002b:00007ffd99af0368 EFLAGS: 00000246 ORIG_RAX: 000000000000001a
  [258532.053016] RAX: ffffffffffffffda RBX: 0000000000000ec9 RCX: 00007f86a7dfd760
  [258532.053017] RDX: 0000000000000004 RSI: 000000000000836c RDI: 00007f86a8221000
  [258532.053019] RBP: 0000000000021ec9 R08: 0000000000000003 R09: 00007f86a812037c
  [258532.053020] R10: 0000000000000001 R11: 0000000000000246 R12: 00000000000074a3
  [258532.053021] R13: 00007f86a8221000 R14: 000000000000836c R15: 0000000000000001
  [258532.053032] irq event stamp: 1653450494
  [258532.053035] hardirqs last  enabled at (1653450493): [<ffffffff9dec69f9>] _raw_spin_unlock_irq+0x29/0x50
  [258532.053037] hardirqs last disabled at (1653450494): [<ffffffff9d4048ea>] trace_hardirqs_off_thunk+0x1a/0x20
  [258532.053039] softirqs last  enabled at (1653449852): [<ffffffff9e200466>] __do_softirq+0x466/0x6bd
  [258532.053042] softirqs last disabled at (1653449845): [<ffffffff9d4c8a0c>] irq_exit+0xec/0x120
  [258532.053043] ---[ end trace 8476fce13d9ce20a ]---

Which results in flooding dmesg/syslog since btrfs_drop_extent_cache()
uses WARN_ON() and not WARN_ON_ONCE().

So fix this issue by changing run_delalloc_nocow()'s loop to move to the
next extent item when the current extent item ends at at offset less than
or equals to the current offset instead of the start offset.

Fixes: 80ff385665 ("Btrfs: update nodatacow code v2")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-12-30 16:13:20 +01:00
Dennis Zhou
46bcff2bfc btrfs: fix compressed write bio blkcg attribution
Bio attribution is handled at bio_set_dev() as once we have a device, we
have a corresponding request_queue and then can derive the current css.
In special cases, we want to attribute to bio to someone else. This can
be done by calling bio_associate_blkg_from_css() or
kthread_associate_blkcg() depending on the scenario. Btrfs does this for
compressed writeback as they are handled by kworkers, so the latter can
be done here.

Commit 1a41802701 ("btrfs: drop bio_set_dev where not needed") removes
early bio_set_dev() calls prior to submit_stripe_bio(). This breaks the
above assumption that we'll have a request_queue when we are doing
association. To fix this, switch to using kthread_associate_blkcg().

Without this, we crash in btrfs/024:

  [ 3052.093088] BUG: kernel NULL pointer dereference, address: 0000000000000510
  [ 3052.107013] #PF: supervisor read access in kernel mode
  [ 3052.107014] #PF: error_code(0x0000) - not-present page
  [ 3052.107015] PGD 0 P4D 0
  [ 3052.107021] Oops: 0000 [#1] SMP
  [ 3052.138904] CPU: 42 PID: 201270 Comm: kworker/u161:0 Kdump: loaded Not tainted 5.5.0-rc1-00062-g4852d8ac90a9 #712
  [ 3052.138905] Hardware name: Quanta Tioga Pass Single Side 01-0032211004/Tioga Pass Single Side, BIOS F08_3A18 12/20/2018
  [ 3052.138912] Workqueue: btrfs-delalloc btrfs_work_helper
  [ 3052.191375] RIP: 0010:bio_associate_blkg_from_css+0x1e/0x3c0
  [ 3052.191379] RSP: 0018:ffffc900210cfc90 EFLAGS: 00010282
  [ 3052.191380] RAX: 0000000000000000 RBX: ffff88bfe5573c00 RCX: 0000000000000000
  [ 3052.191382] RDX: ffff889db48ec2f0 RSI: ffff88bfe5573c00 RDI: ffff889db48ec2f0
  [ 3052.191386] RBP: 0000000000000800 R08: 0000000000203bb0 R09: ffff889db16b2400
  [ 3052.293364] R10: 0000000000000000 R11: ffff88a07fffde80 R12: ffff889db48ec2f0
  [ 3052.293365] R13: 0000000000001000 R14: ffff889de82bc000 R15: ffff889e2b7bdcc8
  [ 3052.293367] FS:  0000000000000000(0000) GS:ffff889ffba00000(0000) knlGS:0000000000000000
  [ 3052.293368] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 3052.293369] CR2: 0000000000000510 CR3: 0000000002611001 CR4: 00000000007606e0
  [ 3052.293370] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [ 3052.293371] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [ 3052.293372] PKRU: 55555554
  [ 3052.293376] Call Trace:
  [ 3052.402552]  btrfs_submit_compressed_write+0x137/0x390
  [ 3052.402558]  submit_compressed_extents+0x40f/0x4c0
  [ 3052.422401]  btrfs_work_helper+0x246/0x5a0
  [ 3052.422408]  process_one_work+0x200/0x570
  [ 3052.438601]  ? process_one_work+0x180/0x570
  [ 3052.438605]  worker_thread+0x4c/0x3e0
  [ 3052.438614]  kthread+0x103/0x140
  [ 3052.460735]  ? process_one_work+0x570/0x570
  [ 3052.460737]  ? kthread_mod_delayed_work+0xc0/0xc0
  [ 3052.460744]  ret_from_fork+0x24/0x30

Fixes: 1a41802701 ("btrfs: drop bio_set_dev where not needed")
Reported-by: Chris Murphy <chris@colorremedies.com>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-12-30 16:07:19 +01:00
Dennis Zhou
7b62e66cbb btrfs: punt all bios created in btrfs_submit_compressed_write()
Compressed writes happen in the background via kworkers. However, this
causes bios to be attributed to root bypassing any cgroup limits from
the actual writer. We tag the first bio with REQ_CGROUP_PUNT, which will
punt the bio to an appropriate cgroup specific workqueue and attribute
the IO properly. However, if btrfs_submit_compressed_write() creates a
new bio, we don't tag it the same way. Add the appropriate tagging for
subsequent bios.

Fixes: ec39f7696c ("Btrfs: use REQ_CGROUP_PUNT for worker thread submitted bios")
Reviewed-by: Chris Mason <clm@fb.com>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-12-30 16:07:16 +01:00
Linus Torvalds
d75663868d File locking fix for v5.5
-----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEES8DXskRxsqGE6vXTAA5oQRlWghUFAl4IsqoTHGpsYXl0b25A
 a2VybmVsLm9yZwAKCRAADmhBGVaCFWSCD/9/iWJiaigNR8TtWeVPHDVCi/JvXXCY
 Wr7djgNwykiMeClUhUGBc2uRg1b5zwDnVFIs9jmDus9fGXMgSYIysyQOjDpxuXiP
 d8jByAAFznTeQCFXBaXmaMUh+74uILG7USi42Lm0v/ySjNAMKdtODclITIiL9Knb
 JeiB/2ZKF6oSHizSd5Rdh+59cgOKAZae/o4jWCmCSVil/Ye/wkUPbLulSyscNlGg
 0PlWMKeGKWKhcr49DeAAcpV0yeErPzvrv1P3eTxoXsozBP+ScfgSpMZT0HFqZw7Y
 yVKyLngxOWKsF1m5Bbx1ChsTfVzW82j8yNICOkSnCxtD8Ii2ZrAEKLWQAD4lnQMV
 6CJOenMxqlbtcpDxtZxXbDotlIZ+ttls3eebPcKPjfzmy01GUgho2VNjx4mxEHQH
 crs5l/l7u/CQfs0/2Eh+IspRbTRCPxt4jdZZu7fPv8idS9idR5NdHUL6VqWTYg/H
 9i7tQasgkOY+tr5AZZMJSL2LjQEc9cygVZ45vvajbD6IzGBDEK+/SQb7FgELZmac
 yI2frQEWr2f8PEiwG9zBquvxF5rbN9/+qXxpE1rScZBCoBywwuGinFC5xpaie7oY
 hjqcMVkyM/eBkt3ozS8nPMqWlnkYv26fS739Mta21QvpsWx77yyGU94ZyTIcn5J7
 emmGavC6iK+Ygg==
 =AIbz
 -----END PGP SIGNATURE-----

Merge tag 'locks-v5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux

Pull /proc/locks formatting fix from Jeff Layton:
 "This is a trivial fix for a _very_ long standing bug in /proc/locks
  formatting. Ordinarily, I'd wait for the merge window for something
  like this, but it is making it difficult to validate some overlayfs
  fixes.

  I've also gone ahead and marked this for stable"

* tag 'locks-v5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
  locks: print unsigned ino in /proc/locks
2019-12-29 09:50:57 -08:00
Linus Torvalds
cc2f36ec71 One performance fix for large directory searches, and one minor style cleanup noticed by Clang
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCAAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAl4IM0YACgkQiiy9cAdy
 T1H17gwAhs0ThN18BmSGJ0ftl4Va0rpcELMp3ZR7cy9yA5JpgI1/ikd7p+VKxTbx
 bk3WP9YL5aTcoo6icNZZpgaQmgmqaNpYT5W8cezOyp5Lp+1sxD/CfH4Wpdnssmn1
 S42+XpEtEg+Jw6Sg3wadyWbwmkOjJaDm8mZ266T3gaGjkOKHHEvzVi5YN8iWwW3+
 bkvByKOzCWUigGQIRHzc5Ubbpp652sz07xAKuja5irz29BLobkGUtfD0uXyj7o4g
 dajCzJR8j99bdpeMX2igr28Z002CiZpJRrafdjvoOB9nhoJliIr/PLfdeLZtQUTc
 vHvKH6URVnFOFxb21MdfKKy7NKH6KtyZyJdtg95SuOLFr0Myfl9i9EjJ3suPONnD
 VUb2MDMVpZ2wyJDHZbXWG84XQoW8hJEcADEkzzBwVIMcvyVMr6gOh107WZTlWmgM
 /n+78mU95u9ueA/RCVFXC040J5eEGTO0R/uno4oSMLEBCLdSwm5l/vU38n7FRmP2
 I5JEsee4
 =cuQx
 -----END PGP SIGNATURE-----

Merge tag '5.5-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "One performance fix for large directory searches, and one minor style
  cleanup noticed by Clang"

* tag '5.5-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: Optimize readdir on reparse points
  cifs: Adjust indentation in smb2_open_file
2019-12-29 09:48:47 -08:00
Amir Goldstein
98ca480a8f locks: print unsigned ino in /proc/locks
An ino is unsigned, so display it as such in /proc/locks.

Cc: stable@vger.kernel.org
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
2019-12-29 09:00:58 -05:00
Ming Lei
85a8ce62c2 block: add bio_truncate to fix guard_bio_eod
Some filesystem, such as vfat, may send bio which crosses device boundary,
and the worse thing is that the IO request starting within device boundaries
can contain more than one segment past EOD.

Commit dce30ca9e3 ("fs: fix guard_bio_eod to check for real EOD errors")
tries to fix this issue by returning -EIO for this situation. However,
this way lets fs user code lose chance to handle -EIO, then sync_inodes_sb()
may hang for ever.

Also the current truncating on last segment is dangerous by updating the
last bvec, given bvec table becomes not immutable any more, and fs bio
users may not retrieve the truncated pages via bio_for_each_segment_all() in
its .end_io callback.

Fixes this issue by supporting multi-segment truncating. And the
approach is simpler:

- just update bio size since block layer can make correct bvec with
the updated bio size. Then bvec table becomes really immutable.

- zero all truncated segments for read bio

Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
Fixed-by: dce30ca9e3 ("fs: fix guard_bio_eod to check for real EOD errors")
Reported-by: syzbot+2b9e54155c8c25d8d165@syzkaller.appspotmail.com
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-28 09:44:56 -07:00
Linus Torvalds
534121d289 io_uring-5.5-20191226
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl4E+aYQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgprKmEAC6tcPlb2BB+7fOuj44uAdE+RInqMxbfD3w
 Tj9KpF47e02DUvBTtwDJDHJ9QT4PlJhd66M1xrp3IMUV13PKQt9OFfc6TH38Jz/9
 mhHDGNj1s+GVLRH22PQtFjyMgzHA6+UF4NxHLDJ62c2CtrCVswFRUiWrSR8LgvDp
 EkVELGEpi080ffton9nhyy3ylOCcpCu1xX1mOCg5EhcqzFQnZMlFaj9PDFrNhQzT
 e8fdl/nGoKtxZ/x6V8Oso02r/K1XievV4dfrAtOZg4jiqp/3G2eiqoGGcYnShSDU
 qulKLGsuHK51Lay8AGEaw3haeMn1PKCNe+xv0uCubHdf2iMyBdpjCLsLpTlhmtF/
 DkfP13H8k3/nUP9Y8FHt9+Ld56qpdqi/77ngCF84Ed4MFXKYkwyFFyHLMaBCw5zk
 Z07qISAbj3UeRPug+8iBKpzNBUXvXqqOGHp2h0faXz+C0yG0l7HOkhZ3m+dDD6vN
 6ABrMrS/ZuWdiW4PiJUejW81rlRKJaCgmTXMjjQCpgFeUqj6flB4sALp3amY9v7r
 CZVL67wBZ4u4YeKW0q8j4Lh3DrT5M7IGPP2uT9tw0FiamgYByC2rV/SAecxbh8f5
 NQJbL4uyJwcusOorRvaKOWU9KQaz5Q6dx9auHnmLC3A0WsEFxjgtVwG3iqw8zOeF
 8E7Lk3kPcA==
 =L2kR
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.5-20191226' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:

 - Removal of now unused busy wqe list (Hillf)

 - Add cond_resched() to io-wq work processing (Hillf)

 - And then the series that I hinted at from last week, which removes
   the sqe from the io_kiocb and keeps all sqe handling on the prep
   side. This guarantees that an opcode can't do the wrong thing and
   read the sqe more than once. This is unchanged from last week, no
   issues have been observed with this in testing. Hence I really think
   we should fold this into 5.5.

* tag 'io_uring-5.5-20191226' of git://git.kernel.dk/linux-block:
  io-wq: add cond_resched() to worker thread
  io-wq: remove unused busy list from io_sqe
  io_uring: pass in 'sqe' to the prep handlers
  io_uring: standardize the prep methods
  io_uring: read 'count' for IORING_OP_TIMEOUT in prep handler
  io_uring: move all prep state for IORING_OP_{SEND,RECV}_MGS to prep handler
  io_uring: move all prep state for IORING_OP_CONNECT to prep handler
  io_uring: add and use struct io_rw for read/writes
  io_uring: use u64_to_user_ptr() consistently
2019-12-27 11:17:08 -08:00
Hillf Danton
fd1c4bc6e9 io-wq: add cond_resched() to worker thread
Reschedule the current IO worker to cut the risk that it is becoming
a cpu hog.

Signed-off-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-24 09:14:29 -07:00
Hillf Danton
1f424e8bd1 io-wq: remove unused busy list from io_sqe
Commit e61df66c69 ("io-wq: ensure free/busy list browsing see all
items") added a list for io workers in addition to the free and busy
lists, not only making worker walk cleaner, but leaving the busy list
unused. Let's remove it.

Signed-off-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-23 08:23:54 -07:00
Paulo Alcantara (SUSE)
046aca3c25 cifs: Optimize readdir on reparse points
When listing a directory with thounsands of files and most of them are
reparse points, we simply marked all those dentries for revalidation
and then sending additional (compounded) create/getinfo/close requests
for each of them.

Instead, upon receiving a response from an SMB2_QUERY_DIRECTORY
(FileIdFullDirectoryInformation) command, the directory entries that
have a file attribute of FILE_ATTRIBUTE_REPARSE_POINT will contain an
EaSize field with a reparse tag in it, so we parse it and mark the
dentry for revalidation only if it is a DFS or a symlink.

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-12-23 09:04:44 -06:00
Nathan Chancellor
7935799e04 cifs: Adjust indentation in smb2_open_file
Clang warns:

../fs/cifs/smb2file.c:70:3: warning: misleading indentation; statement
is not part of the previous 'if' [-Wmisleading-indentation]
         if (oparms->tcon->use_resilient) {
         ^
../fs/cifs/smb2file.c:66:2: note: previous statement is here
        if (rc)
        ^
1 warning generated.

This warning occurs because there is a space after the tab on this line.
Remove it so that the indentation is consistent with the Linux kernel
coding style and clang no longer warns.

Fixes: 592fafe644 ("Add resilienthandles mount parm")
Link: https://github.com/ClangBuiltLinux/linux/issues/826
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-12-23 09:04:44 -06:00
Linus Torvalds
9efa3ed504 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "Eric's s_inodes softlockup fixes + Jan's fix for recent regression
  from pipe rework"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs: call fsnotify_sb_delete after evict_inodes
  fs: avoid softlockups in s_inodes iterators
  pipe: Fix bogus dereference in iov_iter_alignment()
2019-12-22 17:00:04 -08:00