Commit Graph

69673 Commits

Author SHA1 Message Date
Qu Wenruo
0b3dcd131d btrfs: fix comment for btrfs ordered extent flag bits
There is small error in comment about BTRFS_ORDERED_* flags, added in
commit 3c198fe064 ("btrfs: rework the order of
btrfs_ordered_extent::flags") but the fixup did not get merged in time.

The 4 types are for ordered extent itself, not for direct io.
Only 3 types support direct io, REGULAR/NOCOW/PREALLOC.

Fix the comment to reflect that.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19 17:25:14 +02:00
Linus Torvalds
0c93ac6940 readdir: make sure to verify directory entry for legacy interfaces too
This does the directory entry name verification for the legacy
"fillonedir" (and compat) interface that goes all the way back to the
dark ages before we had a proper dirent, and the readdir() system call
returned just a single entry at a time.

Nobody should use this interface unless you still have binaries from
1991, but let's do it right.

This came up during discussions about unsafe_copy_to_user() and proper
checking of all the inputs to it, as the networking layer is looking to
use it in a few new places.  So let's make sure the _old_ users do it
all right and proper, before we add new ones.

See also commit 8a23eb804c ("Make filldir[64]() verify the directory
entry filename is valid") which did the proper modern interfaces that
people actually use. It had a note:

    Note that I didn't bother adding the checks to any legacy interfaces
    that nobody uses.

which this now corrects.  Note that we really don't care about POSIX and
the presense of '/' in a directory entry, but verify_dirent_name() also
ends up doing the proper name length verification which is what the
input checking discussion was about.

[ Another option would be to remove the support for this particular very
  old interface: any binaries that use it are likely a.out binaries, and
  they will no longer run anyway since we removed a.out binftm support
  in commit eac6165570 ("x86: Deprecate a.out support").

  But I'm not sure which came first: getdents() or ELF support, so let's
  pretend somebody might still have a working binary that uses the
  legacy readdir() case.. ]

Link: https://lore.kernel.org/lkml/CAHk-=wjbvzCAhAtvG0d81W5o0-KT5PPTHhfJ5ieDFq+bGtgOYg@mail.gmail.com/
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-17 11:39:49 -07:00
Linus Torvalds
9cdbf64674 io_uring-5.12-2021-04-16
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmB5/80QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpgJWD/0YyQ7YlMKw4x2nS6TiojeD5KN8aTvVVHlf
 TetYVXIOaMNPatUDREVl7vgaQVFi2cgWLvshYPIarsQK9ESNJ/UqKi0c6pJsa29w
 02K8EJsEdmQmJlyC6tUv/Drvdfzinx+jAm9doz8FrJ+2v5+pYqNFhFTh/JKRebI8
 kBEKH/e3eHg305TDXcWOoNxX3jD77v1IJo8JavLt1vncQKb6P3dahGwmkFRpRcHZ
 sY33baI2jdO/79ybH0F0fNgD+XGnBYShKB7iOlJfT9SeI8NfpR/OCuuPl3gxeeKZ
 PwHcsPHA/b0AAUKpWY0sX/86xMUUab2DPJV5nQR5SrgniJLs0Xqm0dvz1r6QM2BK
 7EfwbXK48xm6Wop5WFlo+wqzW11ES/nRHm9leRomeeKo6Goe/4kyEgK93QJOkckv
 /kt8t4PIxALlWKmgJgKk5kEWiP/QT0LjaZE2lYy9VcfpSoC7E7ZnK1KlvwbYOfZK
 QqjNNsRr2z7zQaUCFHDc3mPe/WkV8DygRrXZYl8qfopqndOduceFaGqNe5iZV2IX
 NnC7FGY66dcmjqGmOqr+4RvhUYR2+GtazaYgV14LM1cGBO5OtqlS1m+DRDbiavp4
 fpHeGGQx78O97LURNJG4Ti/AR5dfSQLQwZzb60UBk43a9ZwVsLtxFquSaV8dkm3U
 0hfxANWYXg==
 =KEBW
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.12-2021-04-16' of git://git.kernel.dk/linux-block

Pull io_uring fix from Jens Axboe:
 "Fix for a potential hang at exit with SQPOLL from Pavel"

* tag 'io_uring-5.12-2021-04-16' of git://git.kernel.dk/linux-block:
  io_uring: fix early sqd_list removal sqpoll hangs
2021-04-16 16:18:53 -07:00
Pavel Begunkov
c7d95613c7 io_uring: fix early sqd_list removal sqpoll hangs
[  245.463317] INFO: task iou-sqp-1374:1377 blocked for more than 122 seconds.
[  245.463334] task:iou-sqp-1374    state:D flags:0x00004000
[  245.463345] Call Trace:
[  245.463352]  __schedule+0x36b/0x950
[  245.463376]  schedule+0x68/0xe0
[  245.463385]  __io_uring_cancel+0xfb/0x1a0
[  245.463407]  do_exit+0xc0/0xb40
[  245.463423]  io_sq_thread+0x49b/0x710
[  245.463445]  ret_from_fork+0x22/0x30

It happens when sqpoll forgot to run park_task_work and goes to exit,
then exiting user may remove ctx from sqd_list, and so corresponding
io_sq_thread() -> io_uring_cancel_sqpoll() won't be executed. Hopefully
it just stucks in do_exit() in this case.

Fixes: dbe1bdbb39 ("io_uring: handle signals for IO threads like a normal thread")
Reported-by: Joakim Hassila <joj@mac.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-14 13:07:27 -06:00
Linus Torvalds
7d90072491 for-5.12-rc6-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmBy9DoACgkQxWXV+ddt
 WDtqdxAAnK4zx79k5ok6nlj8JlOfReimX4wPYYigiiKGY40cfQUZ1YUqbDscvrt+
 cbzvqJuMU/V/UVaPW/CLmNi5XpNlSmj0229iwy59BIcpXfgtAMTsa1zsY4teZ/AT
 3noNuT15CTeybwii0nT++AkqJbCbwXc5ItccGh9ZMOQwXuA5IUVTAzKrulUJoxXN
 zt23lX/ivtSfUH+pMMIG6wMVG2eGIP5m9drw+2n0yK08gt+oprLYnaAaE389mXgb
 TIRBafeBY7UA1YEcA4JDBDMNa0L8yWSV+XiMhxw7Ear7KoROAunKNbsG8USll6zb
 zBftfO+Gzv86wVvvPXg2KR8Qs9vyJMw2bOROFKzOnd+wQQ76v0XefOhNUUN98E6g
 tLTmCH+M1B1Qm1j2hVyOect/PMY51xqJA9xwlTtAbqIcz4qyOtfTR9KqqlWxVKJW
 9pAEMII063xEKVxgv2khOhewEjOgqa4v9YFQjVXdcHPKvGTAYBeoJA735+WnQ1HZ
 okPC5k3DoEcVZEkUPvespEsAqm+RoBufNxWmQ7hq5N3IwZAXsIwTlhysgrXQWyc9
 aTigWBq6rQ/bMz/57vI626+MAMh3StL+UOxlWiT+GToInpjZwoxZ0lgQdD6vUfUm
 T90T2930+PTkykQM9sNdQygGiH0J5FzkvneYvpkOYJ/+vphsRiA=
 =MuRt
 -----END PGP SIGNATURE-----

Merge tag 'for-5.12-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fix from David Sterba:
 "One more patch that we'd like to get to 5.12 before release.

  It's changing where and how the superblock is stored in the zoned
  mode. It is an on-disk format change but so far there are no
  implications for users as the proper mkfs support hasn't been merged
  and is waiting for the kernel side to settle.

  Until now, the superblocks were derived from the zone index, but zone
  size can differ per device. This is changed to be based on fixed
  offset values, to make it independent of the device zone size.

  The work on that got a bit delayed, we discussed the exact locations
  to support potential device sizes and usecases. (Partially delayed
  also due to my vacation.) Having that in the same release where the
  zoned mode is declared usable is highly desired, there are userspace
  projects that need to be updated to recognize the feature. Pushing
  that to the next release would make things harder to test"

* tag 'for-5.12-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: zoned: move superblock logging zone location
2021-04-11 11:53:36 -07:00
Naohiro Aota
53b74fa990 btrfs: zoned: move superblock logging zone location
Moves the location of the superblock logging zones. The new locations of
the logging zones are now determined based on fixed block addresses
instead of on fixed zone numbers.

The old placement method based on fixed zone numbers causes problems when
one needs to inspect a file system image without access to the drive zone
information. In such case, the super block locations cannot be reliably
determined as the zone size is unknown. By locating the superblock logging
zones using fixed addresses, we can scan a dumped file system image without
the zone information since a super block copy will always be present at or
after the fixed known locations.

Introduce the following three pairs of zones containing fixed offset
locations, regardless of the device zone size.

  - primary superblock: offset   0B (and the following zone)
  - first copy:         offset 512G (and the following zone)
  - Second copy:        offset   4T (4096G, and the following zone)

If a logging zone is outside of the disk capacity, we do not record the
superblock copy.

The first copy position is much larger than for a non-zoned filesystem,
which is at 64M.  This is to avoid overlapping with the log zones for
the primary superblock. This higher location is arbitrary but allows
supporting devices with very large zone sizes, plus some space around in
between.

Such large zone size is unrealistic and very unlikely to ever be seen in
real devices. Currently, SMR disks have a zone size of 256MB, and we are
expecting ZNS drives to be in the 1-4GB range, so this limit gives us
room to breathe. For now, we only allow zone sizes up to 8GB. The
maximum zone size that would still fit in the space is 256G.

The fixed location addresses are somewhat arbitrary, with the intent of
maintaining superblock reliability for smaller and larger devices, with
the preference for the latter. For this reason, there are two superblocks
under the first 1T. This should cover use cases for physical devices and
for emulated/device-mapper devices.

The superblock logging zones are reserved for superblock logging and
never used for data or metadata blocks. Note that we only reserve the
two zones per primary/copy actually used for superblock logging. We do
not reserve the ranges of zones possibly containing superblocks with the
largest supported zone size (0-16GB, 512G-528GB, 4096G-4112G).

The zones containing the fixed location offsets used to store
superblocks on a non-zoned volume are also reserved to avoid confusion.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-10 12:13:16 +02:00
Linus Torvalds
adb2c4174f Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "14 patches.

  Subsystems affected by this patch series: mm (kasan, gup, pagecache,
  and kfence), MAINTAINERS, mailmap, nds32, gcov, ocfs2, ia64, and lib"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  lib: fix kconfig dependency on ARCH_WANT_FRAME_POINTERS
  kfence, x86: fix preemptible warning on KPTI-enabled systems
  lib/test_kasan_module.c: suppress unused var warning
  kasan: fix conflict with page poisoning
  fs: direct-io: fix missing sdio->boundary
  ia64: fix user_stack_pointer() for ptrace()
  ocfs2: fix deadlock between setattr and dio_end_io_write
  gcov: re-fix clang-11+ support
  nds32: flush_dcache_page: use page_mapping_file to avoid races with swapoff
  mm/gup: check page posion status for coredump.
  .mailmap: fix old email addresses
  mailmap: update email address for Jordan Crouse
  treewide: change my e-mail address, fix my name
  MAINTAINERS: update CZ.NIC's Turris information
2021-04-09 17:06:32 -07:00
Linus Torvalds
3b9784350f io_uring-5.12-2021-04-09
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmBwXtgQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgprCVEAC8ZqiwYFIaCODC4NouBtMjZsInyGwetwBa
 sezn2KUIsP3umjbgZMO5KrI/6zGWO5j+7O7j1eW05FoN/yE1gM477EfPVgYeupgV
 tefAOskEAwPtMrTkQJt7kTxnv2IHNhTV1LEvv5Co3ueFoIA4Fbqso1oTsn87D3JK
 x5oZ+c5tDC7VendkZ/5SX4ioysFwabZv1qUR8GxRHCx4Kk5prrdUmejb8QeUOglG
 aJ6hd0UIFNmQAW+3ujOD6wlhn3MyKVfdZpGbhQfFQz3GV88maxAxNm/5dcEuEDan
 W8khgdJHZ5mr7/oDkjmJyjp9MNvamuPw52UzrQwmMOf3RvaKanM74/mJWXcVz+J9
 tf9UsqiWwMmCpXwszA2KYBy+NIFZ3QqAy0Ed1pj0WLeYOzw0zPb34S/O9SnWy597
 /T0I2jIBsOl478Wo6U+kmvTubJoRpGz9kgXbpHLve0rLb1BEWj1hQsZZrMJf/29b
 b5zULnIQqhztFoaYdg08LFrhePIp1oVTuY8XO8/k84C4VpOFQjBKfGMtK+HalRTA
 bX2h/K3xCQPwK4VWIaM1ucvhrMTTReajfKXesQwuBmwETbs2p4tRIjL8KFw7d1rC
 2jB4UvKVnWuxE64LZ+VK86btC9eAjYmDTTEPRUVFE4ZFroXcr+MK2Qke5o2FNReN
 UOVy2Xlelw==
 =EbJ/
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.12-2021-04-09' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "Two minor fixups for the reissue logic, and one for making sure that
  unbounded work is canceled on io-wq exit"

* tag 'io_uring-5.12-2021-04-09' of git://git.kernel.dk/linux-block:
  io-wq: cancel unbounded works on io-wq destroy
  io_uring: fix rw req completion
  io_uring: clear F_REISSUE right after getting it
2021-04-09 15:06:52 -07:00
Jack Qiu
df41872b68 fs: direct-io: fix missing sdio->boundary
I encountered a hung task issue, but not a performance one.  I run DIO
on a device (need lba continuous, for example open channel ssd), maybe
hungtask in below case:

  DIO:						Checkpoint:
  get addr A(at boundary), merge into BIO,
  no submit because boundary missing
						flush dirty data(get addr A+1), wait IO(A+1)
						writeback timeout, because DIO(A) didn't submit
  get addr A+2 fail, because checkpoint is doing

dio_send_cur_page() may clear sdio->boundary, so prevent it from missing
a boundary.

Link: https://lkml.kernel.org/r/20210322042253.38312-1-jack.qiu@huawei.com
Fixes: b1058b9812 ("direct-io: submit bio after boundary buffer is added to it")
Signed-off-by: Jack Qiu <jack.qiu@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-09 14:54:23 -07:00
Wengang Wang
90bd070aae ocfs2: fix deadlock between setattr and dio_end_io_write
The following deadlock is detected:

  truncate -> setattr path is waiting for pending direct IO to be done (inode->i_dio_count become zero) with inode->i_rwsem held (down_write).

  PID: 14827  TASK: ffff881686a9af80  CPU: 20  COMMAND: "ora_p005_hrltd9"
   #0  __schedule at ffffffff818667cc
   #1  schedule at ffffffff81866de6
   #2  inode_dio_wait at ffffffff812a2d04
   #3  ocfs2_setattr at ffffffffc05f322e [ocfs2]
   #4  notify_change at ffffffff812a5a09
   #5  do_truncate at ffffffff812808f5
   #6  do_sys_ftruncate.constprop.18 at ffffffff81280cf2
   #7  sys_ftruncate at ffffffff81280d8e
   #8  do_syscall_64 at ffffffff81003949
   #9  entry_SYSCALL_64_after_hwframe at ffffffff81a001ad

dio completion path is going to complete one direct IO (decrement
inode->i_dio_count), but before that it hung at locking inode->i_rwsem:

   #0  __schedule+700 at ffffffff818667cc
   #1  schedule+54 at ffffffff81866de6
   #2  rwsem_down_write_failed+536 at ffffffff8186aa28
   #3  call_rwsem_down_write_failed+23 at ffffffff8185a1b7
   #4  down_write+45 at ffffffff81869c9d
   #5  ocfs2_dio_end_io_write+180 at ffffffffc05d5444 [ocfs2]
   #6  ocfs2_dio_end_io+85 at ffffffffc05d5a85 [ocfs2]
   #7  dio_complete+140 at ffffffff812c873c
   #8  dio_aio_complete_work+25 at ffffffff812c89f9
   #9  process_one_work+361 at ffffffff810b1889
  #10  worker_thread+77 at ffffffff810b233d
  #11  kthread+261 at ffffffff810b7fd5
  #12  ret_from_fork+62 at ffffffff81a0035e

Thus above forms ABBA deadlock.  The same deadlock was mentioned in
upstream commit 28f5a8a7c0 ("ocfs2: should wait dio before inode lock
in ocfs2_setattr()").  It seems that that commit only removed the
cluster lock (the victim of above dead lock) from the ABBA deadlock
party.

End-user visible effects: Process hang in truncate -> ocfs2_setattr path
and other processes hang at ocfs2_dio_end_io_write path.

This is to fix the deadlock itself.  It removes inode_lock() call from
dio completion path to remove the deadlock and add ip_alloc_sem lock in
setattr path to synchronize the inode modifications.

[wen.gang.wang@oracle.com: remove the "had_alloc_lock" as suggested]
  Link: https://lkml.kernel.org/r/20210402171344.1605-1-wen.gang.wang@oracle.com

Link: https://lkml.kernel.org/r/20210331203654.3911-1-wen.gang.wang@oracle.com
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-09 14:54:23 -07:00
Linus Torvalds
17e7124aad 3 cifs/smb3 fixes, 2 for stable, includes a reconnetct fix and fix for display of devnames with special characters
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmBvsOIACgkQiiy9cAdy
 T1G90Qv+IEnZZClHENPxoefXE9hdMYflrZ0FgR1xAD0JJZDTjj0OIsilCfJxpDq5
 Wz4oXZT0WDuwOishMljGkdIUR+xdH8fILWsXhUJoa38zCzF4OZ7ld1zbBDblHLSw
 dR0bZJ5FmfXJmqMmrdl2ebLysQ2Xn0qCEn/6FiHABCKgoEvUcYJ95TMVnc6xyc+j
 x1dZbCmL0lQjRsUE+V918fFyAHJqWvlJC3dfEPl15ARgksEM/14f4o1Tp3tI3jZ1
 aVgPMsCb/ZC4Cwjr1NB7g66ymLKZZODl66wxM6zgNXQj72Ay2Sr2KeXT4WH6jspK
 mJQED27i20VtganGTBcZaULsupqd5+378G5Or1TDqEsDDq+Xg4+B+BBgojhBqSh6
 Czp1iZgZGHNaw/40t4ikeiNTqzQN3WNaiTAptsAew9MS2yvM4wsu1T2D70r0wQOI
 FytBASx/u60mu5BnomTaOSgvkOw4LJ9LJproqREdgyNcvSwqnqc3HkCRoWDxxcQc
 TCxr/i4a
 =rxNZ
 -----END PGP SIGNATURE-----

Merge tag '5.12-rc6-smb3' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "Three cifs/smb3 fixes, two for stable: a reconnect fix and a fix for
  display of devnames with special characters"

* tag '5.12-rc6-smb3' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: escape spaces in share names
  fs: cifs: Remove unnecessary struct declaration
  cifs: On cifs_reconnect, resolve the hostname again.
2021-04-08 18:57:47 -07:00
Pavel Begunkov
c60eb049f4 io-wq: cancel unbounded works on io-wq destroy
WARNING: CPU: 5 PID: 227 at fs/io_uring.c:8578 io_ring_exit_work+0xe6/0x470
RIP: 0010:io_ring_exit_work+0xe6/0x470
Call Trace:
 process_one_work+0x206/0x400
 worker_thread+0x4a/0x3d0
 kthread+0x129/0x170
 ret_from_fork+0x22/0x30

INFO: task lfs-openat:2359 blocked for more than 245 seconds.
task:lfs-openat      state:D stack:    0 pid: 2359 ppid:     1 flags:0x00000004
Call Trace:
 ...
 wait_for_completion+0x8b/0xf0
 io_wq_destroy_manager+0x24/0x60
 io_wq_put_and_exit+0x18/0x30
 io_uring_clean_tctx+0x76/0xa0
 __io_uring_files_cancel+0x1b9/0x2e0
 do_exit+0xc0/0xb40
 ...

Even after io-wq destroy has been issued io-wq worker threads will
continue executing all left work items as usual, and may hang waiting
for I/O that won't ever complete (aka unbounded).

[<0>] pipe_read+0x306/0x450
[<0>] io_iter_do_read+0x1e/0x40
[<0>] io_read+0xd5/0x330
[<0>] io_issue_sqe+0xd21/0x18a0
[<0>] io_wq_submit_work+0x6c/0x140
[<0>] io_worker_handle_work+0x17d/0x400
[<0>] io_wqe_worker+0x2c0/0x330
[<0>] ret_from_fork+0x22/0x30

Cancel all unbounded I/O instead of executing them. This changes the
user visible behaviour, but that's inevitable as io-wq is not per task.

Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/cd4b543154154cba055cf86f351441c2174d7f71.1617842918.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-08 13:33:17 -06:00
Pavel Begunkov
9728463737 io_uring: fix rw req completion
WARNING: at fs/io_uring.c:8578 io_ring_exit_work.cold+0x0/0x18

As reissuing is now passed back by REQ_F_REISSUE and kiocb_done()
internally uses __io_complete_rw(), it may stop after setting the flag
so leaving a dangling request.

There are tricky edge cases, e.g. reading beyound file, boundary, so
the easiest way is to hand code reissue in kiocb_done() as
__io_complete_rw() was doing for us before.

Fixes: 230d50d448 ("io_uring: move reissue into regular IO path")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f602250d292f8a84cca9a01d747744d1e797be26.1617842918.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-08 13:32:59 -06:00
Linus Torvalds
4ea51e0e37 for-linus-2021-04-08
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYG7FBgAKCRCRxhvAZXjc
 osGSAQCW7V8zPhZ7Zwll3QeUk0xAqD6e6T3Uv3EoQPKCVcc00gEA/hQtDJYSGZWI
 22hPAffU2YOKeYDXq7SIu+eJ1y/ShQ0=
 =xfme
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-2021-04-08' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull close_range() fix from Christian Brauner:
 "Syzbot reported a bug in close_range.

  Debugging this showed we didn't recalculate the current maximum fd
  number for CLOSE_RANGE_UNSHARE | CLOSE_RANGE_CLOEXEC after we unshared
  the file descriptors table. As a result, max_fd could exceed the
  current fdtable maximum causing us to set excessive bits.

  As a concrete example, let's say the user requested everything from fd
  4 to ~0UL to be closed and their current fdtable size is 256 with
  their highest open fd being 4. With CLOSE_RANGE_UNSHARE the caller
  will end up with a new fdtable which has room for 64 file descriptors
  since that is the lowest fdtable size we accept. But now max_fd will
  still point to 255 and needs to be adjusted. Fix this by retrieving
  the correct maximum fd value in __range_cloexec().

  I've carried this fix for a little while but since there was no
  linux-next release over easter I waited until now.

  With this change close_range() can be further simplified but imho we
  are in no hurry to do that and so I'll defer this for the 5.13 merge
  window"

* tag 'for-linus-2021-04-08' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  file: fix close_range() for unshare+cloexec
2021-04-08 08:46:53 -07:00
Linus Torvalds
035d80695f Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull umount fix from Al Viro:
 "Brown paperbag time: dumb braino in the series that went into 5.7
  broke the 'don't step into ->d_weak_revalidate() when umount(2) looks
  the victim up' behaviour.

  Spotted only now - saw

        if (!err && unlikely(nd->flags & LOOKUP_MOUNTPOINT)) {
                err = handle_lookup_down(nd);
                nd->flags &= ~LOOKUP_JUMPED; // no d_weak_revalidate(), please...
        }

  and went "why do we clear that flag here - nothing below that point is
  going to check it anyway" / "wait a minute, what is it doing *after*
  complete_walk() (which is where we check that flag and call
  ->d_weak_revalidate())" / "how could that possibly _not_ break?",
  followed by reproducing the breakage and verifying that the obvious
  fix of that braino does, indeed, fix it.

  The reproducer is (assuming that $DIR exists and is exported r/w to
  localhost)

      mkdir $DIR/a
      mkdir /tmp/foo
      mount --bind /tmp/foo /tmp/foo
      mkdir /tmp/foo/a
      mkdir /tmp/foo/b
      mount -t nfs4 localhost:$DIR/a /tmp/foo/a
      mount -t nfs4 localhost:$DIR /tmp/foo/b
      rmdir /tmp/foo/b/a
      umount /tmp/foo/b
      umount /tmp/foo/a
      umount -l /tmp/foo      # will get everything under /tmp/foo, no matter what

  Correct behaviour is successful umount; broken kernels (5.7-rc1 and
  later) get

      umount.nfs4: /tmp/foo/a: Stale file handle

  Note that bind mount is there to be able to recover - on broken
  kernels we'd get stuck with impossible-to-umount filesystem if not for
  that.

  FWIW, that braino had been posted for review back then, at least
  twice. Unfortunately, the call of complete_walk() was outside of diff
  context, so the bogosity hadn't been immediately obvious from the
  patch alone ;-/"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  LOOKUP_MOUNTPOINT: we are cleaning "jumped" flag too late
2021-04-08 08:26:06 -07:00
Pavel Begunkov
6ad7f2332e io_uring: clear F_REISSUE right after getting it
There are lots of ways r/w request may continue its path after getting
REQ_F_REISSUE, it's not necessarily io-wq and can be, e.g. apoll,
and submitted via  io_async_task_func() -> __io_req_task_submit()

Clear the flag right after getting it, so the next attempt is well
prepared regardless how the request will be executed.

Fixes: 230d50d448 ("io_uring: move reissue into regular IO path")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/11dcead939343f4e27cab0074d34afcab771bfa4.1617842918.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-07 22:10:19 -06:00
Maciek Borzecki
0fc9322ab5 cifs: escape spaces in share names
Commit 653a5efb84 ("cifs: update super_operations to show_devname")
introduced the display of devname for cifs mounts. However, when mounting
a share which has a whitespace in the name, that exact share name is also
displayed in mountinfo. Make sure that all whitespace is escaped.

Signed-off-by: Maciek Borzecki <maciek.borzecki@gmail.com>
CC: <stable@vger.kernel.org> # 5.11+
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-04-07 21:30:27 -05:00
Wan Jiabing
d135be0a7f fs: cifs: Remove unnecessary struct declaration
struct cifs_readdata is declared twice. One is declared
at 208th line.
And struct cifs_readdata is defined blew.
The declaration here is not needed. Remove the duplicate.

Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-04-07 21:30:27 -05:00
Shyam Prasad N
4e456b30f7 cifs: On cifs_reconnect, resolve the hostname again.
On cifs_reconnect, make sure that DNS resolution happens again.
It could be the cause of connection to go dead in the first place.

This also contains the fix for a build issue identified by Intel bot.
Reported-by: kernel test robot <lkp@intel.com>

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: <stable@vger.kernel.org> # 5.11+
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-04-07 21:29:36 -05:00
Al Viro
4f0ed93fb9 LOOKUP_MOUNTPOINT: we are cleaning "jumped" flag too late
That (and traversals in case of umount .) should be done before
complete_walk().  Either a braino or mismerge damage on queue
reorders - either way, I should've spotted that much earlier.

Fucked-up-by: Al Viro <viro@zeniv.linux.org.uk>
X-Paperbag: Brown
Fixes: 161aff1d93 "LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat()"
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-04-06 20:33:00 -04:00
Linus Torvalds
2d74366078 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull fs fixes from Al Viro:
 "Fairly old hostfs bug (in setups that are not used by anyone,
  apparently) + fix for this cycle regression: extra dput/mntput in
  LOOKUP_CACHED failure handling"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  Make sure nd->path.mnt and nd->path.dentry are always valid pointers
  hostfs: fix memory handling in follow_link()
2021-04-06 12:52:49 -07:00
Al Viro
7d01ef7585 Make sure nd->path.mnt and nd->path.dentry are always valid pointers
Initialize them in set_nameidata() and make sure that terminate_walk() clears them
once the pointers become potentially invalid (i.e. we leave RCU mode or drop them
in non-RCU one).  Currently we have "path_init() always initializes them and nobody
accesses them outside of path_init()/terminate_walk() segments", which is asking
for trouble.

With that change we would have nd->path.{mnt,dentry}
	1) always valid - NULL or pointing to currently allocated objects.
	2) non-NULL while we are successfully walking
	3) NULL when we are not walking at all
	4) contributing to refcounts whenever non-NULL outside of RCU mode.

Fixes: 6c6ec2b0a3 ("fs: add support for LOOKUP_CACHED")
Reported-by: syzbot+c88a7030da47945a3cc3@syzkaller.appspotmail.com
Tested-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2021-04-06 12:33:07 -04:00
Linus Torvalds
d83e98f9d8 io_uring-5.12-2021-04-03
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmBoyXQQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpiOXD/4wnXFHOmt5HMHmEXU4bB5b46snh3iCU4QM
 w7pqgPMgS8uRZZ16FWpzDwccC2NZDoFRHhFxDrOqdKMO1DQlzk1yfSeKjNs9h2Oe
 mJ6JuuaX6bEk2RRmjqqM5aMCazE2+tWtDydveL9OrJO6uiPcZYFPiRDwafDzQo4n
 YPhUhPm+dqn/4Li6Fap2ieCeLqXNEUKpA0/Bd9QQV0Q/oZq5oLdJefIk8EMXH/tf
 eKH2muZDjOV0FYdG8lPsNAF0c5qJ/aID+jlhyUz8Bkn31lOS96d5rzXoFq+AonsC
 gVwLbaMcAibHrDjNxIQGcEU0VSjvvfy9GAfjJ3uSuyjpE4dNMe2fuU/B3rF6xb6E
 upEfAik+frvzfFuZx11SZh/JwNBatJh35DVZZczI48YKk+s2MI0q9+lNINLtL6bD
 3J287jnZbETU54WCMruiRwjQ3J1YWOj2pfxrPT9J0NZdQ0r7MXsDJecGNu+nL0X3
 Ry7IpXUqabCf+4+XrGZ2NG6/kd5D/smatc5FqTkyeih3mw94iprNuWanzBlUZYQV
 8ybrVtW37caAd1vyx0/p2I1LV5Y0eNGpSWz/WTDj0FINPCPjLq/VmPsIYgihmvz4
 joWEzGnBKqds/CAvSneyz2d9MVlQ1083Az/Vi9g4w0IHG/p3ekQGBPDFBvQyszq9
 pNkMpQ9Wxw==
 =edGl
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.12-2021-04-03' of git://git.kernel.dk/linux-block

POull io_uring fix from Jens Axboe:
 "Just fixing a silly braino in a previous patch, where we'd end up
  failing to compile if CONFIG_BLOCK isn't enabled.

  Not that a lot of people do that, but kernel bot spotted it and it's
  probably prudent to just flush this out now before -rc6.

  Sorry about that, none of my test compile configs have !CONFIG_BLOCK"

* tag 'io_uring-5.12-2021-04-03' of git://git.kernel.dk/linux-block:
  io_uring: fix !CONFIG_BLOCK compilation failure
2021-04-03 14:26:47 -07:00
Linus Torvalds
8e29be3468 Two more gfs2 fixes
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmBomgAUHGFncnVlbmJh
 QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTqWkA//bGV+XgrDas0mBjAQiGBcqC9M1Ts1
 cffrJ2E9E5J/Cnn+/wAmkGYZU3oHpdmNI4DytCneHhfY9YamSlg6B9MDxebF3Iuu
 eXiWB4rXEf7l5AG5ywUloXl171TTxOQDo5bbOlvYeDWPZ/WYEBsdZ/rxF9yIent/
 RS5HMt9I9x6c2WQGTlBvo8D618WyfxzkX5AjvhojITgX5UItZg6bOKkcuqcQ9PNG
 5MGnHCCJU2Zh3Y6gGZjp3rQnxRAzpBhFdVrXaZgTAtKLycsGHMcjqfmST6N2si8l
 cCDgujRhRfIOpQZs0vOdcJVtpNBfDxgOO5JlTdkY6Grh/STYoN9dFo7V0SnYhMJE
 FdBgfHwNyyBEo/QV3gDXrvITxw+xypACjS/znffArxFySNzSfv2oWPCxOJvbel0L
 H4y+gbJ4R+QUzkUuvnXjxjl7c70jK+flLbzUxXxeSQBmOHtCiHZJK43UnCq+fpmZ
 hOrUaYHvCV1iC/9OeAy1N8MlXicUHnpmu/7q7GEGaRTV4zN85MjddOiRKAz6RiAl
 2nB64GbLDFjY3HF8+/giEwAWikcYPk2W8S0WmX9Bjn+UdMYuH1cWF/TLwWrSmQJX
 MQmYlwLOj+UKg8Ku0+Klh2c8oMxo7C8o9t9BjBJio5od0U/sen9v/08DUobk4jwN
 hhjW+as1ZV/+Tpc=
 =TP8w
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-v5.12-rc2-fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 fixes from Andreas Gruenbacher:
 "Two more gfs2 fixes"

* tag 'gfs2-v5.12-rc2-fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: report "already frozen/thawed" errors
  gfs2: Flag a withdraw if init_threads() fails
2021-04-03 12:15:01 -07:00
Jens Axboe
e82ad48539 io_uring: fix !CONFIG_BLOCK compilation failure
kernel test robot correctly pinpoints a compilation failure if
CONFIG_BLOCK isn't set:

fs/io_uring.c: In function '__io_complete_rw':
>> fs/io_uring.c:2509:48: error: implicit declaration of function 'io_rw_should_reissue'; did you mean 'io_rw_reissue'? [-Werror=implicit-function-declaration]
    2509 |  if ((res == -EAGAIN || res == -EOPNOTSUPP) && io_rw_should_reissue(req)) {
         |                                                ^~~~~~~~~~~~~~~~~~~~
         |                                                io_rw_reissue
    cc1: some warnings being treated as errors

Ensure that we have a stub declaration of io_rw_should_reissue() for
!CONFIG_BLOCK.

Fixes: 230d50d448 ("io_uring: move reissue into regular IO path")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-02 19:45:34 -06:00
Linus Torvalds
d93a0d43e3 block-5.12-2021-04-02
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmBnh84QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpubkD/0Y+l3cecjzds3RnRXEXYsRFBKGfK6c7Uuu
 QVCrlRp6tKPmBoDLQyl95Mg0e44pR4s3Bw5W4j9GmJtVyNVzC2x3dqXn3uXSFca/
 KU+4GIzl2VIXS5Pn90GLE6/xw3FtVy8w2c6V3g4jkLR29bexPdO4s57cohxKR9kL
 ZU+icCag9RlNIYkuB79Wy6Y3/m41L5WRkMGiMb0sJS9Q+k+zetZNIeNIxWn4E1zF
 qWymdyBFx31qL8/2ZmRwb8XzF5qE2XimXz1a7ZX754zyR/Ry5rGc0h+JjqgUhSV9
 wM2gLlMNEP+k+8DOU9ACYdff18P6b+RZ8mJnGZjZseAut1qJXonVtgDoWX7mEs9+
 8Gl+n18TYpKEfzLiOOOtu/xeZYMjp0MUjO6iHTpzRfqBjKNoZGTuz0wGC5nX/ZYI
 y5QWifI0NmMmTPDJpH6nVYzqDLbEZzcMz6WeOfhKQ/yv7gOxj+BFGJ3olJ+DAx8c
 e6HDPa/WkC0iqie5cpzYjmve0HrKJADMMrRRWGRkgmOZ8uAaSS17rZExg1CICr8I
 bOVYsrPsg8ErKVvzlx/DK6EfhNrw0+Db7paYccl2a3pXx/T8iHmW3RSqn7jMrhA1
 7QPOCUMKuWuaOupWJWw25gxNS3viJa57/hxMG1nvAgpJx6QvBNaLrwcIWXO1cfrp
 boe/UFnftg==
 =8odY
 -----END PGP SIGNATURE-----

Merge tag 'block-5.12-2021-04-02' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - Remove comment that never came to fruition in 22 years of development
   (Christoph)

 - Remove unused request flag (Christoph)

 - Fix for null_blk fake timeout handling (Damien)

 - Fix for IOCB_NOWAIT being ignored for O_DIRECT on raw bdevs (Pavel)

 - Error propagation fix for multiple split bios (Yufen)

* tag 'block-5.12-2021-04-02' of git://git.kernel.dk/linux-block:
  block: remove the unused RQF_ALLOCED flag
  block: update a few comments in uapi/linux/blkpg.h
  block: don't ignore REQ_NOWAIT for direct IO
  null_blk: fix command timeout completion handling
  block: only update parent bi_status when bio fail
2021-04-02 16:13:13 -07:00
Linus Torvalds
1faccb6394 io_uring-5.12-2021-04-02
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmBnh+kQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpo3AEACSddwiafCkKLQyl5oaIdrzP1ANvH3vWOyD
 MCbcf0NR5W1dcYS4JSA3fmrXpBVYL5tPdAxYcbachBhK2zYJaWuZtgQlB3ofYiNo
 x1nRFsJXcY/vNBCrZo5xJTgRHyvsNrviZFgb2OOy9Cv2IDn0riJSciPr+A1cIE6J
 Tn1lhGaWHDcboWl2oYUAGUWimkmTuuCcwpP6KCuBVRkTc+C1v4sRy2EO/84AQUBc
 XQWov8IUCDISlZmiukktr4a1+9vL4PbsLDRw2Zc8ZH6oTuNIju8sQgxyzm/EN4Uz
 D3oJ/YEHNUfW+divI3djqwNBiskcl9SUcpgzPwkWOJf+YcUE6iGNJPwJ9B+1NiH9
 WKmgjulRrDMTO9/flK8+GpAegDjaPUXcM4nd1ItQGHX6GHxCIWYaNHsngWgWebSy
 +wjOlwRxCdgRRhwAWQwu8k5O85UjCLO8uq4mK0TA2GTz5QzGVa9dQaqovMpsHAOb
 8TtxWdRFePZIl3CXB3r6nSFQv3S9d70Dq5+Mgq7pz9+n0vGfV6cTbWPIbne2V7g+
 +IaZlVLQXu8WRTf/sTq91LWyaJrJiMEsY7dts+8K9lGsdFT0PJIxf6VeuZpBYCBg
 B+JBHpdlMBZhTjltEzEubBUQZog+cQkway90Q7MtL4Ue+qwV4WbgLziHTyzL3GmI
 cQiujMlcRg==
 =pxfZ
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.12-2021-04-02' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "Nothing really major in here, and finally nothing really related to
  signals. A few minor fixups related to the threading changes, and some
  general fixes, that's it.

  There's the pending gdb-get-confused-about-arch, but that's more of a
  cosmetic issue, nothing that hinder use of it. And given that other
  archs will likely be affected by that oddity too, better to postpone
  any changes there until 5.13 imho"

* tag 'io_uring-5.12-2021-04-02' of git://git.kernel.dk/linux-block:
  io_uring: move reissue into regular IO path
  io_uring: fix EIOCBQUEUED iter revert
  io_uring/io-wq: protect against sprintf overflow
  io_uring: don't mark S_ISBLK async work as unbounded
  io_uring: drop sqd lock before handling signals for SQPOLL
  io_uring: handle setup-failed ctx in kill_timeouts
  io_uring: always go for cancellation spin on exec
2021-04-02 16:08:19 -07:00
Jens Axboe
230d50d448 io_uring: move reissue into regular IO path
It's non-obvious how retry is done for block backed files, when it happens
off the kiocb done path. It also makes it tricky to deal with the iov_iter
handling.

Just mark the req as needing a reissue, and handling it from the
submission path instead. This makes it directly obvious that we're not
re-importing the iovec from userspace past the submit point, and it means
that we can just reuse our usual -EAGAIN retry path from the read/write
handling.

At some point in the future, we'll gain the ability to always reliably
return -EAGAIN through the stack. A previous attempt on the block side
didn't pan out and got reverted, hence the need to check for this
information out-of-band right now.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-02 09:24:20 -06:00
Pavel Begunkov
f8b78caf21 block: don't ignore REQ_NOWAIT for direct IO
If IOCB_NOWAIT is set on submission, then that needs to get propagated to
REQ_NOWAIT on the block side. Otherwise we completely lose this
information, and any issuer of IOCB_NOWAIT IO will potentially end up
blocking on eg request allocation on the storage side.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-02 08:34:30 -06:00
Christian Brauner
9b5b872215
file: fix close_range() for unshare+cloexec
syzbot reported a bug when putting the last reference to a tasks file
descriptor table. Debugging this showed we didn't recalculate the
current maximum fd number for CLOSE_RANGE_UNSHARE | CLOSE_RANGE_CLOEXEC
after we unshared the file descriptors table. So max_fd could exceed the
current fdtable maximum causing us to set excessive bits. As a concrete
example, let's say the user requested everything from fd 4 to ~0UL to be
closed and their current fdtable size is 256 with their highest open fd
being 4. With CLOSE_RANGE_UNSHARE the caller will end up with a new
fdtable which has room for 64 file descriptors since that is the lowest
fdtable size we accept. But now max_fd will still point to 255 and needs
to be adjusted. Fix this by retrieving the correct maximum fd value in
__range_cloexec().

Reported-by: syzbot+283ce5a46486d6acdbaf@syzkaller.appspotmail.com
Fixes: 582f1fb6b7 ("fs, close_range: add flag CLOSE_RANGE_CLOEXEC")
Fixes: fec8a6a691 ("close_range: unshare all fds for CLOSE_RANGE_UNSHARE | CLOSE_RANGE_CLOEXEC")
Cc: Christoph Hellwig <hch@lst.de>
Cc: Giuseppe Scrivano <gscrivan@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-04-02 14:11:10 +02:00
Pavel Begunkov
07204f2157 io_uring: fix EIOCBQUEUED iter revert
iov_iter_revert() is done in completion handlers that happensf before
read/write returns -EIOCBQUEUED, no need to repeat reverting afterwards.
Moreover, even though it may appear being just a no-op, it's actually
races with 1) user forging a new iovec of a different size 2) reissue,
that is done via io-wq continues completely asynchronously.

Fixes: 3e6a0d3c75 ("io_uring: fix -EAGAIN retry with IOPOLL")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-01 09:31:21 -06:00
Pavel Begunkov
696ee88a7c io_uring/io-wq: protect against sprintf overflow
task_pid may be large enough to not fit into the left space of
TASK_COMM_LEN-sized buffers and overflow in sprintf. We not so care
about uniqueness, so replace it with safer snprintf().

Reported-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1702c6145d7e1c46fbc382f28334c02e1a3d3994.1617267273.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-01 09:21:18 -06:00
Jens Axboe
4b982bd0f3 io_uring: don't mark S_ISBLK async work as unbounded
S_ISBLK is marked as unbounded work for async preparation, because it
doesn't match S_ISREG. That is incorrect, as any read/write to a block
device is also a bounded operation. Fix it up and ensure that S_ISBLK
isn't marked unbounded.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-01 08:56:28 -06:00
Tetsuo Handa
5e46d1b78a reiserfs: update reiserfs_xattrs_initialized() condition
syzbot is reporting NULL pointer dereference at reiserfs_security_init()
[1], for commit ab17c4f021 ("reiserfs: fixup xattr_root caching")
is assuming that REISERFS_SB(s)->xattr_root != NULL in
reiserfs_xattr_jcreate_nblocks() despite that commit made
REISERFS_SB(sb)->priv_root != NULL && REISERFS_SB(s)->xattr_root == NULL
case possible.

I guess that commit 6cb4aff0a7 ("reiserfs: fix oops while creating
privroot with selinux enabled") wanted to check xattr_root != NULL
before reiserfs_xattr_jcreate_nblocks(), for the changelog is talking
about the xattr root.

  The issue is that while creating the privroot during mount
  reiserfs_security_init calls reiserfs_xattr_jcreate_nblocks which
  dereferences the xattr root. The xattr root doesn't exist, so we get
  an oops.

Therefore, update reiserfs_xattrs_initialized() to check both the
privroot and the xattr root.

Link: https://syzkaller.appspot.com/bug?id=8abaedbdeb32c861dc5340544284167dd0e46cde # [1]
Reported-and-tested-by: syzbot <syzbot+690cb1e51970435f9775@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 6cb4aff0a7 ("reiserfs: fix oops while creating privroot with selinux enabled")
Acked-by: Jeff Mahoney <jeffm@suse.com>
Acked-by: Jan Kara <jack@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-30 14:27:32 -07:00
Jens Axboe
82734c5b1b io_uring: drop sqd lock before handling signals for SQPOLL
Don't call into get_signal() with the sqd mutex held, it'll fail if we're
freezing the task and we'll get complaints on locks still being held:

====================================
WARNING: iou-sqp-8386/8387 still has locks held!
5.12.0-rc4-syzkaller #0 Not tainted
------------------------------------
1 lock held by iou-sqp-8386/8387:
 #0: ffff88801e1d2470 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread+0x24c/0x13a0 fs/io_uring.c:6731

 stack backtrace:
 CPU: 1 PID: 8387 Comm: iou-sqp-8386 Not tainted 5.12.0-rc4-syzkaller #0
 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
 Call Trace:
  __dump_stack lib/dump_stack.c:79 [inline]
  dump_stack+0x141/0x1d7 lib/dump_stack.c:120
  try_to_freeze include/linux/freezer.h:66 [inline]
  get_signal+0x171a/0x2150 kernel/signal.c:2576
  io_sq_thread+0x8d2/0x13a0 fs/io_uring.c:6748

Fold the get_signal() case in with the parking checks, as we need to drop
the lock in both cases, and since we need to be checking for parking when
juggling the lock anyway.

Reported-by: syzbot+796d767eb376810256f5@syzkaller.appspotmail.com
Fixes: dbe1bdbb39 ("io_uring: handle signals for IO threads like a normal thread")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-30 14:36:46 -06:00
Pavel Begunkov
51520426f4 io_uring: handle setup-failed ctx in kill_timeouts
general protection fault, probably for non-canonical address
	0xdffffc0000000018: 0000 [#1] KASAN: null-ptr-deref
	in range [0x00000000000000c0-0x00000000000000c7]
RIP: 0010:io_commit_cqring+0x37f/0xc10 fs/io_uring.c:1318
Call Trace:
 io_kill_timeouts+0x2b5/0x320 fs/io_uring.c:8606
 io_ring_ctx_wait_and_kill+0x1da/0x400 fs/io_uring.c:8629
 io_uring_create fs/io_uring.c:9572 [inline]
 io_uring_setup+0x10da/0x2ae0 fs/io_uring.c:9599
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xae

It can get into wait_and_kill() before setting up ctx->rings, and hence
io_commit_cqring() fails. Mimic poll cancel and do it only when we
completed events, there can't be any requests if it failed before
initialising rings.

Fixes: 80c4cbdb5e ("io_uring: do post-completion chore on t-out cancel")
Reported-by: syzbot+0e905eb8228070c457a0@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/660261a48f0e7abf260c8e43c87edab3c16736fa.1617014345.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-29 06:48:26 -06:00
Pavel Begunkov
5a978dcfc0 io_uring: always go for cancellation spin on exec
Always try to do cancellation in __io_uring_task_cancel() at least once,
so it actually goes and cleans its sqpoll tasks (i.e. via
io_sqpoll_cancel_sync()), otherwise sqpoll task may submit new requests
after cancellation and it's racy for many reasons.

Fixes: 521d6a737a ("io_uring: cancel sqpoll via task_work")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0a21bd6d794bb1629bc906dd57a57b2c2985a8ac.1616839147.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-28 18:11:53 -06:00
Linus Torvalds
81b1d39fd3 5 cifs/smb3 fixes, 2 for stable, includes an important fix for encryption and an ACL fix, as well as a fix for possible reflink data corruption
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmBfx9sACgkQiiy9cAdy
 T1F/igv8DsHxOJLw9kc5pBrbmUEgsUpQbdRMESqEROyqKte80jga2P3wsvJQYqQY
 JwHPxh477eiRSpSkEWSFDMmELsVtoQIYv3aqgPe79668eCd97mHRM2ItSV++5x9M
 iJ0N8GuiVARSyKmndrZ9gvbPoJb4TKkPX6X44pDSgAkgskvTTFKTywZaY5IEiqKe
 9zBWghZbNnWhtYG+2On2M2tzy8/Fo8aveLxhFhJstZ0IP6Px+Rg9GMdzRAfDJqK+
 QQMwcqmRKjVo4/Z6yji/s9OI1+eQyIAKLa6cyB0Yd+AqnvDYv1dagkRAjRCHl/Ri
 28loxGatXeXjXJGYU58EjNkKdoBUh09idJJolcMGwPSteL2j1DQDV9utbZLhAWPq
 yNugiIkzbQj3Z55UQ3n3u79pztK31GZ2TOcwJbIqQs3tctJ5aqUIWjQibLVpaNBR
 7C5Yug9aC5gpr3LPIUD3AGZIUAenCzsVN5Y9br4SPx0/zHmbynyyF27w14shX8O/
 3uQr6xhl
 =NxLV
 -----END PGP SIGNATURE-----

Merge tag '5.12-rc4-smb3' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "Five cifs/smb3 fixes, two for stable.

  Includes an important fix for encryption and an ACL fix, as well as a
  fix for possible reflink data corruption"

* tag '5.12-rc4-smb3' of git://git.samba.org/sfrench/cifs-2.6:
  smb3: fix cached file size problems in duplicate extents (reflink)
  cifs: Silently ignore unknown oplock break handle
  cifs: revalidate mapping when we open files for SMB1 POSIX
  cifs: Fix chmod with modefromsid when an older ACE already exists.
  cifs: Adjust key sizes and key generation routines for AES256 encryption
2021-03-28 12:06:21 -07:00
Linus Torvalds
b44d1ddcf8 io_uring-5.12-2021-03-27
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmBf1KAQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpjVSD/0f1HdekXnIE6aSRQ7YEV8ux2t5wUeDyP8U
 cdcZ8fBW9PvKZLdODSI4sw8UYV5OYEBcfImFe3nRVHR+RIVQo72UTYvuHqeUYNct
 w3drgF2GEMIxJFZR6zf9LDrQVduPqXvbEJLui6TN+eX/5E99ZlUWMLwkX1k+vDju
 QfaGZjz2736GTn1MPc7jdyZKoK7eCi5xtNFPash5wGck7aYl5TGXnG/8bRYsv2Tw
 eCYKbvv4x0s8OFcYVQMooDfbIMCyyfTwt6YatFHQEtM/RM+M66gndvv3jfkeJQju
 hz0I8qOJ8X5lf0VucncWs5J8b9Whr5YZV+k9461xalBbV9ed2vzIIikP8DpCxtYz
 yKbsdDm0+3hwfuZOz+d7ooEXKsphJ1PnSsEeuNZXtKDXVtphksUbbq4H2NLINcsQ
 m6dwaRPSEA0EymngGY2e+8+CU0euiE4mqoMpw4D9m9Irs+BAaWYGk9xCWr0BGem0
 auZOMqvV2xktdBlGx1BJCLts1sHHxy8IM3u0852R/1AfcKOkXwNVPt62I8e9ceIA
 wc731aWHwJfS25m430xFDPJKJpUZoZgste4qwVym70CmRziuamgYyIfrfRg1ZjsD
 ZBa9Z4hPiT4e0eDqlYjcMpl9FORgYQXVXy5ofd/eZg5xkU8X+i6TVZkaQNkZyqV/
 4ogBZYUolg==
 =mwLC
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.12-2021-03-27' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:

 - Use thread info versions of flag testing, as discussed last week.

 - The series enabling PF_IO_WORKER to just take signals, instead of
   needing to special case that they do not in a bunch of places. Ends
   up being pretty trivial to do, and then we can revert all the special
   casing we're currently doing.

 - Kill dead pointer assignment

 - Fix hashed part of async work queue trace

 - Fix sign extension issue for IORING_OP_PROVIDE_BUFFERS

 - Fix a link completion ordering regression in this merge window

 - Cancellation fixes

* tag 'io_uring-5.12-2021-03-27' of git://git.kernel.dk/linux-block:
  io_uring: remove unsued assignment to pointer io
  io_uring: don't cancel extra on files match
  io_uring: don't cancel-track common timeouts
  io_uring: do post-completion chore on t-out cancel
  io_uring: fix timeout cancel return code
  Revert "signal: don't allow STOP on PF_IO_WORKER threads"
  Revert "kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing"
  Revert "kernel: treat PF_IO_WORKER like PF_KTHREAD for ptrace/signals"
  Revert "signal: don't allow sending any signals to PF_IO_WORKER threads"
  kernel: stop masking signals in create_io_thread()
  io_uring: handle signals for IO threads like a normal thread
  kernel: don't call do_exit() for PF_IO_WORKER threads
  io_uring: maintain CQE order of a failed link
  io-wq: fix race around pending work on teardown
  io_uring: do ctx sqd ejection in a clear context
  io_uring: fix provide_buffers sign extension
  io_uring: don't skip file_end_write() on reissue
  io_uring: correct io_queue_async_work() traces
  io_uring: don't use {test,clear}_tsk_thread_flag() for current
2021-03-28 11:42:05 -07:00
Linus Torvalds
abed516ecd block-5.12-2021-03-27
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmBf1YoQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgprivEADZx//LFwziicjD3Nd5XcLfMeE1su6+CULD
 SkGh8MALlB3/smeYa+gG5tb5U8l+7Xk62pDWXsRZj+Ckw/FDGql4qve6uSDqAIBz
 6W6PKjHLY81E8nVe/WHcvhQrxE8E9yZg/Hrg4FWLpcLbmJTt709Cm+FciHP8BAsR
 iv3gkBreMRrt9Xlfimn4XCsGaqbXg2Xx8AhaJBshhhjIXvirvB8ctNZvguNX4KFl
 ob+KTO1p26mTFxHLiaJt1fNJzj21XdMrT27FMPqylBF5s1Xr4U9plZHgTX6KMx3o
 BZx1QFTGiskgdKhR01AgzM4ASIWZAUDfpRgABfyWdqHTwqeJyHbcJ+emRpiGCyER
 Og3ar2m75WUA8+Pfgl9TusnNTCiRVYBAcMZGpGEbGKZt+cyCq2Ed161e2I7NPOxR
 c60/j4KHq3uBXh1FhNRX1Y9ZUiK031RqGhBCABeM0bnxImyEo96L3VXJ72RZOvjZ
 1lo9U35q7B6AaFlAesYH4/WaPIExy3RObVHUVtXokzcm4RFh9eycuxPdGc+HDZ04
 h8t6KaAKTtBadIIMWvz34SNykqM4Q0xcHrt8Wz+1C3FZfgc7rkQpVBZLjhk5fx8h
 33KeuMrATAFGvv9d0tbARbIXqXaFGwcc7Z0sSfVnzRfFM/aPa5xnIfGmbxoT5gH8
 v/6ySA3EWA==
 =ZaB3
 -----END PGP SIGNATURE-----

Merge tag 'block-5.12-2021-03-27' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - Fix regression from this merge window with the xarray partition
   change, which allowed partition counts that overflow the u8 that
   holds the partition number (Ming)

 - Fix zone append warning (Johannes)

 - Segmentation count fix for multipage bvecs (David)

 - Partition scan fix (Chris)

* tag 'block-5.12-2021-03-27' of git://git.kernel.dk/linux-block:
  block: don't create too many partitions
  block: support zone append bvecs
  block: recalculate segment count for multi-segment discards correctly
  block: clear GD_NEED_PART_SCAN later in bdev_disk_changed
2021-03-28 11:37:42 -07:00
Colin Ian King
2b8ed1c941 io_uring: remove unsued assignment to pointer io
There is an assignment to io that is never read after the assignment,
the assignment is redundant and can be removed.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-27 14:09:11 -06:00
Pavel Begunkov
78d9d7c2a3 io_uring: don't cancel extra on files match
As tasks always wait and kill their io-wq on exec/exit, files are of no
more concern to us, so we don't need to specifically cancel them by hand
in those cases. Moreover we should not, because io_match_task() looks at
req->task->files now, which is always true and so leads to extra
cancellations, that wasn't a case before per-task io-wq.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0566c1de9b9dd417f5de345c817ca953580e0e2e.1616696997.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-27 14:09:11 -06:00
Pavel Begunkov
2482b58ffb io_uring: don't cancel-track common timeouts
Don't account usual timeouts (i.e. not linked) as REQ_F_INFLIGHT but
keep behaviour prior to dd59a3d595 ("io_uring: reliably cancel linked
timeouts").

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/104441ef5d97e3932113d44501fda0df88656b83.1616696997.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-27 14:09:11 -06:00
Pavel Begunkov
80c4cbdb5e io_uring: do post-completion chore on t-out cancel
Don't forget about io_commit_cqring() + io_cqring_ev_posted() after
exit/exec cancelling timeouts. Both functions declared only after
io_kill_timeouts(), so to avoid tons of forward declarations move
it down.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/72ace588772c0f14834a6a4185d56c445a366fb4.1616696997.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-27 14:09:11 -06:00
Pavel Begunkov
1ee4160c73 io_uring: fix timeout cancel return code
When we cancel a timeout we should emit a sensible return code, like
-ECANCELED but not 0, otherwise it may trick users.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7b0ad1065e3bd1994722702bd0ba9e7bc9b0683b.1616696997.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-27 14:09:11 -06:00
Jens Axboe
dbe1bdbb39 io_uring: handle signals for IO threads like a normal thread
We go through various hoops to disallow signals for the IO threads, but
there's really no reason why we cannot just allow them. The IO threads
never return to userspace like a normal thread, and hence don't go through
normal signal processing. Instead, just check for a pending signal as part
of the work loop, and call get_signal() to handle it for us if anything
is pending.

With that, we can support receiving signals, including special ones like
SIGSTOP.

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-27 14:09:07 -06:00
Steve French
cfc63fc812 smb3: fix cached file size problems in duplicate extents (reflink)
There were two problems (one of which could cause data corruption)
that were noticed with duplicate extents (ie reflink)
when debugging why various xfstests were being incorrectly skipped
(e.g. generic/138, generic/140, generic/142). First, we were not
updating the file size locally in the cache when extending a
file due to reflink (it would refresh after actimeo expires)
but xfstest was checking the size immediately which was still
0 so caused the test to be skipped.  Second, we were setting
the target file size (which could shrink the file) in all cases
to the end of the reflinked range rather than only setting the
target file size when reflink would extend the file.

CC: <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-03-26 18:41:55 -05:00
Vincent Whitchurch
219481a8f9 cifs: Silently ignore unknown oplock break handle
Make SMB2 not print out an error when an oplock break is received for an
unknown handle, similar to SMB1.  The debug message which is printed for
these unknown handles may also be misleading, so fix that too.

The SMB2 lease break path is not affected by this patch.

Without this, a program which writes to a file from one thread, and
opens, reads, and writes the same file from another thread triggers the
below errors several times a minute when run against a Samba server
configured with "smb2 leases = no".

 CIFS: VFS: \\192.168.0.1 No task to wake, unknown frame received! NumMids 2
 00000000: 424d53fe 00000040 00000000 00000012  .SMB@...........
 00000010: 00000001 00000000 ffffffff ffffffff  ................
 00000020: 00000000 00000000 00000000 00000000  ................
 00000030: 00000000 00000000 00000000 00000000  ................

Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Reviewed-by: Tom Talpey <tom@talpey.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-03-26 18:05:26 -05:00
Ronnie Sahlberg
cee8f4f6fc cifs: revalidate mapping when we open files for SMB1 POSIX
RHBZ: 1933527

Under SMB1 + POSIX, if an inode is reused on a server after we have read and
cached a part of a file, when we then open the new file with the
re-cycled inode there is a chance that we may serve the old data out of cache
to the application.
This only happens for SMB1 (deprecated) and when posix are used.
The simplest solution to avoid this race is to force a revalidate
on smb1-posix open.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-03-26 18:04:58 -05:00
Shyam Prasad N
3bffbe9e0b cifs: Fix chmod with modefromsid when an older ACE already exists.
My recent fixes to cifsacl to maintain inherited ACEs had
regressed modefromsid when an older ACL already exists.

Found testing xfstest 495 with modefromsid mount option

Fixes: f506550889 ("cifs: Retain old ACEs when converting between mode bits and ACL")

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-03-26 18:04:35 -05:00