7057572745
73956 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
|
7057572745 |
ubifs: rename_whiteout: correct old_dir size computing
When renaming the whiteout file, the old whiteout file is not deleted.
Therefore, we add the old dentry size to the old dir like XFS.
Otherwise, an error may be reported due to `fscki->calc_sz != fscki->size`
in check_indes.
Fixes:
|
||
|
9cdd312887 |
jffs2: fix memory leak in jffs2_scan_medium
If an error is returned in jffs2_scan_eraseblock() and some memory
has been added to the jffs2_summary *s, we can observe the following
kmemleak report:
--------------------------------------------
unreferenced object 0xffff88812b889c40 (size 64):
comm "mount", pid 692, jiffies 4294838325 (age 34.288s)
hex dump (first 32 bytes):
40 48 b5 14 81 88 ff ff 01 e0 31 00 00 00 50 00 @H........1...P.
00 00 01 00 00 00 01 00 00 00 02 00 00 00 09 08 ................
backtrace:
[<ffffffffae93a3a3>] __kmalloc+0x613/0x910
[<ffffffffaf423b9c>] jffs2_sum_add_dirent_mem+0x5c/0xa0
[<ffffffffb0f3afa8>] jffs2_scan_medium.cold+0x36e5/0x4794
[<ffffffffb0f3dbe1>] jffs2_do_mount_fs.cold+0xa7/0x2267
[<ffffffffaf40acf3>] jffs2_do_fill_super+0x383/0xc30
[<ffffffffaf40c00a>] jffs2_fill_super+0x2ea/0x4c0
[<ffffffffb0315d64>] mtd_get_sb+0x254/0x400
[<ffffffffb0315f5f>] mtd_get_sb_by_nr+0x4f/0xd0
[<ffffffffb0316478>] get_tree_mtd+0x498/0x840
[<ffffffffaf40bd15>] jffs2_get_tree+0x25/0x30
[<ffffffffae9f358d>] vfs_get_tree+0x8d/0x2e0
[<ffffffffaea7a98f>] path_mount+0x50f/0x1e50
[<ffffffffaea7c3d7>] do_mount+0x107/0x130
[<ffffffffaea7c5c5>] __se_sys_mount+0x1c5/0x2f0
[<ffffffffaea7c917>] __x64_sys_mount+0xc7/0x160
[<ffffffffb10142f5>] do_syscall_64+0x45/0x70
unreferenced object 0xffff888114b54840 (size 32):
comm "mount", pid 692, jiffies 4294838325 (age 34.288s)
hex dump (first 32 bytes):
c0 75 b5 14 81 88 ff ff 02 e0 02 00 00 00 02 00 .u..............
00 00 84 00 00 00 44 00 00 00 6b 6b 6b 6b 6b a5 ......D...kkkkk.
backtrace:
[<ffffffffae93be24>] kmem_cache_alloc_trace+0x584/0x880
[<ffffffffaf423b04>] jffs2_sum_add_inode_mem+0x54/0x90
[<ffffffffb0f3bd44>] jffs2_scan_medium.cold+0x4481/0x4794
[...]
unreferenced object 0xffff888114b57280 (size 32):
comm "mount", pid 692, jiffies 4294838393 (age 34.357s)
hex dump (first 32 bytes):
10 d5 6c 11 81 88 ff ff 08 e0 05 00 00 00 01 00 ..l.............
00 00 38 02 00 00 28 00 00 00 6b 6b 6b 6b 6b a5 ..8...(...kkkkk.
backtrace:
[<ffffffffae93be24>] kmem_cache_alloc_trace+0x584/0x880
[<ffffffffaf423c34>] jffs2_sum_add_xattr_mem+0x54/0x90
[<ffffffffb0f3a24f>] jffs2_scan_medium.cold+0x298c/0x4794
[...]
unreferenced object 0xffff8881116cd510 (size 16):
comm "mount", pid 692, jiffies 4294838395 (age 34.355s)
hex dump (first 16 bytes):
00 00 00 00 00 00 00 00 09 e0 60 02 00 00 6b a5 ..........`...k.
backtrace:
[<ffffffffae93be24>] kmem_cache_alloc_trace+0x584/0x880
[<ffffffffaf423cc4>] jffs2_sum_add_xref_mem+0x54/0x90
[<ffffffffb0f3b2e3>] jffs2_scan_medium.cold+0x3a20/0x4794
[...]
--------------------------------------------
Therefore, we should call jffs2_sum_reset_collected(s) on exit to
release the memory added in s. In addition, a new tag "out_buf" is
added to prevent the NULL pointer reference caused by s being NULL.
(thanks to Zhang Yi for this analysis)
Fixes:
|
||
|
d051cef784 |
jffs2: fix memory leak in jffs2_do_mount_fs
If jffs2_build_filesystem() in jffs2_do_mount_fs() returns an error,
we can observe the following kmemleak report:
--------------------------------------------
unreferenced object 0xffff88811b25a640 (size 64):
comm "mount", pid 691, jiffies 4294957728 (age 71.952s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffffa493be24>] kmem_cache_alloc_trace+0x584/0x880
[<ffffffffa5423a06>] jffs2_sum_init+0x86/0x130
[<ffffffffa5400e58>] jffs2_do_mount_fs+0x798/0xac0
[<ffffffffa540acf3>] jffs2_do_fill_super+0x383/0xc30
[<ffffffffa540c00a>] jffs2_fill_super+0x2ea/0x4c0
[...]
unreferenced object 0xffff88812c760000 (size 65536):
comm "mount", pid 691, jiffies 4294957728 (age 71.952s)
hex dump (first 32 bytes):
bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
backtrace:
[<ffffffffa493a449>] __kmalloc+0x6b9/0x910
[<ffffffffa5423a57>] jffs2_sum_init+0xd7/0x130
[<ffffffffa5400e58>] jffs2_do_mount_fs+0x798/0xac0
[<ffffffffa540acf3>] jffs2_do_fill_super+0x383/0xc30
[<ffffffffa540c00a>] jffs2_fill_super+0x2ea/0x4c0
[...]
--------------------------------------------
This is because the resources allocated in jffs2_sum_init() are not
released. Call jffs2_sum_exit() to release these resources to solve
the problem.
Fixes:
|
||
|
4c7c44ee16 |
jffs2: fix use-after-free in jffs2_clear_xattr_subsystem
When we mount a jffs2 image, assume that the first few blocks of
the image are normal and contain at least one xattr-related inode,
but the next block is abnormal. As a result, an error is returned
in jffs2_scan_eraseblock(). jffs2_clear_xattr_subsystem() is then
called in jffs2_build_filesystem() and then again in
jffs2_do_fill_super().
Finally we can observe the following report:
==================================================================
BUG: KASAN: use-after-free in jffs2_clear_xattr_subsystem+0x95/0x6ac
Read of size 8 at addr ffff8881243384e0 by task mount/719
Call Trace:
dump_stack+0x115/0x16b
jffs2_clear_xattr_subsystem+0x95/0x6ac
jffs2_do_fill_super+0x84f/0xc30
jffs2_fill_super+0x2ea/0x4c0
mtd_get_sb+0x254/0x400
mtd_get_sb_by_nr+0x4f/0xd0
get_tree_mtd+0x498/0x840
jffs2_get_tree+0x25/0x30
vfs_get_tree+0x8d/0x2e0
path_mount+0x50f/0x1e50
do_mount+0x107/0x130
__se_sys_mount+0x1c5/0x2f0
__x64_sys_mount+0xc7/0x160
do_syscall_64+0x45/0x70
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Allocated by task 719:
kasan_save_stack+0x23/0x60
__kasan_kmalloc.constprop.0+0x10b/0x120
kasan_slab_alloc+0x12/0x20
kmem_cache_alloc+0x1c0/0x870
jffs2_alloc_xattr_ref+0x2f/0xa0
jffs2_scan_medium.cold+0x3713/0x4794
jffs2_do_mount_fs.cold+0xa7/0x2253
jffs2_do_fill_super+0x383/0xc30
jffs2_fill_super+0x2ea/0x4c0
[...]
Freed by task 719:
kmem_cache_free+0xcc/0x7b0
jffs2_free_xattr_ref+0x78/0x98
jffs2_clear_xattr_subsystem+0xa1/0x6ac
jffs2_do_mount_fs.cold+0x5e6/0x2253
jffs2_do_fill_super+0x383/0xc30
jffs2_fill_super+0x2ea/0x4c0
[...]
The buggy address belongs to the object at ffff8881243384b8
which belongs to the cache jffs2_xattr_ref of size 48
The buggy address is located 40 bytes inside of
48-byte region [ffff8881243384b8, ffff8881243384e8)
[...]
==================================================================
The triggering of the BUG is shown in the following stack:
-----------------------------------------------------------
jffs2_fill_super
jffs2_do_fill_super
jffs2_do_mount_fs
jffs2_build_filesystem
jffs2_scan_medium
jffs2_scan_eraseblock <--- ERROR
jffs2_clear_xattr_subsystem <--- free
jffs2_clear_xattr_subsystem <--- free again
-----------------------------------------------------------
An error is returned in jffs2_do_mount_fs(). If the error is returned
by jffs2_sum_init(), the jffs2_clear_xattr_subsystem() does not need to
be executed. If the error is returned by jffs2_build_filesystem(), the
jffs2_clear_xattr_subsystem() also does not need to be executed again.
So move jffs2_clear_xattr_subsystem() from 'out_inohash' to 'out_root'
to fix this UAF problem.
Fixes:
|
||
|
163b438b51 |
fs/jffs2: fix comments mentioning i_mutex
inode->i_mutex has been replaced with inode->i_rwsem long ago. Fix comments still mentioning i_mutex. Signed-off-by: hongnanli <hongnan.li@linux.alibaba.com> Signed-off-by: Richard Weinberger <richard@nod.at> |
||
|
3b67db8a6c |
ubifs: Fix to add refcount once page is set private
MM defined the rule [1] very clearly that once page was set with PG_private
flag, we should increment the refcount in that page, also main flows like
pageout(), migrate_page() will assume there is one additional page
reference count if page_has_private() returns true. Otherwise, we may
get a BUG in page migration:
page:0000000080d05b9d refcount:-1 mapcount:0 mapping:000000005f4d82a8
index:0xe2 pfn:0x14c12
aops:ubifs_file_address_operations [ubifs] ino:8f1 dentry name:"f30e"
flags: 0x1fffff80002405(locked|uptodate|owner_priv_1|private|node=0|
zone=1|lastcpupid=0x1fffff)
page dumped because: VM_BUG_ON_PAGE(page_count(page) != 0)
------------[ cut here ]------------
kernel BUG at include/linux/page_ref.h:184!
invalid opcode: 0000 [#1] SMP
CPU: 3 PID: 38 Comm: kcompactd0 Not tainted 5.15.0-rc5
RIP: 0010:migrate_page_move_mapping+0xac3/0xe70
Call Trace:
ubifs_migrate_page+0x22/0xc0 [ubifs]
move_to_new_page+0xb4/0x600
migrate_pages+0x1523/0x1cc0
compact_zone+0x8c5/0x14b0
kcompactd+0x2bc/0x560
kthread+0x18c/0x1e0
ret_from_fork+0x1f/0x30
Before the time, we should make clean a concept, what does refcount means
in page gotten from grab_cache_page_write_begin(). There are 2 situations:
Situation 1: refcount is 3, page is created by __page_cache_alloc.
TYPE_A - the write process is using this page
TYPE_B - page is assigned to one certain mapping by calling
__add_to_page_cache_locked()
TYPE_C - page is added into pagevec list corresponding current cpu by
calling lru_cache_add()
Situation 2: refcount is 2, page is gotten from the mapping's tree
TYPE_B - page has been assigned to one certain mapping
TYPE_A - the write process is using this page (by calling
page_cache_get_speculative())
Filesystem releases one refcount by calling put_page() in xxx_write_end(),
the released refcount corresponds to TYPE_A (write task is using it). If
there are any processes using a page, page migration process will skip the
page by judging whether expected_page_refs() equals to page refcount.
The BUG is caused by following process:
PA(cpu 0) kcompactd(cpu 1)
compact_zone
ubifs_write_begin
page_a = grab_cache_page_write_begin
add_to_page_cache_lru
lru_cache_add
pagevec_add // put page into cpu 0's pagevec
(refcnf = 3, for page creation process)
ubifs_write_end
SetPagePrivate(page_a) // doesn't increase page count !
unlock_page(page_a)
put_page(page_a) // refcnt = 2
[...]
PB(cpu 0)
filemap_read
filemap_get_pages
add_to_page_cache_lru
lru_cache_add
__pagevec_lru_add // traverse all pages in cpu 0's pagevec
__pagevec_lru_add_fn
SetPageLRU(page_a)
isolate_migratepages
isolate_migratepages_block
get_page_unless_zero(page_a)
// refcnt = 3
list_add(page_a, from_list)
migrate_pages(from_list)
__unmap_and_move
move_to_new_page
ubifs_migrate_page(page_a)
migrate_page_move_mapping
expected_page_refs get 3
(migration[1] + mapping[1] + private[1])
release_pages
put_page_testzero(page_a) // refcnt = 3
page_ref_freeze // refcnt = 0
page_ref_dec_and_test(0 - 1 = -1)
page_ref_unfreeze
VM_BUG_ON_PAGE(-1 != 0, page)
UBIFS doesn't increase the page refcount after setting private flag, which
leads to page migration task believes the page is not used by any other
processes, so the page is migrated. This causes concurrent accessing on
page refcount between put_page() called by other process(eg. read process
calls lru_cache_add) and page_ref_unfreeze() called by migration task.
Actually zhangjun has tried to fix this problem [2] by recalculating page
refcnt in ubifs_migrate_page(). It's better to follow MM rules [1], because
just like Kirill suggested in [2], we need to check all users of
page_has_private() helper. Like f2fs does in [3], fix it by adding/deleting
refcount when setting/clearing private for a page. BTW, according to [4],
we set 'page->private' as 1 because ubifs just simply SetPagePrivate().
And, [5] provided a common helper to set/clear page private, ubifs can
use this helper following the example of iomap, afs, btrfs, etc.
Jump [6] to find a reproducer.
[1] https://lore.kernel.org/lkml/2b19b3c4-2bc4-15fa-15cc-27a13e5c7af1@aol.com
[2] https://www.spinics.net/lists/linux-mtd/msg04018.html
[3] http://lkml.iu.edu/hypermail/linux/kernel/1903.0/03313.html
[4] https://lore.kernel.org/linux-f2fs-devel/20210422154705.GO3596236@casper.infradead.org
[5] https://lore.kernel.org/all/20200517214718.468-1-guoqing.jiang@cloud.ionos.com
[6] https://bugzilla.kernel.org/show_bug.cgi?id=214961
Fixes:
|
||
|
4f2262a334 |
ubifs: Fix read out-of-bounds in ubifs_wbuf_write_nolock()
Function ubifs_wbuf_write_nolock() may access buf out of bounds in
following process:
ubifs_wbuf_write_nolock():
aligned_len = ALIGN(len, 8); // Assume len = 4089, aligned_len = 4096
if (aligned_len <= wbuf->avail) ... // Not satisfy
if (wbuf->used) {
ubifs_leb_write() // Fill some data in avail wbuf
len -= wbuf->avail; // len is still not 8-bytes aligned
aligned_len -= wbuf->avail;
}
n = aligned_len >> c->max_write_shift;
if (n) {
n <<= c->max_write_shift;
err = ubifs_leb_write(c, wbuf->lnum, buf + written,
wbuf->offs, n);
// n > len, read out of bounds less than 8(n-len) bytes
}
, which can be catched by KASAN:
=========================================================
BUG: KASAN: slab-out-of-bounds in ecc_sw_hamming_calculate+0x1dc/0x7d0
Read of size 4 at addr ffff888105594ff8 by task kworker/u8:4/128
Workqueue: writeback wb_workfn (flush-ubifs_0_0)
Call Trace:
kasan_report.cold+0x81/0x165
nand_write_page_swecc+0xa9/0x160
ubifs_leb_write+0xf2/0x1b0 [ubifs]
ubifs_wbuf_write_nolock+0x421/0x12c0 [ubifs]
write_head+0xdc/0x1c0 [ubifs]
ubifs_jnl_write_inode+0x627/0x960 [ubifs]
wb_workfn+0x8af/0xb80
Function ubifs_wbuf_write_nolock() accepts that parameter 'len' is not 8
bytes aligned, the 'len' represents the true length of buf (which is
allocated in 'ubifs_jnl_xxx', eg. ubifs_jnl_write_inode), so
ubifs_wbuf_write_nolock() must handle the length read from 'buf' carefully
to write leb safely.
Fetch a reproducer in [Link].
Fixes:
|
||
|
1b83ec057d |
ubifs: setflags: Make dirtied_ino_d 8 bytes aligned
Make 'ui->data_len' aligned with 8 bytes before it is assigned to
dirtied_ino_d. Since 8871d84c8f8b0c6b("ubifs: convert to fileattr")
applied, 'setflags()' only affects regular files and directories, only
xattr inode, symlink inode and special inode(pipe/char_dev/block_dev)
have none- zero 'ui->data_len' field, so assertion
'!(req->dirtied_ino_d & 7)' cannot fail in ubifs_budget_space().
To avoid assertion fails in future evolution(eg. setflags can operate
special inodes), it's better to make dirtied_ino_d 8 bytes aligned,
after all aligned size is still zero for regular files.
Fixes:
|
||
|
a6dab6607d |
ubifs: Rectify space amount budget for mkdir/tmpfile operations
UBIFS should make sure the flash has enough space to store dirty (Data that is newer than disk) data (in memory), space budget is exactly designed to do that. If space budget calculates less data than we need, 'make_reservation()' will do more work(return -ENOSPC if no free space lelf, sometimes we can see "cannot reserve xxx bytes in jhead xxx, error -28" in ubifs error messages) with ubifs inodes locked, which may effect other syscalls. A simple way to decide how much space do we need when make a budget: See how much space is needed by 'make_reservation()' in ubifs_jnl_xxx() function according to corresponding operation. It's better to report ENOSPC in ubifs_budget_space(), as early as we can. Fixes: |
||
|
60eb3b9c9f |
ubifs: Fix 'ui->dirty' race between do_tmpfile() and writeback work
'ui->dirty' is not protected by 'ui_mutex' in function do_tmpfile() which
may race with ubifs_write_inode[wb_workfn] to access/update 'ui->dirty',
finally dirty space is released twice.
open(O_TMPFILE) wb_workfn
do_tmpfile
ubifs_budget_space(ino_req = { .dirtied_ino = 1})
d_tmpfile // mark inode(tmpfile) dirty
ubifs_jnl_update // without holding tmpfile's ui_mutex
mark_inode_clean(ui)
if (ui->dirty)
ubifs_release_dirty_inode_budget(ui) // release first time
ubifs_write_inode
mutex_lock(&ui->ui_mutex)
ubifs_release_dirty_inode_budget(ui)
// release second time
mutex_unlock(&ui->ui_mutex)
ui->dirty = 0
Run generic/476 can reproduce following message easily
(See reproducer in [Link]):
UBIFS error (ubi0:0 pid 2578): ubifs_assert_failed [ubifs]: UBIFS assert
failed: c->bi.dd_growth >= 0, in fs/ubifs/budget.c:554
UBIFS warning (ubi0:0 pid 2578): ubifs_ro_mode [ubifs]: switched to
read-only mode, error -22
Workqueue: writeback wb_workfn (flush-ubifs_0_0)
Call Trace:
ubifs_ro_mode+0x54/0x60 [ubifs]
ubifs_assert_failed+0x4b/0x80 [ubifs]
ubifs_release_budget+0x468/0x5a0 [ubifs]
ubifs_release_dirty_inode_budget+0x53/0x80 [ubifs]
ubifs_write_inode+0x121/0x1f0 [ubifs]
...
wb_workfn+0x283/0x7b0
Fix it by holding tmpfile ubifs inode lock during ubifs_jnl_update().
Similar problem exists in whiteout renaming, but previous fix("ubifs:
Rename whiteout atomically") has solved the problem.
Fixes:
|
||
|
278d9a2436 |
ubifs: Rename whiteout atomically
Currently, rename whiteout has 3 steps:
1. create tmpfile(which associates old dentry to tmpfile inode) for
whiteout, and store tmpfile to disk
2. link whiteout, associate whiteout inode to old dentry agagin and
store old dentry, old inode, new dentry on disk
3. writeback dirty whiteout inode to disk
Suddenly power-cut or error occurring(eg. ENOSPC returned by budget,
memory allocation failure) during above steps may cause kinds of problems:
Problem 1: ENOSPC returned by whiteout space budget (before step 2),
old dentry will disappear after rename syscall, whiteout file
cannot be found either.
ls dir // we get file, whiteout
rename(dir/file, dir/whiteout, REANME_WHITEOUT)
ENOSPC = ubifs_budget_space(&wht_req) // return
ls dir // empty (no file, no whiteout)
Problem 2: Power-cut happens before step 3, whiteout inode with 'nlink=1'
is not stored on disk, whiteout dentry(old dentry) is written
on disk, whiteout file is lost on next mount (We get "dead
directory entry" after executing 'ls -l' on whiteout file).
Now, we use following 3 steps to finish rename whiteout:
1. create an in-mem inode with 'nlink = 1' as whiteout
2. ubifs_jnl_rename (Write on disk to finish associating old dentry to
whiteout inode, associating new dentry with old inode)
3. iput(whiteout)
Rely writing in-mem inode on disk by ubifs_jnl_rename() to finish rename
whiteout, which avoids middle disk state caused by suddenly power-cut
and error occurring.
Fixes:
|
||
|
716b457302 |
ubifs: Add missing iput if do_tmpfile() failed in rename whiteout
whiteout inode should be put when do_tmpfile() failed if inode has been
initialized. Otherwise we will get following warning during umount:
UBIFS error (ubi0:0 pid 1494): ubifs_assert_failed [ubifs]: UBIFS
assert failed: c->bi.dd_growth == 0, in fs/ubifs/super.c:1930
VFS: Busy inodes after unmount of ubifs. Self-destruct in 5 seconds.
Fixes:
|
||
|
7a8884feec |
ubifs: Fix wrong number of inodes locked by ui_mutex in ubifs_inode comment
Since 9ec64962afb1702f75b("ubifs: Implement RENAME_EXCHANGE") and 9e0a1fff8db56eaaebb("ubifs: Implement RENAME_WHITEOUT") are applied, ubifs_rename locks and changes 4 ubifs inodes, correct the comment for ui_mutex in ubifs_inode. Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> |
||
|
afd4270480 |
ubifs: Fix deadlock in concurrent rename whiteout and inode writeback
Following hung tasks:
[ 77.028764] task:kworker/u8:4 state:D stack: 0 pid: 132
[ 77.028820] Call Trace:
[ 77.029027] schedule+0x8c/0x1b0
[ 77.029067] mutex_lock+0x50/0x60
[ 77.029074] ubifs_write_inode+0x68/0x1f0 [ubifs]
[ 77.029117] __writeback_single_inode+0x43c/0x570
[ 77.029128] writeback_sb_inodes+0x259/0x740
[ 77.029148] wb_writeback+0x107/0x4d0
[ 77.029163] wb_workfn+0x162/0x7b0
[ 92.390442] task:aa state:D stack: 0 pid: 1506
[ 92.390448] Call Trace:
[ 92.390458] schedule+0x8c/0x1b0
[ 92.390461] wb_wait_for_completion+0x82/0xd0
[ 92.390469] __writeback_inodes_sb_nr+0xb2/0x110
[ 92.390472] writeback_inodes_sb_nr+0x14/0x20
[ 92.390476] ubifs_budget_space+0x705/0xdd0 [ubifs]
[ 92.390503] do_rename.cold+0x7f/0x187 [ubifs]
[ 92.390549] ubifs_rename+0x8b/0x180 [ubifs]
[ 92.390571] vfs_rename+0xdb2/0x1170
[ 92.390580] do_renameat2+0x554/0x770
, are caused by concurrent rename whiteout and inode writeback processes:
rename_whiteout(Thread 1) wb_workfn(Thread2)
ubifs_rename
do_rename
lock_4_inodes (Hold ui_mutex)
ubifs_budget_space
make_free_space
shrink_liability
__writeback_inodes_sb_nr
bdi_split_work_to_wbs (Queue new wb work)
wb_do_writeback(wb work)
__writeback_single_inode
ubifs_write_inode
LOCK(ui_mutex)
↑
wb_wait_for_completion (Wait wb work) <-- deadlock!
Reproducer (Detail program in [Link]):
1. SYS_renameat2("/mp/dir/file", "/mp/dir/whiteout", RENAME_WHITEOUT)
2. Consume out of space before kernel(mdelay) doing budget for whiteout
Fix it by doing whiteout space budget before locking ubifs inodes.
BTW, it also fixes wrong goto tag 'out_release' in whiteout budget
error handling path(It should at least recover dir i_size and unlock
4 ubifs inodes).
Fixes:
|
||
|
40a8f0d5e7 |
ubifs: rename_whiteout: Fix double free for whiteout_ui->data
'whiteout_ui->data' will be freed twice if space budget fail for
rename whiteout operation as following process:
rename_whiteout
dev = kmalloc
whiteout_ui->data = dev
kfree(whiteout_ui->data) // Free first time
iput(whiteout)
ubifs_free_inode
kfree(ui->data) // Double free!
KASAN reports:
==================================================================
BUG: KASAN: double-free or invalid-free in ubifs_free_inode+0x4f/0x70
Call Trace:
kfree+0x117/0x490
ubifs_free_inode+0x4f/0x70 [ubifs]
i_callback+0x30/0x60
rcu_do_batch+0x366/0xac0
__do_softirq+0x133/0x57f
Allocated by task 1506:
kmem_cache_alloc_trace+0x3c2/0x7a0
do_rename+0x9b7/0x1150 [ubifs]
ubifs_rename+0x106/0x1f0 [ubifs]
do_syscall_64+0x35/0x80
Freed by task 1506:
kfree+0x117/0x490
do_rename.cold+0x53/0x8a [ubifs]
ubifs_rename+0x106/0x1f0 [ubifs]
do_syscall_64+0x35/0x80
The buggy address belongs to the object at ffff88810238bed8 which
belongs to the cache kmalloc-8 of size 8
==================================================================
Let ubifs_free_inode() free 'whiteout_ui->data'. BTW, delete unused
assignment 'whiteout_ui->data_len = 0', process 'ubifs_evict_inode()
-> ubifs_jnl_delete_inode() -> ubifs_jnl_write_inode()' doesn't need it
(because 'inc_nlink(whiteout)' won't be excuted by 'goto out_release',
and the nlink of whiteout inode is 0).
Fixes:
|
||
|
aa39cc6757 |
jffs2: GC deadlock reading a page that is used in jffs2_write_begin()
GC task can deadlock in read_cache_page() because it may attempt to release a page that is actually allocated by another task in jffs2_write_begin(). The reason is that in jffs2_write_begin() there is a small window a cache page is allocated for use but not set Uptodate yet. This ends up with a deadlock between two tasks: 1) A task (e.g. file copy) - jffs2_write_begin() locks a cache page - jffs2_write_end() tries to lock "alloc_sem" from jffs2_reserve_space() <-- STUCK 2) GC task (jffs2_gcd_mtd3) - jffs2_garbage_collect_pass() locks "alloc_sem" - try to lock the same cache page in read_cache_page() <-- STUCK So to avoid this deadlock, hold "alloc_sem" in jffs2_write_begin() while reading data in a cache page. Signed-off-by: Kyeong Yoo <kyeong.yoo@alliedtelesis.co.nz> Signed-off-by: Richard Weinberger <richard@nod.at> |
||
|
50cb437325 |
ubifs: read-only if LEB may always be taken in ubifs_garbage_collect
If ubifs_garbage_collect_leb() returns -EAGAIN and ubifs_return_leb returns error, a LEB will always has a "taken" flag. In this case, set the ubifs to read-only to prevent a worse situation. Signed-off-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> |
||
|
0d76502172 |
ubifs: fix double return leb in ubifs_garbage_collect
If ubifs_garbage_collect_leb() returns -EAGAIN and enters the "out" branch, ubifs_return_leb will execute twice on the same lnum. This can cause data loss in concurrency situations. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> |
||
|
88618feecf |
ubifs: fix slab-out-of-bounds in ubifs_change_lp
Hulk Robot reported a KASAN report about slab-out-of-bounds: ================================================================== BUG: KASAN: slab-out-of-bounds in ubifs_change_lp+0x3a9/0x1390 [ubifs] Read of size 8 at addr ffff888101c961f8 by task fsstress/1068 [...] Call Trace: check_memory_region+0x1c1/0x1e0 ubifs_change_lp+0x3a9/0x1390 [ubifs] ubifs_change_one_lp+0x170/0x220 [ubifs] ubifs_garbage_collect+0x7f9/0xda0 [ubifs] ubifs_budget_space+0xfe4/0x1bd0 [ubifs] ubifs_write_begin+0x528/0x10c0 [ubifs] Allocated by task 1068: kmemdup+0x25/0x50 ubifs_lpt_lookup_dirty+0x372/0xb00 [ubifs] ubifs_update_one_lp+0x46/0x260 [ubifs] ubifs_tnc_end_commit+0x98b/0x1720 [ubifs] do_commit+0x6cb/0x1950 [ubifs] ubifs_run_commit+0x15a/0x2b0 [ubifs] ubifs_budget_space+0x1061/0x1bd0 [ubifs] ubifs_write_begin+0x528/0x10c0 [ubifs] [...] ================================================================== In ubifs_garbage_collect(), if ubifs_find_dirty_leb returns an error, lp is an uninitialized variable. But lp.num might be used in the out branch, which is a random value. If the value is -1 or another value that can pass the check, soob may occur in the ubifs_change_lp() in the following procedure. To solve this problem, we initialize lp.lnum to -1, and then initialize it correctly in ubifs_find_dirty_leb, which is not equal to -1, and ubifs_return_leb is executed only when lp.lnum != -1. if find a retained or indexing LEB and continue to next loop, but break before find another LEB, the "taken" flag of this LEB will be cleaned in ubi_return_lebi(). This bug has also been fixed in this patch. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at> |
||
|
d3de970bcb |
ubifs: fix snprintf() length check
The snprintf() function returns the number of bytes (not including the NUL terminator) which would have been printed if there were enough space. So it can be greater than UBIFS_DFS_DIR_LEN. And actually if it equals UBIFS_DFS_DIR_LEN then that's okay so this check is too strict. Fixes: 9a620291fc01 ("ubifs: Export filesystem error counters") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Richard Weinberger <richard@nod.at> |
||
|
2e3cbf4258 |
ubifs: Export filesystem error counters
Not all ubifs filesystem errors are propagated to userspace. Export bad magic, bad node and crc errors via sysfs. This allows userspace to notice filesystem errors: /sys/fs/ubifs/ubiX_Y/errors_magic /sys/fs/ubifs/ubiX_Y/errors_node /sys/fs/ubifs/ubiX_Y/errors_crc The counters are reset to 0 with a remount. Signed-off-by: Stefan Schaeckeler <sschaeck@cisco.com> Signed-off-by: Richard Weinberger <richard@nod.at> |
||
|
3fea4d9d16 |
ubifs: Error path in ubifs_remount_rw() seems to wrongly free write buffers
it seems freeing the write buffers in the error path of the
ubifs_remount_rw() is wrong. It leads later to a kernel oops like this:
[10016.431274] UBIFS (ubi0:0): start fixing up free space
[10090.810042] UBIFS (ubi0:0): free space fixup complete
[10090.814623] UBIFS error (ubi0:0 pid 512): ubifs_remount_fs: cannot
spawn "ubifs_bgt0_0", error -4
[10101.915108] UBIFS (ubi0:0): background thread "ubifs_bgt0_0" started,
PID 517
[10105.275498] Unable to handle kernel NULL pointer dereference at
virtual address 0000000000000030
[10105.284352] Mem abort info:
[10105.287160] ESR = 0x96000006
[10105.290252] EC = 0x25: DABT (current EL), IL = 32 bits
[10105.295592] SET = 0, FnV = 0
[10105.298652] EA = 0, S1PTW = 0
[10105.301848] Data abort info:
[10105.304723] ISV = 0, ISS = 0x00000006
[10105.308573] CM = 0, WnR = 0
[10105.311564] user pgtable: 4k pages, 48-bit VAs, pgdp=00000000f03d1000
[10105.318034] [0000000000000030] pgd=00000000f6cee003,
pud=00000000f4884003, pmd=0000000000000000
[10105.326783] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[10105.332355] Modules linked in: ath10k_pci ath10k_core ath mac80211
libarc4 cfg80211 nvme nvme_core cryptodev(O)
[10105.342468] CPU: 3 PID: 518 Comm: touch Tainted: G O
5.4.3 #1
[10105.349517] Hardware name: HYPEX CPU (DT)
[10105.353525] pstate: 40000005 (nZcv daif -PAN -UAO)
[10105.358324] pc : atomic64_try_cmpxchg_acquire.constprop.22+0x8/0x34
[10105.364596] lr : mutex_lock+0x1c/0x34
[10105.368253] sp : ffff000075633aa0
[10105.371563] x29: ffff000075633aa0 x28: 0000000000000001
[10105.376874] x27: ffff000076fa80c8 x26: 0000000000000004
[10105.382185] x25: 0000000000000030 x24: 0000000000000000
[10105.387495] x23: 0000000000000000 x22: 0000000000000038
[10105.392807] x21: 000000000000000c x20: ffff000076fa80c8
[10105.398119] x19: ffff000076fa8000 x18: 0000000000000000
[10105.403429] x17: 0000000000000000 x16: 0000000000000000
[10105.408741] x15: 0000000000000000 x14: fefefefefefefeff
[10105.414052] x13: 0000000000000000 x12: 0000000000000fe0
[10105.419364] x11: 0000000000000fe0 x10: ffff000076709020
[10105.424675] x9 : 0000000000000000 x8 : 00000000000000a0
[10105.429986] x7 : ffff000076fa80f4 x6 : 0000000000000030
[10105.435297] x5 : 0000000000000000 x4 : 0000000000000000
[10105.440609] x3 : 0000000000000000 x2 : ffff00006f276040
[10105.445920] x1 : ffff000075633ab8 x0 : 0000000000000030
[10105.451232] Call trace:
[10105.453676] atomic64_try_cmpxchg_acquire.constprop.22+0x8/0x34
[10105.459600] ubifs_garbage_collect+0xb4/0x334
[10105.463956] ubifs_budget_space+0x398/0x458
[10105.468139] ubifs_create+0x50/0x180
[10105.471712] path_openat+0x6a0/0x9b0
[10105.475284] do_filp_open+0x34/0x7c
[10105.478771] do_sys_open+0x78/0xe4
[10105.482170] __arm64_sys_openat+0x1c/0x24
[10105.486180] el0_svc_handler+0x84/0xc8
[10105.489928] el0_svc+0x8/0xc
[10105.492808] Code: 52800013 17fffffb d2800003 f9800011 (c85ffc05)
[10105.498903] ---[ end trace 46b721d93267a586 ]---
To reproduce the problem:
1. Filesystem initially mounted read-only, free space fixup flag set.
2. mount -o remount,rw <mountpoint>
3. it takes some time (free space fixup running)
... try to terminate running mount by CTRL-C
... does not respond, only after free space fixup is complete
... then "ubifs_remount_fs: cannot spawn "ubifs_bgt0_0", error -4"
4. mount -o remount,rw <mountpoint>
... now finished instantly (fixup already done).
5. Create file or just unmount the filesystem and we get the oops.
Cc: <stable@vger.kernel.org>
Fixes:
|
||
|
d98c6c35c8 |
ubifs: Make use of the helper macro kthread_run()
Repalce kthread_create/wake_up_process() with kthread_run() to simplify the code. Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Signed-off-by: Richard Weinberger <richard@nod.at> |
||
|
7296c8af6a |
ubifs: Fix spelling mistakes
Found with `codespell -i 3 -w fs/ubifs/**` and proof reading that parts. Signed-off-by: Alexander Dahl <ada@thorsis.com> Signed-off-by: Richard Weinberger <richard@nod.at> |
||
|
9273d6cb99 |
two cifs/smb3 fixes, one fscache related, and one mount parsing related for stable
-----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmG9V/0ACgkQiiy9cAdy T1GbQAv+N/2p7mMzhxcB91xlU6DSo2kZhC1FcmiXld3BEiMzkef6Rtb/lzW1AtBP kNgqjVU7MMmdFQNKLhrHUV2BNCtLWKOC2I0xTZqr+nBlGMJkiNUBzUWW8DcnLd1e zGky7Wck8HHu7LdnZA3TVlPhNV9KjdlHMEYBBJFlOc779a/Q0KyHLdwhP9mpSa2d QrVRHkBMylNUqF4WiLKQ0I8eJFhKtd7pJNAW1qGeKe2YDV6kZQiXwHOLIZb7SHET 9Qtj83IwsQ13cLTmb7tGibj0Fn+GEEqgBv40PjSMuTCoiCpfhcLrx0lWT5hxlxao 9w3Yw1pPPaJXbxCmL9b3Bqm3vnNqMaN7HqxsuGIFwnulGFB4GJY2l+ZQPcTHzWG0 oCHlimlixH7RKVEfjIDhidC1C0ReZI8/1XuHPGE17Ysxt6L7dH/a7J2y7SlIdYTZ ZVvVT0Kib74I7ZUJH1ymc4MdAVnBIkiTGOrQk/FU+cHOyzMUpNcwXiCqkvF77QUa hDW4hkzE =MW5I -----END PGP SIGNATURE----- Merge tag '5.16-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull cifs fixes from Steve French: "Two cifs/smb3 fixes, one fscache related, and one mount parsing related for stable" * tag '5.16-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: sanitize multiple delimiters in prepath cifs: ignore resource_id while getting fscache super cookie |
||
|
1887bf5cc4 |
zonefs fixes for 5.16-rc6
One fix and one trivial update for rc6: * Add MODULE_ALIAS_FS to get automatic module loading on mount (from Naohiro) * Update Damien's email address in the MAINTAINERS file (from me). -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSRPv8tYSvhwAzJdzjdoc3SxdoYdgUCYb0hkgAKCRDdoc3SxdoY dlciAP4lGpsiFcO7TdLY2W64EHgIkcstGx1UitsqBTR5iZnp6QD/bAfaHOaTNQDG Nr8GznsB3di4WRbFoV7BhBsKFcC4cgA= =GWY0 -----END PGP SIGNATURE----- Merge tag 'zonefs-5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs Pull zonefs fixes from Damien Le Moal: "One fix and one trivial update for rc6: - Add MODULE_ALIAS_FS to get automatic module loading on mount (Naohiro) - Update Damien's email address in the MAINTAINERS file (me)" * tag 'zonefs-5.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs: MAITAINERS: Change zonefs maintainer email address zonefs: add MODULE_ALIAS_FS |
||
|
a31080899d |
cifs: sanitize multiple delimiters in prepath
mount.cifs can pass a device with multiple delimiters in it. This will
cause rename(2) to fail with ENOENT.
V2:
- Make sanitize_path more readable.
- Fix multiple delimiters between UNC and prepath.
- Avoid a memory leak if a bad user starts putting a lot of delimiters
in the path on purpose.
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=2031200
Fixes:
|
||
|
b774302e88 |
cifs: ignore resource_id while getting fscache super cookie
We have a cyclic dependency between fscache super cookie
and root inode cookie. The super cookie relies on
tcon->resource_id, which gets populated from the root inode
number. However, fetching the root inode initializes inode
cookie as a child of super cookie, which is yet to be populated.
resource_id is only used as auxdata to check the validity of
super cookie. We can completely avoid setting resource_id to
remove the circular dependency. Since vol creation time and
vol serial numbers are used for auxdata, we should be fine.
Additionally, there will be auxiliary data check for each
inode cookie as well.
Fixes:
|
||
|
9609134186 |
for-5.16-rc5-tag
-----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmG8+tEACgkQxWXV+ddt WDuuGA/9E75ZMqsMLW5az7z8Rt5voBjPeweyRHmGCLZKpgfaj0QjrJRvu0CTKU/W zCSQf+ShTTY2D3cmh1eEwKyX/waKQ71qBrMX/SgIeA0OjmlhK1UGB18MF5sAVGCB mymVYJh7IntYJE7S7OiMUL/yILmIWZYrYT+iaPZlIc9M6h0b1gjMIsE0VEmxJMCN X8RAQ4CfL9bpTTKItNehSyXx+J7TB5yamh5AspaiB/ivyN1DcUcsFf3AoaWXeh2D YIBzq4WbGnDMfUdWXKE2rqDfQgaTXtN9ffGUvphJnegg8Tqfp29LyLZ1GU0qGSXc /K8g5QNmM3nhubXq2MG5zfbHPJ1H2CgnvkDqiCcyeop+09yj/ugxTt+ULaIbJL76 pKSpcuIFXTmoW2Z7ZwijIEX4H5Dgk2l2DbE8SkJT4LjJybgpHfBT1KDQrj5iQdx+ XgmG/CbRELuGGltJNuldp0SqIyMNRgDuv6Rheg9N9H73m9epwfH5oiM0Fj/FYyQ6 lfgle6DQCP4xaDmk1zA9zrJHTUqi8Caeyg+tQYT6AbkoeCoXnvEAPgv9OOGe1M+C Ks7zeAseWs3A/j/+wCdiCKombOfR+AY3RGkPzlodUJj4YYOTyXrigtb5yhTz6Zdv ozVBZ71LUMMOf0NV45mGtqsiLqyfe3cnlqj1XtHQKaajyjgHvW8= =G7CE -----END PGP SIGNATURE----- Merge tag 'for-5.16-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A few more fixes, almost all error handling one-liners and for stable. - regression fix in directory logging items - regression fix of extent buffer status bits handling after an error - fix memory leak in error handling path in tree-log - fix freeing invalid anon device number when handling errors during subvolume creation - fix warning when freeing leaf after subvolume creation failure - fix missing blkdev put in device scan error handling - fix invalid delayed ref after subvolume creation failure" * tag 'for-5.16-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix missing blkdev_put() call in btrfs_scan_one_device() btrfs: fix warning when freeing leaf after subvolume creation failure btrfs: fix invalid delayed ref after subvolume creation failure btrfs: check WRITE_ERR when trying to read an extent buffer btrfs: fix missing last dir item offset update when logging directory btrfs: fix double free of anon_dev after failure to create subvolume btrfs: fix memory leak in __add_inode_ref() |
||
|
cb29eee3b2 |
io_uring-5.16-2021-12-17
-----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmG8wLIQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpncuD/wI9UnDapmyWodpXCw+f/GN0ZfuGzKAblkR yYJcA3spTIYHHZiR4KCKke1vM+j8PoPBTGwCuDX9EhHFq9EIEAAAFqQwbta85Rcu c3iKhb7qWGN2VQwqdmQDZtegdblM5usNsY3fNUA3iaGeQy+7qvYEyk0u5CDSIKCQ oR0NQpE/cHYlyoqUbITlCvOwD32QvcuHK2oCBnGW02FYiKq3gddJ3EPxnBkLcTUI zeHRCtslZTFrZexapQ/w4rAJ0O6bMR/08TYjtRDxLB4bTpYR34Hx23tvEsZk4jqE olpUlJDwV2B+vZ1aAKqCrabbkNQpiws7Y1EMSwM1pqqZYaPhpVeEmcqkpDNsSgY2 s764iXPKjJnKB6XWUuuShobOjiwwk0rAyNdWKVdvzNVkEnhpNnFVmfllChOiphqM jHua88LHpRJpmHSDdfJMO9qT1uffixv1AIfhEoc0hBCddwp9RwqhijelbJDYIjou KKE7aip0rzwK90tOTFGP9lD7cUl3aFig3yQNLaJpXybm2Qa+xE//hdUk4lm1i57c ciEYtz8S827lCx/CzhpWoTszG11cpT249IMXyxtspoZZh44VTXYC83EeYI2cyG24 q8zfmgMLNMYn6dc4dP5oAA/d0r7s9AY9GvyblhwmLGpjO5mANqJPZAtl2PpSYWg6 jVh1NDEyKw== =IY1e -----END PGP SIGNATURE----- Merge tag 'io_uring-5.16-2021-12-17' of git://git.kernel.dk/linux-block Pull io_uring fix from Jens Axboe: "Just a single fix, fixing an issue with the worker creation change that was merged last week" * tag 'io_uring-5.16-2021-12-17' of git://git.kernel.dk/linux-block: io-wq: drop wqe lock before creating new worker |
||
|
8ffea2599f |
zonefs: add MODULE_ALIAS_FS
Add MODULE_ALIAS_FS() to load the module automatically when you do "mount
-t zonefs".
Fixes:
|
||
|
1744a22ae9 |
afs: Fix mmap
Fix afs_add_open_map() to check that the vnode isn't already on the list
when it adds it. It's possible that afs_drop_open_mmap() decremented
the cb_nr_mmap counter, but hadn't yet got into the locked section to
remove it.
Also vnode->cb_mmap_link should be initialised, so fix that too.
Fixes:
|
||
|
2b14864acb |
An SGID directory handling fix (marked for stable), a metrics
accounting fix and two fixups to appease static checkers. -----BEGIN PGP SIGNATURE----- iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmG54aMTHGlkcnlvbW92 QGdtYWlsLmNvbQAKCRBKf944AhHzixPBB/4rf0d+oJbBNQQIjTVSO09G9A5pxnU1 CY++W5xppv2hQQX8Czh8iLLYSP3tRh+1xpAlKT+DFsRI3GTh5tXiPf7Mpu76noo3 UAKxiz5NmtqiwrHKZtpsQEJlmwtAFa+qFVJIOiqq1l+gFeiYMueoFEGDgdSCLJrE P6TL9CZx06Yf+fef7hJnXKR3YM0qras1jV6Adpvu+2g/ciaThzJF3YdyAFYFp7nl sGnPTXlJ1DDIsiOAgj0oGeQICpjQJOSGNZtxVjhJx5dJjqgKjim0V4MIShqsW/tg n8LzjrdGh/lR2QfCgOfYmvFrVTeb9QQyF57sT034rGrP6VXlzCkALPi3 =FQag -----END PGP SIGNATURE----- Merge tag 'ceph-for-5.16-rc6' of git://github.com/ceph/ceph-client Pull ceph fixes from Ilya Dryomov: "An SGID directory handling fix (marked for stable), a metrics accounting fix and two fixups to appease static checkers" * tag 'ceph-for-5.16-rc6' of git://github.com/ceph/ceph-client: ceph: fix up non-directory creation in SGID directories ceph: initialize pathlen variable in reconnect_caps_cb ceph: initialize i_size variable in ceph_sync_read ceph: fix duplicate increment of opened_inodes metric |
||
|
4989d4a0ae |
btrfs: fix missing blkdev_put() call in btrfs_scan_one_device()
The function btrfs_scan_one_device() calls blkdev_get_by_path() and
blkdev_put() to get and release its target block device. However, when
btrfs_sb_log_location_bdev() fails, blkdev_put() is not called and the
block device is left without clean up. This triggered failure of fstests
generic/085. Fix the failure path of btrfs_sb_log_location_bdev() to
call blkdev_put().
Fixes:
|
||
|
212a58fda9 |
btrfs: fix warning when freeing leaf after subvolume creation failure
When creating a subvolume, at ioctl.c:create_subvol(), if we fail to
insert the root item for the new subvolume into the root tree, we can
trigger the following warning:
[78961.741046] WARNING: CPU: 0 PID: 4079814 at fs/btrfs/extent-tree.c:3357 btrfs_free_tree_block+0x2af/0x310 [btrfs]
[78961.743344] Modules linked in:
[78961.749440] dm_snapshot dm_thin_pool (...)
[78961.773648] CPU: 0 PID: 4079814 Comm: fsstress Not tainted 5.16.0-rc4-btrfs-next-108 #1
[78961.775198] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[78961.777266] RIP: 0010:btrfs_free_tree_block+0x2af/0x310 [btrfs]
[78961.778398] Code: 17 00 48 85 (...)
[78961.781067] RSP: 0018:ffffaa4001657b28 EFLAGS: 00010202
[78961.781877] RAX: 0000000000000213 RBX: ffff897f8a796910 RCX: 0000000000000000
[78961.782780] RDX: 0000000000000000 RSI: 0000000011004000 RDI: 00000000ffffffff
[78961.783764] RBP: ffff8981f490e800 R08: 0000000000000001 R09: 0000000000000000
[78961.784740] R10: 0000000000000000 R11: 0000000000000001 R12: ffff897fc963fcc8
[78961.785665] R13: 0000000000000001 R14: ffff898063548000 R15: ffff898063548000
[78961.786620] FS: 00007f31283c6b80(0000) GS:ffff8982ace00000(0000) knlGS:0000000000000000
[78961.787717] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[78961.788598] CR2: 00007f31285c3000 CR3: 000000023fcc8003 CR4: 0000000000370ef0
[78961.789568] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[78961.790585] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[78961.791684] Call Trace:
[78961.792082] <TASK>
[78961.792359] create_subvol+0x5d1/0x9a0 [btrfs]
[78961.793054] btrfs_mksubvol+0x447/0x4c0 [btrfs]
[78961.794009] ? preempt_count_add+0x49/0xa0
[78961.794705] __btrfs_ioctl_snap_create+0x123/0x190 [btrfs]
[78961.795712] ? _copy_from_user+0x66/0xa0
[78961.796382] btrfs_ioctl_snap_create_v2+0xbb/0x140 [btrfs]
[78961.797392] btrfs_ioctl+0xd1e/0x35c0 [btrfs]
[78961.798172] ? __slab_free+0x10a/0x360
[78961.798820] ? rcu_read_lock_sched_held+0x12/0x60
[78961.799664] ? lock_release+0x223/0x4a0
[78961.800321] ? lock_acquired+0x19f/0x420
[78961.800992] ? rcu_read_lock_sched_held+0x12/0x60
[78961.801796] ? trace_hardirqs_on+0x1b/0xe0
[78961.802495] ? _raw_spin_unlock_irqrestore+0x3e/0x60
[78961.803358] ? kmem_cache_free+0x321/0x3c0
[78961.804071] ? __x64_sys_ioctl+0x83/0xb0
[78961.804711] __x64_sys_ioctl+0x83/0xb0
[78961.805348] do_syscall_64+0x3b/0xc0
[78961.805969] entry_SYSCALL_64_after_hwframe+0x44/0xae
[78961.806830] RIP: 0033:0x7f31284bc957
[78961.807517] Code: 3c 1c 48 f7 d8 (...)
This is because we are calling btrfs_free_tree_block() on an extent
buffer that is dirty. Fix that by cleaning the extent buffer, with
btrfs_clean_tree_block(), before freeing it.
This was triggered by test case generic/475 from fstests.
Fixes:
|
||
|
7a1636089a |
btrfs: fix invalid delayed ref after subvolume creation failure
When creating a subvolume, at ioctl.c:create_subvol(), if we fail to
insert the new root's root item into the root tree, we are freeing the
metadata extent we reserved for the new root to prevent a metadata
extent leak, as we don't abort the transaction at that point (since
there is nothing at that point that is irreversible).
However we allocated the metadata extent for the new root which we are
creating for the new subvolume, so its delayed reference refers to the
ID of this new root. But when we free the metadata extent we pass the
root of the subvolume where the new subvolume is located to
btrfs_free_tree_block() - this is incorrect because this will generate
a delayed reference that refers to the ID of the parent subvolume's root,
and not to ID of the new root.
This results in a failure when running delayed references that leads to
a transaction abort and a trace like the following:
[3868.738042] RIP: 0010:__btrfs_free_extent+0x709/0x950 [btrfs]
[3868.739857] Code: 68 0f 85 e6 fb ff (...)
[3868.742963] RSP: 0018:ffffb0e9045cf910 EFLAGS: 00010246
[3868.743908] RAX: 00000000fffffffe RBX: 00000000fffffffe RCX: 0000000000000002
[3868.745312] RDX: 00000000fffffffe RSI: 0000000000000002 RDI: ffff90b0cd793b88
[3868.746643] RBP: 000000000e5d8000 R08: 0000000000000000 R09: ffff90b0cd793b88
[3868.747979] R10: 0000000000000002 R11: 00014ded97944d68 R12: 0000000000000000
[3868.749373] R13: ffff90b09afe4a28 R14: 0000000000000000 R15: ffff90b0cd793b88
[3868.750725] FS: 00007f281c4a8b80(0000) GS:ffff90b3ada00000(0000) knlGS:0000000000000000
[3868.752275] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[3868.753515] CR2: 00007f281c6a5000 CR3: 0000000108a42006 CR4: 0000000000370ee0
[3868.754869] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[3868.756228] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[3868.757803] Call Trace:
[3868.758281] <TASK>
[3868.758655] ? btrfs_merge_delayed_refs+0x178/0x1c0 [btrfs]
[3868.759827] __btrfs_run_delayed_refs+0x2b1/0x1250 [btrfs]
[3868.761047] btrfs_run_delayed_refs+0x86/0x210 [btrfs]
[3868.762069] ? lock_acquired+0x19f/0x420
[3868.762829] btrfs_commit_transaction+0x69/0xb20 [btrfs]
[3868.763860] ? _raw_spin_unlock+0x29/0x40
[3868.764614] ? btrfs_block_rsv_release+0x1c2/0x1e0 [btrfs]
[3868.765870] create_subvol+0x1d8/0x9a0 [btrfs]
[3868.766766] btrfs_mksubvol+0x447/0x4c0 [btrfs]
[3868.767669] ? preempt_count_add+0x49/0xa0
[3868.768444] __btrfs_ioctl_snap_create+0x123/0x190 [btrfs]
[3868.769639] ? _copy_from_user+0x66/0xa0
[3868.770391] btrfs_ioctl_snap_create_v2+0xbb/0x140 [btrfs]
[3868.771495] btrfs_ioctl+0xd1e/0x35c0 [btrfs]
[3868.772364] ? __slab_free+0x10a/0x360
[3868.773198] ? rcu_read_lock_sched_held+0x12/0x60
[3868.774121] ? lock_release+0x223/0x4a0
[3868.774863] ? lock_acquired+0x19f/0x420
[3868.775634] ? rcu_read_lock_sched_held+0x12/0x60
[3868.776530] ? trace_hardirqs_on+0x1b/0xe0
[3868.777373] ? _raw_spin_unlock_irqrestore+0x3e/0x60
[3868.778280] ? kmem_cache_free+0x321/0x3c0
[3868.779011] ? __x64_sys_ioctl+0x83/0xb0
[3868.779718] __x64_sys_ioctl+0x83/0xb0
[3868.780387] do_syscall_64+0x3b/0xc0
[3868.781059] entry_SYSCALL_64_after_hwframe+0x44/0xae
[3868.781953] RIP: 0033:0x7f281c59e957
[3868.782585] Code: 3c 1c 48 f7 d8 4c (...)
[3868.785867] RSP: 002b:00007ffe1f83e2b8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[3868.787198] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f281c59e957
[3868.788450] RDX: 00007ffe1f83e2c0 RSI: 0000000050009418 RDI: 0000000000000003
[3868.789748] RBP: 00007ffe1f83f300 R08: 0000000000000000 R09: 00007ffe1f83fe36
[3868.791214] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000003
[3868.792468] R13: 0000000000000003 R14: 00007ffe1f83e2c0 R15: 00000000000003cc
[3868.793765] </TASK>
[3868.794037] irq event stamp: 0
[3868.794548] hardirqs last enabled at (0): [<0000000000000000>] 0x0
[3868.795670] hardirqs last disabled at (0): [<ffffffff98294214>] copy_process+0x934/0x2040
[3868.797086] softirqs last enabled at (0): [<ffffffff98294214>] copy_process+0x934/0x2040
[3868.798309] softirqs last disabled at (0): [<0000000000000000>] 0x0
[3868.799284] ---[ end trace be24c7002fe27747 ]---
[3868.799928] BTRFS info (device dm-0): leaf 241188864 gen 1268 total ptrs 214 free space 469 owner 2
[3868.801133] BTRFS info (device dm-0): refs 2 lock_owner 225627 current 225627
[3868.802056] item 0 key (237436928 169 0) itemoff 16250 itemsize 33
[3868.802863] extent refs 1 gen 1265 flags 2
[3868.803447] ref#0: tree block backref root 1610
(...)
[3869.064354] item 114 key (241008640 169 0) itemoff 12488 itemsize 33
[3869.065421] extent refs 1 gen 1268 flags 2
[3869.066115] ref#0: tree block backref root 1689
(...)
[3869.403834] BTRFS error (device dm-0): unable to find ref byte nr 241008640 parent 0 root 1622 owner 0 offset 0
[3869.405641] BTRFS: error (device dm-0) in __btrfs_free_extent:3076: errno=-2 No such entry
[3869.407138] BTRFS: error (device dm-0) in btrfs_run_delayed_refs:2159: errno=-2 No such entry
Fix this by passing the new subvolume's root ID to btrfs_free_tree_block().
This requires changing the root argument of btrfs_free_tree_block() from
struct btrfs_root * to a u64, since at this point during the subvolume
creation we have not yet created the struct btrfs_root for the new
subvolume, and btrfs_free_tree_block() only needs a root ID and nothing
else from a struct btrfs_root.
This was triggered by test case generic/475 from fstests.
Fixes:
|
||
|
651740a502 |
btrfs: check WRITE_ERR when trying to read an extent buffer
Filipe reported a hang when we have errors on btrfs. This turned out to be a side-effect of my fix |
||
|
1b2e5e5c7f |
btrfs: fix missing last dir item offset update when logging directory
When logging a directory, once we finish processing a leaf that is full
of dir items, if we find the next leaf was not modified in the current
transaction, we grab the first key of that next leaf and log it as to
mark the end of a key range boundary.
However we did not update the value of ctx->last_dir_item_offset, which
tracks the offset of the last logged key. This can result in subsequent
logging of the same directory in the current transaction to not realize
that key was already logged, and then add it to the middle of a batch
that starts with a lower key, resulting later in a leaf with one key
that is duplicated and at non-consecutive slots. When that happens we get
an error later when writing out the leaf, reporting that there is a pair
of keys in wrong order. The report is something like the following:
Dec 13 21:44:50 kernel: BTRFS critical (device dm-0): corrupt leaf:
root=18446744073709551610 block=118444032 slot=21, bad key order, prev
(704687 84 4146773349) current (704687 84 1063561078)
Dec 13 21:44:50 kernel: BTRFS info (device dm-0): leaf 118444032 gen
91449 total ptrs 39 free space 546 owner 18446744073709551610
Dec 13 21:44:50 kernel: item 0 key (704687 1 0) itemoff 3835
itemsize 160
Dec 13 21:44:50 kernel: inode generation 35532 size
1026 mode 40755
Dec 13 21:44:50 kernel: item 1 key (704687 12 704685) itemoff
3822 itemsize 13
Dec 13 21:44:50 kernel: item 2 key (704687 24 3817753667)
itemoff 3736 itemsize 86
Dec 13 21:44:50 kernel: item 3 key (704687 60 0) itemoff 3728 itemsize 8
Dec 13 21:44:50 kernel: item 4 key (704687 72 0) itemoff 3720 itemsize 8
Dec 13 21:44:50 kernel: item 5 key (704687 84 140445108)
itemoff 3666 itemsize 54
Dec 13 21:44:50 kernel: dir oid 704793 type 1
Dec 13 21:44:50 kernel: item 6 key (704687 84 298800632)
itemoff 3599 itemsize 67
Dec 13 21:44:50 kernel: dir oid 707849 type 2
Dec 13 21:44:50 kernel: item 7 key (704687 84 476147658)
itemoff 3532 itemsize 67
Dec 13 21:44:50 kernel: dir oid 707901 type 2
Dec 13 21:44:50 kernel: item 8 key (704687 84 633818382)
itemoff 3471 itemsize 61
Dec 13 21:44:50 kernel: dir oid 704694 type 2
Dec 13 21:44:50 kernel: item 9 key (704687 84 654256665)
itemoff 3403 itemsize 68
Dec 13 21:44:50 kernel: dir oid 707841 type 1
Dec 13 21:44:50 kernel: item 10 key (704687 84 995843418)
itemoff 3331 itemsize 72
Dec 13 21:44:50 kernel: dir oid 2167736 type 1
Dec 13 21:44:50 kernel: item 11 key (704687 84 1063561078)
itemoff 3278 itemsize 53
Dec 13 21:44:50 kernel: dir oid 704799 type 2
Dec 13 21:44:50 kernel: item 12 key (704687 84 1101156010)
itemoff 3225 itemsize 53
Dec 13 21:44:50 kernel: dir oid 704696 type 1
Dec 13 21:44:50 kernel: item 13 key (704687 84 2521936574)
itemoff 3173 itemsize 52
Dec 13 21:44:50 kernel: dir oid 704704 type 2
Dec 13 21:44:50 kernel: item 14 key (704687 84 2618368432)
itemoff 3112 itemsize 61
Dec 13 21:44:50 kernel: dir oid 704738 type 1
Dec 13 21:44:50 kernel: item 15 key (704687 84 2676316190)
itemoff 3046 itemsize 66
Dec 13 21:44:50 kernel: dir oid 2167729 type 1
Dec 13 21:44:50 kernel: item 16 key (704687 84 3319104192)
itemoff 2986 itemsize 60
Dec 13 21:44:50 kernel: dir oid 704745 type 2
Dec 13 21:44:50 kernel: item 17 key (704687 84 3908046265)
itemoff 2929 itemsize 57
Dec 13 21:44:50 kernel: dir oid 2167734 type 1
Dec 13 21:44:50 kernel: item 18 key (704687 84 3945713089)
itemoff 2857 itemsize 72
Dec 13 21:44:50 kernel: dir oid 2167730 type 1
Dec 13 21:44:50 kernel: item 19 key (704687 84 4077169308)
itemoff 2795 itemsize 62
Dec 13 21:44:50 kernel: dir oid 704688 type 1
Dec 13 21:44:50 kernel: item 20 key (704687 84 4146773349)
itemoff 2727 itemsize 68
Dec 13 21:44:50 kernel: dir oid 707892 type 1
Dec 13 21:44:50 kernel: item 21 key (704687 84 1063561078)
itemoff 2674 itemsize 53
Dec 13 21:44:50 kernel: dir oid 704799 type 2
Dec 13 21:44:50 kernel: item 22 key (704687 96 2) itemoff 2612
itemsize 62
Dec 13 21:44:50 kernel: item 23 key (704687 96 6) itemoff 2551
itemsize 61
Dec 13 21:44:50 kernel: item 24 key (704687 96 7) itemoff 2498
itemsize 53
Dec 13 21:44:50 kernel: item 25 key (704687 96 12) itemoff
2446 itemsize 52
Dec 13 21:44:50 kernel: item 26 key (704687 96 14) itemoff
2385 itemsize 61
Dec 13 21:44:50 kernel: item 27 key (704687 96 18) itemoff
2325 itemsize 60
Dec 13 21:44:50 kernel: item 28 key (704687 96 24) itemoff
2271 itemsize 54
Dec 13 21:44:50 kernel: item 29 key (704687 96 28) itemoff
2218 itemsize 53
Dec 13 21:44:50 kernel: item 30 key (704687 96 62) itemoff
2150 itemsize 68
Dec 13 21:44:50 kernel: item 31 key (704687 96 66) itemoff
2083 itemsize 67
Dec 13 21:44:50 kernel: item 32 key (704687 96 75) itemoff
2015 itemsize 68
Dec 13 21:44:50 kernel: item 33 key (704687 96 79) itemoff
1948 itemsize 67
Dec 13 21:44:50 kernel: item 34 key (704687 96 82) itemoff
1882 itemsize 66
Dec 13 21:44:50 kernel: item 35 key (704687 96 83) itemoff
1810 itemsize 72
Dec 13 21:44:50 kernel: item 36 key (704687 96 85) itemoff
1753 itemsize 57
Dec 13 21:44:50 kernel: item 37 key (704687 96 87) itemoff
1681 itemsize 72
Dec 13 21:44:50 kernel: item 38 key (704694 1 0) itemoff 1521
itemsize 160
Dec 13 21:44:50 kernel: inode generation 35534 size 30
mode 40755
Dec 13 21:44:50 kernel: BTRFS error (device dm-0): block=118444032
write time tree block corruption detected
So fix that by adding the missing update of ctx->last_dir_item_offset with
the offset of the boundary key.
Reported-by: Chris Murphy <lists@colorremedies.com>
Link: https://lore.kernel.org/linux-btrfs/CAJCQCtT+RSzpUjbMq+UfzNUMe1X5+1G+DnAGbHC=OZ=iRS24jg@mail.gmail.com/
Fixes:
|
||
|
33fab97249 |
btrfs: fix double free of anon_dev after failure to create subvolume
When creating a subvolume, at create_subvol(), we allocate an anonymous
device and later call btrfs_get_new_fs_root(), which in turn just calls
btrfs_get_root_ref(). There we call btrfs_init_fs_root() which assigns
the anonymous device to the root, but if after that call there's an error,
when we jump to 'fail' label, we call btrfs_put_root(), which frees the
anonymous device and then returns an error that is propagated back to
create_subvol(). Than create_subvol() frees the anonymous device again.
When this happens, if the anonymous device was not reallocated after
the first time it was freed with btrfs_put_root(), we get a kernel
message like the following:
(...)
[13950.282466] BTRFS: error (device dm-0) in create_subvol:663: errno=-5 IO failure
[13950.283027] ida_free called for id=65 which is not allocated.
[13950.285974] BTRFS info (device dm-0): forced readonly
(...)
If the anonymous device gets reallocated by another btrfs filesystem
or any other kernel subsystem, then bad things can happen.
So fix this by setting the root's anonymous device to 0 at
btrfs_get_root_ref(), before we call btrfs_put_root(), if an error
happened.
Fixes:
|
||
|
f35838a693 |
btrfs: fix memory leak in __add_inode_ref()
Line 1169 (#3) allocates a memory chunk for victim_name by kmalloc(),
but when the function returns in line 1184 (#4) victim_name allocated
by line 1169 (#3) is not freed, which will lead to a memory leak.
There is a similar snippet of code in this function as allocating a memory
chunk for victim_name in line 1104 (#1) as well as releasing the memory
in line 1116 (#2).
We should kfree() victim_name when the return value of backref_in_log()
is less than zero and before the function returns in line 1184 (#4).
1057 static inline int __add_inode_ref(struct btrfs_trans_handle *trans,
1058 struct btrfs_root *root,
1059 struct btrfs_path *path,
1060 struct btrfs_root *log_root,
1061 struct btrfs_inode *dir,
1062 struct btrfs_inode *inode,
1063 u64 inode_objectid, u64 parent_objectid,
1064 u64 ref_index, char *name, int namelen,
1065 int *search_done)
1066 {
1104 victim_name = kmalloc(victim_name_len, GFP_NOFS);
// #1: kmalloc (victim_name-1)
1105 if (!victim_name)
1106 return -ENOMEM;
1112 ret = backref_in_log(log_root, &search_key,
1113 parent_objectid, victim_name,
1114 victim_name_len);
1115 if (ret < 0) {
1116 kfree(victim_name); // #2: kfree (victim_name-1)
1117 return ret;
1118 } else if (!ret) {
1169 victim_name = kmalloc(victim_name_len, GFP_NOFS);
// #3: kmalloc (victim_name-2)
1170 if (!victim_name)
1171 return -ENOMEM;
1180 ret = backref_in_log(log_root, &search_key,
1181 parent_objectid, victim_name,
1182 victim_name_len);
1183 if (ret < 0) {
1184 return ret; // #4: missing kfree (victim_name-2)
1185 } else if (!ret) {
1241 return 0;
1242 }
Fixes:
|
||
|
e386dfc56f |
fget: clarify and improve __fget_files() implementation
Commit
|
||
|
d800c65c2d |
io-wq: drop wqe lock before creating new worker
We have two io-wq creation paths:
- On queue enqueue
- When a worker goes to sleep
The latter invokes worker creation with the wqe->lock held, but that can
run into problems if we end up exiting and need to cancel the queued work.
syzbot caught this:
============================================
WARNING: possible recursive locking detected
5.16.0-rc4-syzkaller #0 Not tainted
--------------------------------------------
iou-wrk-6468/6471 is trying to acquire lock:
ffff88801aa98018 (&wqe->lock){+.+.}-{2:2}, at: io_worker_cancel_cb+0xb7/0x210 fs/io-wq.c:187
but task is already holding lock:
ffff88801aa98018 (&wqe->lock){+.+.}-{2:2}, at: io_wq_worker_sleeping+0xb6/0x140 fs/io-wq.c:700
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&wqe->lock);
lock(&wqe->lock);
*** DEADLOCK ***
May be due to missing lock nesting notation
1 lock held by iou-wrk-6468/6471:
#0: ffff88801aa98018 (&wqe->lock){+.+.}-{2:2}, at: io_wq_worker_sleeping+0xb6/0x140 fs/io-wq.c:700
stack backtrace:
CPU: 1 PID: 6471 Comm: iou-wrk-6468 Not tainted 5.16.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1dc/0x2d8 lib/dump_stack.c:106
print_deadlock_bug kernel/locking/lockdep.c:2956 [inline]
check_deadlock kernel/locking/lockdep.c:2999 [inline]
validate_chain+0x5984/0x8240 kernel/locking/lockdep.c:3788
__lock_acquire+0x1382/0x2b00 kernel/locking/lockdep.c:5027
lock_acquire+0x19f/0x4d0 kernel/locking/lockdep.c:5637
__raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
_raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:154
io_worker_cancel_cb+0xb7/0x210 fs/io-wq.c:187
io_wq_cancel_tw_create fs/io-wq.c:1220 [inline]
io_queue_worker_create+0x3cf/0x4c0 fs/io-wq.c:372
io_wq_worker_sleeping+0xbe/0x140 fs/io-wq.c:701
sched_submit_work kernel/sched/core.c:6295 [inline]
schedule+0x67/0x1f0 kernel/sched/core.c:6323
schedule_timeout+0xac/0x300 kernel/time/timer.c:1857
wait_woken+0xca/0x1b0 kernel/sched/wait.c:460
unix_msg_wait_data net/unix/unix_bpf.c:32 [inline]
unix_bpf_recvmsg+0x7f9/0xe20 net/unix/unix_bpf.c:77
unix_stream_recvmsg+0x214/0x2c0 net/unix/af_unix.c:2832
sock_recvmsg_nosec net/socket.c:944 [inline]
sock_recvmsg net/socket.c:962 [inline]
sock_read_iter+0x3a7/0x4d0 net/socket.c:1035
call_read_iter include/linux/fs.h:2156 [inline]
io_iter_do_read fs/io_uring.c:3501 [inline]
io_read fs/io_uring.c:3558 [inline]
io_issue_sqe+0x144c/0x9590 fs/io_uring.c:6671
io_wq_submit_work+0x2d8/0x790 fs/io_uring.c:6836
io_worker_handle_work+0x808/0xdd0 fs/io-wq.c:574
io_wqe_worker+0x395/0x870 fs/io-wq.c:630
ret_from_fork+0x1f/0x30
We can safely drop the lock before doing work creation, making the two
contexts the same in that regard.
Reported-by: syzbot+b18b8be69df33a3918e9@syzkaller.appspotmail.com
Fixes:
|
||
|
e034d9cbf9 |
Fixes for 5.16-rc4:
- Fix a data corruption vector that can result from the ro remount process failing to clear all speculative preallocations from files and the rw remount process not noticing the incomplete cleanup. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmGyQ6UACgkQ+H93GTRK tOskJA/+Jg+fn+Vs7dHwBN+SaHCBH81RT+YD9gbI7Lq5xcK16K7BWz3Hf360J9uJ BpVn0fv6/+3bNg0/QMWUE67CotB2S43CzWnltUN0ymiGgAEO1eMKL1JHRtZ/89d4 pG2Lr9W1QsxMqP9L/TgxqHb5WC2xtromDLMWa7qYL8aH4qp4Dx6hv++u3qhbdTis sfvTOMsDKTwK5RC+7yB2qxr/Txn7viIlBLjTWR3UvvO0K1LxjuBTXpBsiVeQE92T ZqLgAhWDBu/wZFBibqtMsnLF0Wmn5oIZJgBoxp/+Ze8saUsuYdAI1X65ESdVmw7E ZJtSyZObDHOlRhvDn7Vlb8BNF7QY4l4n2eeGFB4zoPdy0jjz/7QKCqMSm7YCMXqc WycjeEIo1H+ZEk0AWu7KrTUAaO3zzqHB/DJt4zQA/w4QdSD14bAsyp8ZWyUEMnWy 7M0Pc0q2ptzOOp1rjd2J+0s9YSc24/dceW1lCDpIDSkOuxO82HdCxlWQ/YPEt2Lx gGv2hll8Cg0j18VThB++XZ3ozImFu+sZhkQPPCM/kKY0tWZvd4Yby1bWLNzS+ppE RIKh+hyhFTXmr0C2iJ8fGNFu/KA3V0efztvCIATHDmGcC6f4WePUQyLzZGFgS0Yw uBPmvqWeUWySgfauIvqW7Oosf0pbKvYVEd+1fcLVdh0sgr2MUPE= =N9Wf -----END PGP SIGNATURE----- Merge tag 'xfs-5.16-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs fix from Darrick Wong: "This fixes a race between a readonly remount process and other processes that hold a file IOLOCK on files that previously experienced copy on write, that could result in severe filesystem corruption if the filesystem is then remounted rw. I think this is fairly rare (since the only reliable reproducer I have that fits the second criteria is the experimental xfs_scrub program), but the race is clear, so we still need to fix this. Summary: - Fix a data corruption vector that can result from the ro remount process failing to clear all speculative preallocations from files and the rw remount process not noticing the incomplete cleanup" * tag 'xfs-5.16-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: remove all COW fork extents when remounting readonly |
||
|
f152165ada |
io_uring-5.16-2021-12-10
-----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmG0PwYQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgphbiEACv95PrGVuOzgXwC1Ydk4Km5Yyt8vANKcxA LFzeScrRhGG28vHCLZuPSxtFn9nGjBJPDKscDWQE0tpq8NwB6BAjr3Ps2AR4tI3M 2bn1ssHIHADdkaV1ixwTtQynHgCvOtKSTmLJ+TTDIYdgwAyRVlRBtpke/nXn5VCR kUdQ/3hHSPbEY+WvrDz7yAvDIMxTrBHmt8TbDiI1FXzNf8AOd32ekpBNzwe2hTQi OLhpaQ60S74UG78c4OGg6SV+twW0nkp1DyTf8MhC3WoBBnTwQdc/BpLNJbnUwH57 lJVJTz0sknbnQCtifJ8/Wjzn6r95cOwbYvaGgXyi7KtTGn0QXLcK+FohNVizFR0x qTMWlFJ5aPmzqed6oMZ2m8iaGxR9813I0GP54Hsqw4PmoPb1GDN5FvZIN+MKdaJd PSp9YH7RDHsusYZjjaUtHIcxhxztblf5wNVAkmkfrFZzNArnwsw0k9gX7DI0f+GX +TazZzX8gLRonAskBHgoSDT+8WOzDOHYYIVVe0g/8TR+jW8oGDd9D0sJbH74dV1Y 3C4WLbJCquBzatotmcmgkt9CJzZz9y373cVVNu4HZgH49qjEZCJuFgBaTpanvwHk Qk5KTl9yIfjs+H2FDWa2Pd6LnsVvuIpwOd4R37RDLGM2ZiX100eX1VbD6RxtIehQ PPKP13kYAQ== =BUVd -----END PGP SIGNATURE----- Merge tag 'io_uring-5.16-2021-12-10' of git://git.kernel.dk/linux-block Pull io_uring fixes from Jens Axboe: "A few fixes that are all bound for stable: - Two syzbot reports for io-wq that turned out to be separate fixes, but ultimately very closely related - io_uring task_work running on cancelations" * tag 'io_uring-5.16-2021-12-10' of git://git.kernel.dk/linux-block: io-wq: check for wq exit after adding new worker task_work io_uring: ensure task_work gets run as part of cancelations io-wq: remove spurious bit clear on task_work addition |
||
|
6f51352929 |
for-5.16-rc4-tag
-----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmGwxl8ACgkQxWXV+ddt WDvSeQ//T+mv4o7ucldQZCVdN0TTVCkQUhia+ZdMwBcPty2/ZEdap+KEIVmfCV/v OLRmSNkIPDhHcIc/O3zJ1/AY0DFbb9brYGkMD/qidgPbRArhDZSrDIr+xnrKJ0iq HFxM01B54l8hJe6GWIGFuuOz+nXUP1o9SfiDOwMDTkqgzz1JSvPec70RKMxG8pTC 4plVrGaUXkKTC8WyBXnSvkP2gvjfJxqnEKv2Ru1eP1t7Bq65aed0+Z32Mogzl2ip ZSHVlRergeB03xF9YErWSfgofVWLEIXgLCh/Wfq73sMnHUHUGM/dpEKxP911MI00 A5TuTl3I25cVnDv3qMayBPECMCCGZc2vUyXTLCWUNYLb/vEUTI36QBu/KglRURmO Zx20B+3/7Yu9pFeZ23S+nlNdGDADmjkOfvZIZDoYDzqBAKWM6kZ/9oWiib21Uwi6 ql5oZNl7G6UXLPvgJoq8dqZIj/HYeLEHeqwf/tepSVQLXYzQyWYDMp686z958XI1 K/A/TKaKk19nn1Dhsz4KJeb3xhMFlFN30K7BNjdH3XRH1RrwJP6pLF/WUduJq2qn rAvJoODLHkby1D9+HqSKW/RToeR5NbBYYdfB0Q8zpy9zF/gZX7wgFc21/4B7qtib hkKfNSfAVjedOJJWTFCUDnZwwrqtcrKxYzUpdcls/O1AjifK7+U= =Mh19 -----END PGP SIGNATURE----- Merge tag 'for-5.16-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A few more regression fixes and stable patches, mostly one-liners. Regression fixes: - fix pointer/ERR_PTR mismatch returned from memdup_user - reset dedicated zoned mode relocation block group to avoid using it and filling it without any recourse Fixes: - handle a case to FITRIM range (also to make fstests/generic/260 work) - fix warning when extent buffer state and pages get out of sync after an IO error - fix transaction abort when syncing due to missing mapping error set on metadata inode after inlining a compressed file - fix transaction abort due to tree-log and zoned mode interacting in an unexpected way - fix memory leak of additional extent data when qgroup reservation fails - do proper handling of slot search call when deleting root refs" * tag 'for-5.16-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: replace the BUG_ON in btrfs_del_root_ref with proper error handling btrfs: zoned: clear data relocation bg on zone finish btrfs: free exchange changeset on failures btrfs: fix re-dirty process of tree-log nodes btrfs: call mapping_set_error() on btree inode with a write error btrfs: clear extent buffer uptodate when we fail to write it btrfs: fail if fstrim_range->start == U64_MAX btrfs: fix error pointer dereference in btrfs_ioctl_rm_dev_v2() |
||
|
e1b96811e2 |
two cifs/smb3 fixes, one for stable, the other fixes a recently reported NTLMSSP auth problem
-----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmGzuJoACgkQiiy9cAdy T1H9MAv/cOAk4iUhgDUsa+HYsBNiAQ+IOOu/WOZI56jEV/qVP0I3cpkiIJenG5gu ZurivI4smsgNokHOFoT3vjtLXTfTl0OdHHY/mftd5IGIPG+KnXcg3+ZaE3T+fUV3 uHX0cH3a3Azo3RGf2rRiTfW6u9FXJnb9aAUTif7UDVwsU37wUAQrKEmcazaUXdaT +j3KpqwGhCgbkneKuAd/FDTAg4wciJgRg3aE/4W2s2ovFiF6vsUcrmhQD1zP1EZh sPWdx4/U+WpeV02RfKBLQlXi6ofqRF5qRT4HkD07G5Zhz8OOZcm06wYFNl/+aYhF lTOTa9KAoTPfV2tFlTGZVEy07ggEFd3wDxeAPulDrqY8etwWSwRAHHRi5HstMlIX iYz06DLS+LSPyriy6GV6CIL01OPg8vkeLPbrLFbs8oLSVQgmx74UDS339oxJMdFe kkhLgtfRr5DfxS3uD0u17aBDL4ullH84dWdlsUKiUfGvVsBcoVSCPIQeWqkB8xjA OTKih3J1 =+1L9 -----END PGP SIGNATURE----- Merge tag '5.16-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull cifs fixes from Steve French: "Two cifs/smb3 fixes - one for stable, the other fixes a recently reported NTLMSSP auth problem" * tag '5.16-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: fix ntlmssp auth when there is no key exchange cifs: Fix crash on unload of cifs_arc4.ko |
||
|
e80bdc5ed0 |
Fix a race on startup and another in the delegation code. The latter
has been around for years, but I suspect recent changes may have widened the race window a little, so I'd like to go ahead and get it in. -----BEGIN PGP SIGNATURE----- iQJJBAABCAAzFiEEYtFWavXG9hZotryuJ5vNeUKO4b4FAmGzuB8VHGJmaWVsZHNA ZmllbGRzZXMub3JnAAoJECebzXlCjuG+rYQP/1N61E3rf2MOfF4sluubaGGi0fVR cx3w8pTDSALfxAiz0ZTA6hJbZN0SR7MZuXMhWDHT/RiG7jFfuMJACU81BWn0cCPJ zuJziE3Y6fElGi5Ifp71yMLABqTlFPOUHBZkpzaADqXGwmZk4fecXmqKbtqa9kCU 1mCQJlZDGU1ljGVYuXU4SaBOzsrCpPfbxby2ZWFDs8bu5cGKDTq9OSVF1LaGzjMN 2+v0y2Qni+dRfTBCGAmxF8AukRa2r5i2rfHRatgKQO8PdROv/pguy1hYjjeeofhX /Jlq2RIVl1/9/Fwar4Cew2+fpQhpaY62wUuo4KG4PK5WZ92LGDjSpYqvu/I4lI6z BXdge0HzDPRnpZtD+tu4IJ7nQwG+JnO2Ws91rPxwT6DDX4zX9gNZc+7EGevVeXJI XTO+61Tyz+7/6YLR8UvnvyER9KvKjQiUWZkO81LZpuAAwn1PE3rahD48jxkmwwYE m5EvfuA6YR8oyk3Ml+nZNuKwgrNqPcvohyJz4QHpNpqmvOlVrb9I1k90JzIm8D9Q 09M8An1Z3AGw76+RMnvBnWewn1Nx1bKIS32SJk9x3zBoeyXLS17G6FmpEhW81xvN LoToRqD/GNx+4AtfvkHqwv2/Qd1rvONCsjbZv+2P0DAj8ZEOWn0YHk18WlvAIZUg 4ipyZbwMUbXeytKP =GgUu -----END PGP SIGNATURE----- Merge tag 'nfsd-5.16-2' of git://linux-nfs.org/~bfields/linux Pull nfsd fixes from Bruce Fields: "Fix a race on startup and another in the delegation code. The latter has been around for years, but I suspect recent changes may have widened the race window a little, so I'd like to go ahead and get it in" * tag 'nfsd-5.16-2' of git://linux-nfs.org/~bfields/linux: nfsd: fix use-after-free due to delegation race nfsd: Fix nsfd startup race (again) |
||
|
257dcf2923 |
tracing, ftrace and tracefs fixes:
- Have tracefs honor the gid mount option - Have new files in tracefs inherit the parent ownership - Have direct_ops unregister when it has no more functions - Properly clean up the ops when unregistering multi direct ops - Add a sample module to test the multiple direct ops - Fix memory leak in error path of __create_synth_event() -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYbOgPBQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qgOtAP0YD+cRLxnRKA376oQVB8zmuZ3mZ/4x 6M1hqruSDlno3AEA19PyHpxl7flFwnBb6Gnfo9VeefcMS5ENDH9p1aHX4wU= =Tr6t -----END PGP SIGNATURE----- Merge tag 'trace-v5.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Tracing, ftrace and tracefs fixes: - Have tracefs honor the gid mount option - Have new files in tracefs inherit the parent ownership - Have direct_ops unregister when it has no more functions - Properly clean up the ops when unregistering multi direct ops - Add a sample module to test the multiple direct ops - Fix memory leak in error path of __create_synth_event()" * tag 'trace-v5.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Fix possible memory leak in __create_synth_event() error path ftrace/samples: Add module to test multi direct modify interface ftrace: Add cleanup to unregister_ftrace_direct_multi ftrace: Use direct_ops hash in unregister_ftrace_direct tracefs: Set all files to the same group ownership as the mount option tracefs: Have new files inherit the ownership of their parent |
||
|
0d21e66847 |
aio poll fixes for 5.16-rc5
Fix three bugs in aio poll, and one issue with POLLFREE more broadly: - aio poll didn't handle POLLFREE, causing a use-after-free. - aio poll could block while the file is ready. - aio poll called eventfd_signal() when it isn't allowed. - POLLFREE didn't handle multiple exclusive waiters correctly. This has been tested with the libaio test suite, as well as with test programs I wrote that reproduce the first two bugs. I am sending this pull request myself as no one seems to be maintaining this code. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCYbObthQcZWJpZ2dlcnNA Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK+3mAQC9W8ApzBleEPI6FXzIIo5AiQT/2jGl 7FbO1MtkdUBU4QEAzf+VWl4Z4BJTgxl44avRdVDpXGAMnbWkd7heY+e3HwA= =mp+r -----END PGP SIGNATURE----- Merge tag 'aio-poll-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull aio poll fixes from Eric Biggers: "Fix three bugs in aio poll, and one issue with POLLFREE more broadly: - aio poll didn't handle POLLFREE, causing a use-after-free. - aio poll could block while the file is ready. - aio poll called eventfd_signal() when it isn't allowed. - POLLFREE didn't handle multiple exclusive waiters correctly. This has been tested with the libaio test suite, as well as with test programs I wrote that reproduce the first two bugs. I am sending this pull request myself as no one seems to be maintaining this code" * tag 'aio-poll-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: aio: Fix incorrect usage of eventfd_signal_allowed() aio: fix use-after-free due to missing POLLFREE handling aio: keep poll requests on waitqueue until completed signalfd: use wake_up_pollfree() binder: use wake_up_pollfree() wait: add wake_up_pollfree() |
||
|
71a8538754 |
io-wq: check for wq exit after adding new worker task_work
We check IO_WQ_BIT_EXIT before attempting to create a new worker, and wq exit cancels pending work if we have any. But it's possible to have a race between the two, where creation checks exit finding it not set, but we're in the process of exiting. The exit side will cancel pending creation task_work, but there's a gap where we add task_work after we've canceled existing creations at exit time. Fix this by checking the EXIT bit post adding the creation task_work. If it's set, run the same cancelation that exit does. Reported-and-tested-by: syzbot+b60c982cb0efc5e05a47@syzkaller.appspotmail.com Reviewed-by: Hao Xu <haoxu@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> |