In fill_thread_core_info() the ptrace accessible registers are collected
to be written out as notes in a core file. The note array is allocated
from a size calculated by iterating the user regset view, and counting the
regsets that have a non-zero core_note_type. However, this only allows for
there to be non-zero core_note_type at the end of the regset view. If
there are any gaps in the middle, fill_thread_core_info() will overflow the
note allocation, as it iterates over the size of the view and the
allocation would be smaller than that.
There doesn't appear to be any arch that has gaps such that they exceed
the notes allocation, but the code is brittle and tries to support
something it doesn't. It could be fixed by increasing the allocation size,
but instead just have the note collecting code utilize the array better.
This way the allocation can stay smaller.
Even in the case of no arch's that have gaps in their regset views, this
introduces a change in the resulting indicies of t->notes. It does not
introduce any changes to the core file itself, because any blank notes are
skipped in write_note_info().
In case, the allocation logic between fill_note_info() and
fill_thread_core_info() ever diverges from the usage logic, warn and skip
writing any notes that would overflow the array.
This fix is derrived from an earlier one[0] by Yu-cheng Yu.
[0] https://lore.kernel.org/lkml/20180717162502.32274-1-yu-cheng.yu@intel.com/
Co-developed-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220317192013.13655-4-rick.p.edgecombe@intel.com
Looks like a victim of too much copy/paste, we should not be looking
at req->open.how in accept. The point is to check CLOEXEC and error
out, which we don't invalid direct descriptors on exec. Hence any
attempt to get a direct descriptor with CLOEXEC is invalid.
No harm is done here, as req->open.how.flags overlaps with
req->accept.flags, but it's very confusing and might change if either of
those command structs are modified.
Fixes: aaa4db12ef ("io_uring: accept directly into fixed file table")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Let's enable GC_URGENT_HIGH mode during f2fs_disable_checkpoint(),
so that we can use SSR allocator for GCed data/node persistence,
it can improve the performance due to it avoiding migration of
data/node locates in selected target segment of SSR allocator.
Signed-off-by: Chao Yu <chao.yu@oppo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
When compressed file has blocks, f2fs_ioc_start_atomic_write will succeed,
but compressed flag will be remained in inode. If write partial compreseed
cluster and commit atomic write will cause data corruption.
This is the reproduction process:
Step 1:
create a compressed file ,write 64K data , call fsync(), then the blocks
are write as compressed cluster.
Step2:
iotcl(F2FS_IOC_START_ATOMIC_WRITE) --- this should be fail, but not.
write page 0 and page 3.
iotcl(F2FS_IOC_COMMIT_ATOMIC_WRITE) -- page 0 and 3 write as normal file,
Step3:
drop cache.
read page 0-4 -- Since page 0 has a valid block address, read as
non-compressed cluster, page 1 and 2 will be filled with compressed data
or zero.
The root cause is, after commit 7eab7a6968 ("f2fs: compress: remove
unneeded read when rewrite whole cluster"), in step 2, f2fs_write_begin()
only set target page dirty, and in f2fs_commit_inmem_pages(), we will write
partial raw pages into compressed cluster, result in corrupting compressed
cluster layout.
Fixes: 4c8ff7095b ("f2fs: support data compression")
Fixes: 7eab7a6968 ("f2fs: compress: remove unneeded read when rewrite whole cluster")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Fengnan Chang <changfengnan@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
It needs to initialized sbi->gc_mode to GC_NORMAL explicitly.
Signed-off-by: Chao Yu <chao.yu@oppo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
When compiling with -Wformat, clang emits the following warnings:
fs/nfsd/flexfilelayout.c:120:27: warning: format specifies type 'unsigned
char' but the argument has type 'int' [-Wformat]
"%s.%hhu.%hhu", addr, port >> 8, port & 0xff);
~~~~ ^~~~~~~~~
%d
fs/nfsd/flexfilelayout.c:120:38: warning: format specifies type 'unsigned
char' but the argument has type 'int' [-Wformat]
"%s.%hhu.%hhu", addr, port >> 8, port & 0xff);
~~~~ ^~~~~~~~~~~
%d
The types of these arguments are unconditionally defined, so this patch
updates the format character to the correct ones for ints and unsigned
ints.
Link: https://github.com/ClangBuiltLinux/linux/issues/378
Signed-off-by: Bill Wendling <morbo@google.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Tom Haynes <loghyr@hammerspace.com>
Workloads using provided buffers benefit from using and returning buffers
in the right order, and so does TLBs for that matter. Manage the internal
buffer list in a straight list, rather than use the head buffer as the
insertion node. Use a hashed list for the buffer group IDs instead of
xarray, the overhead is much lower this way. xarray provides internal
locking and other trickery that is handy for some uses cases, but
io_uring already locks internally for the buffer manipulation and needs
none of that.
This is good for about a 2% reduction in overhead, combination of the
improved management and the fact that the workload has an easier time
bundling back provided buffers.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When renaming the whiteout file, the old whiteout file is not deleted.
Therefore, we add the old dentry size to the old dir like XFS.
Otherwise, an error may be reported due to `fscki->calc_sz != fscki->size`
in check_indes.
Fixes: 9e0a1fff8d ("ubifs: Implement RENAME_WHITEOUT")
Reported-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
We need a mid level of gc urgent mode to do GC forcibly in a period
of given gc_urgent_sleep_time, but not like using greedy GC approach
and switching to SSR mode such as gc urgent high mode. This can be
used for more aggressive periodic storage clean up.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In lz4_decompress_pages(), if size of decompressed data is not equal to
expected one, we should print the size rather than size of target buffer
for decompressed data, fix it.
Signed-off-by: Chao Yu <chao.yu@oppo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
iput() has already judged the incoming parameter, so there is
no need to repeat the judgment here.
Signed-off-by: Wang Xiaojun <wangxiaojun11@huawei.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
A recent change to how the SMB3 server (socket) and session status
is managed regressed multiuser mounts by changing the check
for whether session setup is needed to the socket (TCP_Server_info)
structure instead of the session struct (cifs_ses). Add additional
check in cifs_setup_sesion to fix this.
Fixes: 73f9bfbe3d ("cifs: maintain a state machine for tcp/smb/tcon sessions")
Reported-by: Ronnie Sahlberg <lsahlber@redhat.com>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
io_commit_cqring() is currently always under spinlock section, so it's
always better to keep it as slim as possible. Move
__io_commit_cqring_flush() out of it into ev_posted*(). If fast checks
do fail and this post-processing is required, we'll reacquire
->completion_lock, which is fine as we don't care about performance of
draining and offset timeouts.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ec4e81fd720d3bc7bca8cb9152e080dad1a052f1.1647481208.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
With commit "io_uring: cache req->apoll->events in req->cflags" applied,
we now have just io_poll_remove_entries() dipping into req->apoll when
it isn't strictly necessary.
Mark poll and double-poll with a flag, so we know if we need to look
at apoll->double_poll. This avoids pulling in those cachelines if we
don't need them. The common case is that the poll wake handler already
removed these entries while hot off the completion path.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When we arm poll on behalf of a different type of request, like a network
receive, then we allocate req->apoll as our poll entry. Running network
workloads shows io_poll_check_events() as the most expensive part of
io_uring, and it's all due to having to pull in req->apoll instead of
just the request which we have hot already.
Cache poll->events in req->cflags, which isn't used until the request
completes anyway. This isn't strictly needed for regular poll, where
req->poll.events is used and thus already hot, but for the sake of
unification we do it all around.
This saves 3-4% of overhead in certain request workloads.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When we mount a jffs2 image, assume that the first few blocks of
the image are normal and contain at least one xattr-related inode,
but the next block is abnormal. As a result, an error is returned
in jffs2_scan_eraseblock(). jffs2_clear_xattr_subsystem() is then
called in jffs2_build_filesystem() and then again in
jffs2_do_fill_super().
Finally we can observe the following report:
==================================================================
BUG: KASAN: use-after-free in jffs2_clear_xattr_subsystem+0x95/0x6ac
Read of size 8 at addr ffff8881243384e0 by task mount/719
Call Trace:
dump_stack+0x115/0x16b
jffs2_clear_xattr_subsystem+0x95/0x6ac
jffs2_do_fill_super+0x84f/0xc30
jffs2_fill_super+0x2ea/0x4c0
mtd_get_sb+0x254/0x400
mtd_get_sb_by_nr+0x4f/0xd0
get_tree_mtd+0x498/0x840
jffs2_get_tree+0x25/0x30
vfs_get_tree+0x8d/0x2e0
path_mount+0x50f/0x1e50
do_mount+0x107/0x130
__se_sys_mount+0x1c5/0x2f0
__x64_sys_mount+0xc7/0x160
do_syscall_64+0x45/0x70
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Allocated by task 719:
kasan_save_stack+0x23/0x60
__kasan_kmalloc.constprop.0+0x10b/0x120
kasan_slab_alloc+0x12/0x20
kmem_cache_alloc+0x1c0/0x870
jffs2_alloc_xattr_ref+0x2f/0xa0
jffs2_scan_medium.cold+0x3713/0x4794
jffs2_do_mount_fs.cold+0xa7/0x2253
jffs2_do_fill_super+0x383/0xc30
jffs2_fill_super+0x2ea/0x4c0
[...]
Freed by task 719:
kmem_cache_free+0xcc/0x7b0
jffs2_free_xattr_ref+0x78/0x98
jffs2_clear_xattr_subsystem+0xa1/0x6ac
jffs2_do_mount_fs.cold+0x5e6/0x2253
jffs2_do_fill_super+0x383/0xc30
jffs2_fill_super+0x2ea/0x4c0
[...]
The buggy address belongs to the object at ffff8881243384b8
which belongs to the cache jffs2_xattr_ref of size 48
The buggy address is located 40 bytes inside of
48-byte region [ffff8881243384b8, ffff8881243384e8)
[...]
==================================================================
The triggering of the BUG is shown in the following stack:
-----------------------------------------------------------
jffs2_fill_super
jffs2_do_fill_super
jffs2_do_mount_fs
jffs2_build_filesystem
jffs2_scan_medium
jffs2_scan_eraseblock <--- ERROR
jffs2_clear_xattr_subsystem <--- free
jffs2_clear_xattr_subsystem <--- free again
-----------------------------------------------------------
An error is returned in jffs2_do_mount_fs(). If the error is returned
by jffs2_sum_init(), the jffs2_clear_xattr_subsystem() does not need to
be executed. If the error is returned by jffs2_build_filesystem(), the
jffs2_clear_xattr_subsystem() also does not need to be executed again.
So move jffs2_clear_xattr_subsystem() from 'out_inohash' to 'out_root'
to fix this UAF problem.
Fixes: aa98d7cf59 ("[JFFS2][XATTR] XATTR support on JFFS2 (version. 5)")
Cc: stable@vger.kernel.org
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
inode->i_mutex has been replaced with inode->i_rwsem long ago. Fix
comments still mentioning i_mutex.
Signed-off-by: hongnanli <hongnan.li@linux.alibaba.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
This serves two purposes:
- We now have the last cacheline mostly unused for generic workloads,
instead of having to pull in the poll refs explicitly for workloads
that rely on poll arming.
- It shrinks the io_kiocb from 232 to 224 bytes.
Signed-off-by: Jens Axboe <axboe@kernel.dk>