Commit Graph

1264556 Commits

Author SHA1 Message Date
Jaegeuk Kim
7643f3fe27 f2fs: assign the write hint per stream by default
This reverts commit 930e260763 ("f2fs: remove obsolete whint_mode"), as we
decide to pass write hints to the disk.

Cc: Hyunchul Lee <cheol.lee@lge.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-04-19 17:56:13 +00:00
Daeho Jeong
b2cf5a1ff2 f2fs: allow direct io of pinned files for zoned storage
Since the allocation happens in conventional LU for zoned storage, we
can allow direct io for that.

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-04-14 15:18:23 +00:00
Daeho Jeong
3fdd89b452 f2fs: prevent writing without fallocate() for pinned files
In a case writing without fallocate(), we can't guarantee it's allocated
in the conventional area for zoned stroage. To make it consistent across
storage devices, we disallow it regardless of storage device types.

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-04-14 15:18:20 +00:00
Jaegeuk Kim
16778aea91 f2fs: use folio_test_writeback
Let's convert PageWriteback to folio_test_writeback.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-04-12 20:58:35 +00:00
Zhiguo Niu
fa18d87cb2 f2fs: add REQ_TIME time update for some user behaviors
some user behaviors requested filesystem operations, which
will cause filesystem not idle.
Meanwhile adjust some f2fs_update_time(REQ_TIME) positions.

Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-04-12 20:58:35 +00:00
Daeho Jeong
b084403cfc f2fs: write missing last sum blk of file pinning section
While do not allocating a new section in advance for file pinning area, I
missed that we should write the sum block for the last segment of a file
pinning section.

Fixes: 9703d69d9d ("f2fs: support file pinning for zoned devices")
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-04-12 20:58:35 +00:00
Jaegeuk Kim
3bdb7f1616 f2fs: don't set RO when shutting down f2fs
Shutdown does not check the error of thaw_super due to readonly, which
causes a deadlock like below.

f2fs_ioc_shutdown(F2FS_GOING_DOWN_FULLSYNC)        issue_discard_thread
 - bdev_freeze
  - freeze_super
 - f2fs_stop_checkpoint()
  - f2fs_handle_critical_error                     - sb_start_write
    - set RO                                         - waiting
 - bdev_thaw
  - thaw_super_locked
    - return -EINVAL, if sb_rdonly()
 - f2fs_stop_discard_thread
  -> wait for kthread_stop(discard_thread);

Reported-by: "Light Hsieh (謝明燈)" <Light.Hsieh@mediatek.com>
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-04-12 20:58:35 +00:00
Chao Yu
e07230da05 f2fs: fix to check pinfile flag in f2fs_move_file_range()
ioctl(F2FS_IOC_MOVE_RANGE) can truncate or punch hole on pinned file,
fix to disallow it.

Fixes: 5fed0be858 ("f2fs: do not allow partial truncation on pinned file")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-04-12 20:58:35 +00:00
Chao Yu
278a6253a6 f2fs: fix to relocate check condition in f2fs_fallocate()
compress and pinfile flag should be checked after inode lock held to
avoid race condition, fix it.

Fixes: 4c8ff7095b ("f2fs: support data compression")
Fixes: 5fed0be858 ("f2fs: do not allow partial truncation on pinned file")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-04-12 20:58:35 +00:00
Chao Yu
bd9ae4ae9e f2fs: compress: fix to relocate check condition in f2fs_ioc_{,de}compress_file()
Compress flag should be checked after inode lock held to avoid
racing w/ f2fs_setflags_common() , fix it.

Fixes: 5fdb322ff2 ("f2fs: add F2FS_IOC_DECOMPRESS_FILE and F2FS_IOC_COMPRESS_FILE")
Reported-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Closes: https://lore.kernel.org/linux-f2fs-devel/CAHJ8P3LdZXLc2rqeYjvymgYHr2+YLuJ0sLG9DdsJZmwO7deuhw@mail.gmail.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-04-12 20:58:35 +00:00
Chao Yu
7c5dffb3d9 f2fs: compress: fix to relocate check condition in f2fs_{release,reserve}_compress_blocks()
Compress flag should be checked after inode lock held to avoid
racing w/ f2fs_setflags_common(), fix it.

Fixes: 4c8ff7095b ("f2fs: support data compression")
Reported-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Closes: https://lore.kernel.org/linux-f2fs-devel/CAHJ8P3LdZXLc2rqeYjvymgYHr2+YLuJ0sLG9DdsJZmwO7deuhw@mail.gmail.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-04-12 20:58:35 +00:00
Wenjie Qi
0f9b12142b f2fs: fix zoned block device information initialization
If the max open zones of zoned devices are less than
the active logs of F2FS, the device may error due to
insufficient zone resources when multiple active logs
are being written at the same time.

Signed-off-by: Wenjie Qi <qwjhust@gmail.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-04-09 16:17:48 +00:00
Zhiguo Niu
197080156f f2fs: fix to adjust appropirate defragment pg_end
A length that exceeds the real size of the inode may be
specified from user, although these out-of-range areas
are not mapped, but they still need to be check in
while loop, which is unnecessary.

Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-04-02 23:38:07 +00:00
Yunlei He
ac5eecf481 f2fs: remove clear SB_INLINECRYPT flag in default_options
In f2fs_remount, SB_INLINECRYPT flag will be clear and re-set.
If create new file or open file during this gap, these files
will not use inlinecrypt. Worse case, it may lead to data
corruption if wrappedkey_v0 is enable.

Thread A:                               Thread B:

-f2fs_remount				-f2fs_file_open or f2fs_new_inode
  -default_options
	<- clear SB_INLINECRYPT flag

                                          -fscrypt_select_encryption_impl

  -parse_options
	<- set SB_INLINECRYPT again

Signed-off-by: Yunlei He <heyunlei@oppo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-03-29 23:54:42 +00:00
Chao Yu
d3876e34e7 f2fs: fix to wait on page writeback in __clone_blkaddrs()
In below race condition, dst page may become writeback status
in __clone_blkaddrs(), it needs to wait writeback before update,
fix it.

Thread A				GC Thread
- f2fs_move_file_range
  - filemap_write_and_wait_range(dst)
					- gc_data_segment
					 - f2fs_down_write(dst)
					 - move_data_page
					  - set_page_writeback(dst_page)
					  - f2fs_submit_page_write
					 - f2fs_up_write(dst)
  - f2fs_down_write(dst)
  - __exchange_data_block
   - __clone_blkaddrs
    - f2fs_get_new_data_page
    - memcpy_page

Fixes: 0a2aa8fbb9 ("f2fs: refactor __exchange_data_block for speed up")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-03-29 23:54:42 +00:00
Chao Yu
9f0f6bf427 f2fs: support to map continuous holes or preallocated address
This patch supports to map continuous holes or preallocated addresses
to improve performace of lookuping mapping info during read DIO.

[testcase 1]
xfs_io -f /mnt/f2fs/hole -c "truncate 1m" -c "fsync"
xfs_io -d /mnt/f2fs/hole -c "pread -b 1m 0 1m"

[before]
f2fs_direct_IO_enter: dev = (253,16), ino = 6 pos = 0 len = 1048576 ki_flags = 20000 ki_ioprio = 0 rw = 0
f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 0, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 1, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 2, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 3, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 4, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 5, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 6, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 7, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 8, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 9, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 10, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 11, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 12, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 13, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 14, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 15, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 16, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
......
f2fs_direct_IO_exit: dev = (253,16), ino = 6 pos = 0 len = 1048576 rw = 0 ret = 1048576

[after]
f2fs_direct_IO_enter: dev = (253,16), ino = 6 pos = 0 len = 1048576 ki_flags = 20000 ki_ioprio = 0 rw = 0
f2fs_map_blocks: dev = (253,16), ino = 6, file offset = 0, start blkaddr = 0x0, len = 0x100, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_direct_IO_exit: dev = (253,16), ino = 6 pos = 0 len = 1048576 rw = 0 ret = 1048576

[testcase 2]
xfs_io -f /mnt/f2fs/preallocated -c "falloc 0 1m" -c "fsync"
xfs_io -d /mnt/f2fs/preallocated -c "pread -b 1m 0 1m"

[before]
f2fs_direct_IO_enter: dev = (253,16), ino = 11 pos = 0 len = 1048576 ki_flags = 20000 ki_ioprio = 0 rw = 0
f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 0, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 1, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 2, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 3, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 4, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 5, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 6, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 7, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 8, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 9, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 10, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 11, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 12, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 13, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 14, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 15, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 16, start blkaddr = 0x0, len = 0x0, flags = 0, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
......
f2fs_direct_IO_exit: dev = (253,16), ino = 11 pos = 0 len = 1048576 rw = 0 ret = 1048576

[after]
f2fs_direct_IO_enter: dev = (253,16), ino = 11 pos = 0 len = 1048576 ki_flags = 20000 ki_ioprio = 0 rw = 0
f2fs_map_blocks: dev = (253,16), ino = 11, file offset = 0, start blkaddr = 0xffffffff, len = 0x100, flags = 4, seg_type = 1, may_create = 0, multidevice = 0, flag = 3, err = 0
f2fs_direct_IO_exit: dev = (253,16), ino = 11 pos = 0 len = 1048576 rw = 0 ret = 1048576

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-03-29 23:54:42 +00:00
Chao Yu
33e62cd7b4 f2fs: multidev: fix to recognize valid zero block address
As reported by Yi Zhang in mailing list [1], kernel warning was catched
during zbd/010 test as below:

./check zbd/010
zbd/010 (test gap zone support with F2FS)                    [failed]
    runtime    ...  3.752s
    something found in dmesg:
    [ 4378.146781] run blktests zbd/010 at 2024-02-18 11:31:13
    [ 4378.192349] null_blk: module loaded
    [ 4378.209860] null_blk: disk nullb0 created
    [ 4378.413285] scsi_debug:sdebug_driver_probe: scsi_debug: trim
poll_queues to 0. poll_q/nr_hw = (0/1)
    [ 4378.422334] scsi host15: scsi_debug: version 0191 [20210520]
                     dev_size_mb=1024, opts=0x0, submit_queues=1, statistics=0
    [ 4378.434922] scsi 15:0:0:0: Direct-Access-ZBC Linux
scsi_debug       0191 PQ: 0 ANSI: 7
    [ 4378.443343] scsi 15:0:0:0: Power-on or device reset occurred
    [ 4378.449371] sd 15:0:0:0: Attached scsi generic sg5 type 20
    [ 4378.449418] sd 15:0:0:0: [sdf] Host-managed zoned block device
    ...
    (See '/mnt/tests/gitlab.com/api/v4/projects/19168116/repository/archive.zip/storage/blktests/blk/blktests/results/nodev/zbd/010.dmesg'

WARNING: CPU: 22 PID: 44011 at fs/iomap/iter.c:51
CPU: 22 PID: 44011 Comm: fio Not tainted 6.8.0-rc3+ #1
RIP: 0010:iomap_iter+0x32b/0x350
Call Trace:
 <TASK>
 __iomap_dio_rw+0x1df/0x830
 f2fs_file_read_iter+0x156/0x3d0 [f2fs]
 aio_read+0x138/0x210
 io_submit_one+0x188/0x8c0
 __x64_sys_io_submit+0x8c/0x1a0
 do_syscall_64+0x86/0x170
 entry_SYSCALL_64_after_hwframe+0x6e/0x76

Shinichiro Kawasaki helps to analyse this issue and proposes a potential
fixing patch in [2].

Quoted from reply of Shinichiro Kawasaki:

"I confirmed that the trigger commit is dbf8e63f48 as Yi reported. I took a
look in the commit, but it looks fine to me. So I thought the cause is not
in the commit diff.

I found the WARN is printed when the f2fs is set up with multiple devices,
and read requests are mapped to the very first block of the second device in the
direct read path. In this case, f2fs_map_blocks() and f2fs_map_blocks_cached()
modify map->m_pblk as the physical block address from each block device. It
becomes zero when it is mapped to the first block of the device. However,
f2fs_iomap_begin() assumes that map->m_pblk is the physical block address of the
whole f2fs, across the all block devices. It compares map->m_pblk against
NULL_ADDR == 0, then go into the unexpected branch and sets the invalid
iomap->length. The WARN catches the invalid iomap->length.

This WARN is printed even for non-zoned block devices, by following steps.

 - Create two (non-zoned) null_blk devices memory backed with 128MB size each:
   nullb0 and nullb1.
 # mkfs.f2fs /dev/nullb0 -c /dev/nullb1
 # mount -t f2fs /dev/nullb0 "${mount_dir}"
 # dd if=/dev/zero of="${mount_dir}/test.dat" bs=1M count=192
 # dd if="${mount_dir}/test.dat" of=/dev/null bs=1M count=192 iflag=direct

..."

So, the root cause of this issue is: when multi-devices feature is on,
f2fs_map_blocks() may return zero blkaddr in non-primary device, which is
a verified valid block address, however, f2fs_iomap_begin() treats it as
an invalid block address, and then it triggers the warning in iomap
framework code.

Finally, as discussed, we decide to use a more simple and direct way that
checking (map.m_flags & F2FS_MAP_MAPPED) condition instead of
(map.m_pblk != NULL_ADDR) to fix this issue.

Thanks a lot for the effort of Yi Zhang and Shinichiro Kawasaki on this
issue.

[1] https://lore.kernel.org/linux-f2fs-devel/CAHj4cs-kfojYC9i0G73PRkYzcxCTex=-vugRFeP40g_URGvnfQ@mail.gmail.com/
[2] https://lore.kernel.org/linux-f2fs-devel/gngdj77k4picagsfdtiaa7gpgnup6fsgwzsltx6milmhegmjff@iax2n4wvrqye/

Reported-by: Yi Zhang <yi.zhang@redhat.com>
Closes: https://lore.kernel.org/linux-f2fs-devel/CAHj4cs-kfojYC9i0G73PRkYzcxCTex=-vugRFeP40g_URGvnfQ@mail.gmail.com/
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Fixes: 1517c1a7a4 ("f2fs: implement iomap operations")
Fixes: 8d3c1fa3fa ("f2fs: don't rely on F2FS_MAP_* in f2fs_iomap_begin")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-03-29 23:54:38 +00:00
Chao Yu
ac7805be6c f2fs: introduce map_is_mergeable() for cleanup
No logic changes.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-03-26 16:56:44 +00:00
Chao Yu
92c556ed63 f2fs: fix to detect inconsistent nat entry during truncation
As Roman Smirnov reported as below:

"
There is a possible bug in f2fs_truncate_inode_blocks():

    if (err < 0 && err != -ENOENT)
    			goto fail;
        ...
        offset[1] = 0;
        offset[0]++;
        nofs += err;

If err = -ENOENT then nofs will sum with an error code,
which is strange behaviour. Also if nofs < ENOENT this will
cause an overflow. err will be equal to -ENOENT with the
following call stack:

truncate_nodes()
  f2fs_get_node_page()
     __get_node_page()
        read_node_page()
"

If nat is corrupted, truncate_nodes() may return -ENOENT, and
f2fs_truncate_inode_blocks() doesn't handle such error correctly,
fix it.

Reported-by: Roman Smirnov <r.smirnov@omp.ru>
Closes: https://lore.kernel.org/linux-f2fs-devel/085b27fd2b364a3c8c3a9ca77363e246@omp.ru
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-03-26 16:53:00 +00:00
Yeongjin Gil
3127f1010c f2fs: Prevent s_writer rw_sem count mismatch in f2fs_evict_inode
If f2fs_evict_inode is called between freeze_super and thaw_super, the
s_writer rwsem count may become negative, resulting in hang.

CPU1                       CPU2

f2fs_resize_fs()           f2fs_evict_inode()
  f2fs_freeze
    set SBI_IS_FREEZING
                             skip sb_start_intwrite
  f2fs_unfreeze
    clear SBI_IS_FREEZING
                             sb_end_intwrite

To solve this problem, the call to sb_end_write is determined by whether
sb_start_intwrite is called, rather than the current freezing status.

Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Sunmin Jeong <s_min.jeong@samsung.com>
Signed-off-by: Yeongjin Gil <youngjin.gil@samsung.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-03-26 16:52:26 +00:00
Chao Yu
ee745e4736 f2fs: support .shutdown in f2fs_sops
Support .shutdown callback in f2fs_sops, then, it can be called to
shut down the file system when underlying block device is marked dead.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-03-26 02:13:08 +00:00
Linus Torvalds
928a87efa4 gfs2 fix
- Fix boundary check in punch_hole
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmYBaM0UHGFncnVlbmJh
 QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTrzqw/9GpK71h1dIA8vYqInumdrUabksLKy
 jRMR2ZxfzBKLdAfgn9AS3nrWNos72vjAxbjYCi/fbY9uvIK1/zzq7Ef7601kCetM
 NzxShY8AwLJa9mO8O5yReLL7O/61gjlcdD6rSjkYwphWuobd5vpudKkibgpdJyH8
 bn6U1/2K5ASFtWyTRbudOIsz4AqPUE6ZB4KxSuCDx7uFiQjnuh6sk8wfg48pdig7
 GAsNPmBFfWAQXClPnI/WFG0hpkuRIK1hk9ITWx1ybu2JqaNeVXRBqGoRZbEkPYju
 qEkp4oT3j/1siBz1sMOjC5tfmAzhLvAeL61pD2EOcm5Bpd3iKJibYt/uCIpYFHM0
 WfRcUmqEduN1zhDuSR4KSe49JQ5dFXVf83YqUgbtrHFiHHXNBYYqFNUVfcDAB1p7
 IH9AlNd82zyxJ3fsBX7VpEbGC2qNa3K8hYO7px8DNVrPGzW7AhPF1Lsh0OE9GlZU
 H5f70Nryi98iwadbePBUchTrx0S3iYjk2TQgLGf5L/lAl6J/MRNG31kittDtehri
 cct/JBr8sUAK014TS5NxPbpxqDnVot3UsYk7h6s7WdmM1svfs7j5f1mo3ovMEGqX
 io5Z6pFEE7n1ce5hbieDKr3JFh6LxP1ArUSY8oz5rR0shE2XHMcIdq3J26Vfi0Q0
 4VjdBic/7rUUBXI=
 =QXEK
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-v6.8-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 fix from Andreas Gruenbacher:

 - Fix boundary check in punch_hole

* tag 'gfs2-v6.8-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Fix invalid metadata access in punch_hole
2024-03-25 10:53:39 -07:00
Linus Torvalds
174fdc93a2 This push fixes a regression that broke iwd as well as a divide by
zero in iaa.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmX9buQACgkQxycdCkmx
 i6dw1w/+IiwZX9A62NLDEBJbaXavQEHqoI1mhXq4Zwqeor5DhUlUGbmAhdZ5bg88
 oA60zjroLgt+Wke3phXFMjavSdwyLMd8zkDU3BjhqHAD6ASZz4ml51MCi48ROKB8
 lKSSAvjJX3VAHlYSxctUB/KWfzZZaGGUWQdtkC/90Tqp6OGeVyBbQCiQPyF4I55P
 e1B8ADY5Ey2d+Rgos1whMuLkKa077yoWdiDgX5PFjfGalRdNh4jNsojrCSx6AEiI
 KijWUtaGBNj9Exc04NeCa0JQNff4vynhn21ygMbgPMEMTde0SlHHdf6FWKrZcm6h
 JlNYYVGXjcobMC62dQdTosTLxuAqSd4kr5UaiCjO52QdM8txfBJCm/PDARS2z9Gl
 xROvBpOfRSn0Z8GpuHaBMNot4DlKv6y/puwmef6o3qYlUJH+CnuHUFaLOHRYVt6d
 B0tWi7PWyx2jorj0VHUCpqiGGYlbUq6lWhGYt8XzPHeQ2ZewzH6EbbF5qKpEWGiB
 3X8Dxl1PpO8sSwOQRabBpoWoXxJLL6G9uvyJCjJeftL9ZDEeTBDvdA9qis7t7kR5
 Ckv4q/5rOq4MK3NsXsL6lJZY7ckQavLhCoZlRU9kcQt3MG06i6KUDoo83L/49rUL
 zq5adU8sI3lP/2VplfA0XEeYFuLlBWH4TwNAmwJBRoWFiG8j8XM=
 =Mau0
 -----END PGP SIGNATURE-----

Merge tag 'v6.9-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:
 "This fixes a regression that broke iwd as well as a divide by zero in
  iaa"

* tag 'v6.9-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: iaa - Fix nr_cpus < nr_iaa case
  Revert "crypto: pkcs7 - remove sha1 support"
2024-03-25 10:48:23 -07:00
Linus Torvalds
4cece76496 Linux 6.9-rc1 2024-03-24 14:10:05 -07:00
Linus Torvalds
ab8de2dbfc EFI fixes for v6.9 #2
- Fix logic that is supposed to prevent placement of the kernel image
   below LOAD_PHYSICAL_ADDR
 - Use the firmware stack in the EFI stub when running in mixed mode
 - Clear BSS only once when using mixed mode
 - Check efi.get_variable() function pointer for NULL before trying to
   call it
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQQm/3uucuRGn1Dmh0wbglWLn0tXAUCZgCRgwAKCRAwbglWLn0t
 XHozAP9jLdeGs1ReYZAn+W0QtW/SJHJznoPiHcktdNKG4rNX3QD9G3URu0f4jKCG
 yvjw8qHM1pC2cihXXjABjf7gL7g6LAE=
 =cNP7
 -----END PGP SIGNATURE-----

Merge tag 'efi-fixes-for-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI fixes from Ard Biesheuvel:

 - Fix logic that is supposed to prevent placement of the kernel image
   below LOAD_PHYSICAL_ADDR

 - Use the firmware stack in the EFI stub when running in mixed mode

 - Clear BSS only once when using mixed mode

 - Check efi.get_variable() function pointer for NULL before trying to
   call it

* tag 'efi-fixes-for-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efi: fix panic in kdump kernel
  x86/efistub: Don't clear BSS twice in mixed mode
  x86/efistub: Call mixed mode boot services on the firmware's stack
  efi/libstub: fix efi_random_alloc() to allocate memory at alloc_min or higher address
2024-03-24 13:54:06 -07:00
Linus Torvalds
5e74df2f8f A set of x86 fixes:
- Ensure that the encryption mask at boot is properly propagated on
     5-level page tables, otherwise the PGD entry is incorrectly set to
     non-encrypted, which causes system crashes during boot.
 
   - Undo the deferred 5-level page table setup as it cannot work with
     memory encryption enabled.
 
   - Prevent inconsistent XFD state on CPU hotplug, where the MSR is reset
     to the default value but the cached variable is not, so subsequent
     comparisons might yield the wrong result and as a consequence the
     result prevents updating the MSR.
 
   - Register the local APIC address only once in the MPPARSE enumeration to
     prevent triggering the related WARN_ONs() in the APIC and topology code.
 
   - Handle the case where no APIC is found gracefully by registering a fake
     APIC in the topology code. That makes all related topology functions
     work correctly and does not affect the actual APIC driver code at all.
 
   - Don't evaluate logical IDs during early boot as the local APIC IDs are
     not yet enumerated and the invoked function returns an error
     code. Nothing requires the logical IDs before the final CPUID
     enumeration takes place, which happens after the enumeration.
 
   - Cure the fallout of the per CPU rework on UP which misplaced the
     copying of boot_cpu_data to per CPU data so that the final update to
     boot_cpu_data got lost which caused inconsistent state and boot
     crashes.
 
   - Use copy_from_kernel_nofault() in the kprobes setup as there is no
     guarantee that the address can be safely accessed.
 
   - Reorder struct members in struct saved_context to work around another
     kmemleak false positive
 
   - Remove the buggy code which tries to update the E820 kexec table for
     setup_data as that is never passed to the kexec kernel.
 
   - Update the resource control documentation to use the proper units.
 
   - Fix a Kconfig warning observed with tinyconfig
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmYAUH4THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoXzREAC/HVB7yzUEbjbh7dyYRBEgFU19bcyC
 JKf9HVmEHj03HstUxF1dxguUhwfHVPNTWpjmy/fRwxqgM9JG+QpV6T4DIldWqchv
 AUYFrQBMvql8hTKxRa/Ny75d2IqKPgEEGUuyU+ZHAzEEPwhKrbtVRDPuEiMxpd5I
 9B1Pya4EzUyOv1UhPIg7PRoya1msimBZ0mCw4In6ri6xVRm1uC3Ln4LZPylxn96l
 f77rz5UToUw0gfgDaezF0z4ml1phGEdSX0Z3hhD0PX12wbJGEdvPzL0qTgEq72Ad
 AeLmHx4K8z2zoHMHK7iTEwjoplQxGsWLoezh22cVEEJX0dtzHz6R0ftBCa6uzATJ
 C8FF1oDDHAhTL94YmVSTZHr6AdJ6LwgYHO3zXZUhxuB7PNXAT4FmT0zgU1fU3sC1
 U/1mIFdgOEUOlGll2Ra5uTUKc0K/dc+yC9dcbz37Kwj3KlfqTN+5BWocjySkHomr
 gcv37aU1TJGSC/D1lYWTDWGKVbbP5lk+KIGICT5SBKn0METa/wOo8dE6+T1kIwvS
 t2QTlJdzilLcWGVQ8GiNjjRxFtRKY5i9Shi4K+wUvCee4/XJzRrpxrCEY8w/qceV
 hc3kfUIon3TCv8+rnlSuNRZBvmFhXMYwMt0gQv4YywB+aOITKTzbGUOazLtRNKAH
 lFCnBRS55AB8mg==
 =WyQ2
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2024-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:

 - Ensure that the encryption mask at boot is properly propagated on
   5-level page tables, otherwise the PGD entry is incorrectly set to
   non-encrypted, which causes system crashes during boot.

 - Undo the deferred 5-level page table setup as it cannot work with
   memory encryption enabled.

 - Prevent inconsistent XFD state on CPU hotplug, where the MSR is reset
   to the default value but the cached variable is not, so subsequent
   comparisons might yield the wrong result and as a consequence the
   result prevents updating the MSR.

 - Register the local APIC address only once in the MPPARSE enumeration
   to prevent triggering the related WARN_ONs() in the APIC and topology
   code.

 - Handle the case where no APIC is found gracefully by registering a
   fake APIC in the topology code. That makes all related topology
   functions work correctly and does not affect the actual APIC driver
   code at all.

 - Don't evaluate logical IDs during early boot as the local APIC IDs
   are not yet enumerated and the invoked function returns an error
   code. Nothing requires the logical IDs before the final CPUID
   enumeration takes place, which happens after the enumeration.

 - Cure the fallout of the per CPU rework on UP which misplaced the
   copying of boot_cpu_data to per CPU data so that the final update to
   boot_cpu_data got lost which caused inconsistent state and boot
   crashes.

 - Use copy_from_kernel_nofault() in the kprobes setup as there is no
   guarantee that the address can be safely accessed.

 - Reorder struct members in struct saved_context to work around another
   kmemleak false positive

 - Remove the buggy code which tries to update the E820 kexec table for
   setup_data as that is never passed to the kexec kernel.

 - Update the resource control documentation to use the proper units.

 - Fix a Kconfig warning observed with tinyconfig

* tag 'x86-urgent-2024-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot/64: Move 5-level paging global variable assignments back
  x86/boot/64: Apply encryption mask to 5-level pagetable update
  x86/cpu: Add model number for another Intel Arrow Lake mobile processor
  x86/fpu: Keep xfd_state in sync with MSR_IA32_XFD
  Documentation/x86: Document that resctrl bandwidth control units are MiB
  x86/mpparse: Register APIC address only once
  x86/topology: Handle the !APIC case gracefully
  x86/topology: Don't evaluate logical IDs during early boot
  x86/cpu: Ensure that CPU info updates are propagated on UP
  kprobes/x86: Use copy_from_kernel_nofault() to read from unsafe address
  x86/pm: Work around false positive kmemleak report in msr_build_context()
  x86/kexec: Do not update E820 kexec table for setup_data
  x86/config: Fix warning for 'make ARCH=x86_64 tinyconfig'
2024-03-24 11:13:56 -07:00
Linus Torvalds
b136f68eb0 A single update for the documentation of the base_slice_ns tunable to
clarify that any value which is less than the tick slice has no effect
 because the scheduler tick is not guaranteed to happen within the set time
 slice.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmYASukTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoXsRD/41H+nJ9o2Mb5ImF8NfZr2u/bIzf9Rd
 BZxlWsGbTmzhvNN6WJXZ89nM61jUqYlQcJeAYhwzjmCgNDnC/BTa3umQjnWha+M5
 fiawvzJxPjMt4TzMVPAtyrjxpOmfD9iDNtaczr2gv4PwMl2Ko17iItOJCNX0rlrG
 OBFZ0s9cBc4X+OUmVnuetkoJVm4knsLxfHA1ETJ9E0DlWOwdGsAOnue+KsVpecmF
 VSi5bgWT7fMyayOxomBynoDhjxfMCBLygFyctwJqcCWPpiEkt0nMehBzV32V6p/V
 go2rSbBw2w7kl1v2lBngVCdFlVbvMzIGIYQ6zJweVjDKX51+UGs0JNwlU1bIVBih
 dL+6tms+RUqO39Td4xIf7MeVAfplxRJg7fvn/b9oxM+TwwiOsjSi/Pir3h9w6sm8
 vo/JwO2VQtGxAZZA7UJFlZ/8QE06WTeNLO/giatETL+6OILEsm71qqoRIl1h2/d6
 ePqewAqmo8WTSHYjLm2IKzgRfKU8ko04ZJ6Er+12D3MNa3+Ezhx+8lIX7X9j5vtn
 eczwxCEZjvS3oF0O+NKu8mcS57vrDBE1qmGW97CWNLO0FuxF8yzp3RZFApCGCfee
 rfD38Epda0cQyRe2f6F1A0jFyOA5rQDMjWzivrQHbyCULwEuwVQP+UFzaCorzWSU
 UCjInBTY2zLPoA==
 =K76J
 -----END PGP SIGNATURE-----

Merge tag 'sched-urgent-2024-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler doc clarification from Thomas Gleixner:
 "A single update for the documentation of the base_slice_ns tunable to
  clarify that any value which is less than the tick slice has no effect
  because the scheduler tick is not guaranteed to happen within the set
  time slice"

* tag 'sched-urgent-2024-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/doc: Update documentation for base_slice_ns and CONFIG_HZ relation
2024-03-24 11:11:05 -07:00
Linus Torvalds
864ad046c1 dma-mapping fixes for Linux 6.9
This has a set of swiotlb alignment fixes for sometimes very long
 standing bugs from Will.  We've been discussion them for a while and they
 should be solid now.
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmX/bmILHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYOuKQ//cUR3EywszAc04x8dIYsfegFGdQxUeJD0+1elAPss
 ELiqrlg5A/Yn4uHKpXjWbvJ+v1Ywh3o8+vlgUiG4aFeg4xEd+FsJqm2SDa3jhdMP
 2hV8pwB92kpkKCxyCAqx8O/4o4fY++KCFsOtnammEudFjurJaCrRTlauOn6D1t/i
 JsBYCFtjFIhIPHQe7jmZ6dNiLEfiIJ+q8ImW+UxuB+gOGgU8C4VVW3tHuo3KeU7n
 yVOcz4yJrQ4xYzG3RKtaU0FE0ybA860xwiA5oPvqpI9A2ISGovv7ik0QCUlHXhff
 z+iL8Lj/KsOucq5pBDhbRYeN2n4VVogEwb/hut6mgyqj1ESjqeZaLioVHqOTDbmB
 +vNTVBt6OGTOq1YkNKttK9vBBXs5RdZSBalzBG/QO1ewmrNVVZ7z8fWXVRDipoIl
 sAIXmI8xAy5TNL6UbJ+RDfYeLlTzHjXGKQGB49gumOA8s4w5P5v9diYegX6GcVZV
 PKkYLOvprwcyi8Xxx2mNxFDxh+LWqzMYqzwsN7AoRTW4TRc7Tel0G6Axs+V/cL/Y
 23IHfFfT2HqDUM5PuBfUcgCrtw1hinuD80xqXVcvaU+AYoQhrGHJFLHkj6lTwV2b
 hmuul170froI2A/vm8yGGqcn2Me55AexlpMab+UWL+iisGtqFTWi9b9vK/2Vi+Zj
 wBg=
 =Xaob
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-6.9-2024-03-24' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fixes from Christoph Hellwig:
 "This has a set of swiotlb alignment fixes for sometimes very long
  standing bugs from Will. We've been discussion them for a while and
  they should be solid now"

* tag 'dma-mapping-6.9-2024-03-24' of git://git.infradead.org/users/hch/dma-mapping:
  swiotlb: Reinstate page-alignment for mappings >= PAGE_SIZE
  iommu/dma: Force swiotlb_max_mapping_size on an untrusted device
  swiotlb: Fix alignment checks when both allocation and DMA masks are present
  swiotlb: Honour dma_alloc_coherent() alignment in swiotlb_alloc()
  swiotlb: Enforce page alignment in swiotlb_alloc()
  swiotlb: Fix double-allocation of slots due to broken alignment handling
2024-03-24 10:45:31 -07:00
Oleksandr Tymoshenko
62b71cd73d efi: fix panic in kdump kernel
Check if get_next_variable() is actually valid pointer before
calling it. In kdump kernel this method is set to NULL that causes
panic during the kexec-ed kernel boot.

Tested with QEMU and OVMF firmware.

Fixes: bad267f9e1 ("efi: verify that variable services are supported")
Signed-off-by: Oleksandr Tymoshenko <ovt@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-03-24 09:28:33 +01:00
Ard Biesheuvel
df7ecce842 x86/efistub: Don't clear BSS twice in mixed mode
Clearing BSS should only be done once, at the very beginning.
efi_pe_entry() is the entrypoint from the firmware, which may not clear
BSS and so it is done explicitly. However, efi_pe_entry() is also used
as an entrypoint by the mixed mode startup code, in which case BSS will
already have been cleared, and doing it again at this point will corrupt
global variables holding the firmware's GDT/IDT and segment selectors.

So make the memset() conditional on whether the EFI stub is running in
native mode.

Fixes: b3810c5a2c ("x86/efistub: Clear decompressor BSS in native EFI entrypoint")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-03-24 09:28:33 +01:00
Ard Biesheuvel
cefcd4fe2e x86/efistub: Call mixed mode boot services on the firmware's stack
Normally, the EFI stub calls into the EFI boot services using the stack
that was live when the stub was entered. According to the UEFI spec,
this stack needs to be at least 128k in size - this might seem large but
all asynchronous processing and event handling in EFI runs from the same
stack and so quite a lot of space may be used in practice.

In mixed mode, the situation is a bit different: the bootloader calls
the 32-bit EFI stub entry point, which calls the decompressor's 32-bit
entry point, where the boot stack is set up, using a fixed allocation
of 16k. This stack is still in use when the EFI stub is started in
64-bit mode, and so all calls back into the EFI firmware will be using
the decompressor's limited boot stack.

Due to the placement of the boot stack right after the boot heap, any
stack overruns have gone unnoticed. However, commit

  5c4feadb0011983b ("x86/decompressor: Move global symbol references to C code")

moved the definition of the boot heap into C code, and now the boot
stack is placed right at the base of BSS, where any overruns will
corrupt the end of the .data section.

While it would be possible to work around this by increasing the size of
the boot stack, doing so would affect all x86 systems, and mixed mode
systems are a tiny (and shrinking) fraction of the x86 installed base.

So instead, record the firmware stack pointer value when entering from
the 32-bit firmware, and switch to this stack every time a EFI boot
service call is made.

Cc: <stable@kernel.org> # v6.1+
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-03-24 09:28:32 +01:00
Tom Lendacky
9843231c97 x86/boot/64: Move 5-level paging global variable assignments back
Commit 63bed96604 ("x86/startup_64: Defer assignment of 5-level paging
global variables") moved assignment of 5-level global variables to later
in the boot in order to avoid having to use RIP relative addressing in
order to set them. However, when running with 5-level paging and SME
active (mem_encrypt=on), the variables are needed as part of the page
table setup needed to encrypt the kernel (using pgd_none(), p4d_offset(),
etc.). Since the variables haven't been set, the page table manipulation
is done as if 4-level paging is active, causing the system to crash on
boot.

While only a subset of the assignments that were moved need to be set
early, move all of the assignments back into check_la57_support() so that
these assignments aren't spread between two locations. Instead of just
reverting the fix, this uses the new RIP_REL_REF() macro when assigning
the variables.

Fixes: 63bed96604 ("x86/startup_64: Defer assignment of 5-level paging global variables")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/2ca419f4d0de719926fd82353f6751f717590a86.1711122067.git.thomas.lendacky@amd.com
2024-03-24 05:00:36 +01:00
Tom Lendacky
4d0d7e7852 x86/boot/64: Apply encryption mask to 5-level pagetable update
When running with 5-level page tables, the kernel mapping PGD entry is
updated to point to the P4D table. The assignment uses _PAGE_TABLE_NOENC,
which, when SME is active (mem_encrypt=on), results in a page table
entry without the encryption mask set, causing the system to crash on
boot.

Change the assignment to use _PAGE_TABLE instead of _PAGE_TABLE_NOENC so
that the encryption mask is set for the PGD entry.

Fixes: 533568e06b ("x86/boot/64: Use RIP_REL_REF() to access early_top_pgt[]")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/8f20345cda7dbba2cf748b286e1bc00816fe649a.1711122067.git.thomas.lendacky@amd.com
2024-03-24 05:00:35 +01:00
Tony Luck
8a8a9c9047 x86/cpu: Add model number for another Intel Arrow Lake mobile processor
This one is the regular laptop CPU.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240322161725.195614-1-tony.luck@intel.com
2024-03-24 04:08:10 +01:00
Adamos Ttofari
10e4b5166d x86/fpu: Keep xfd_state in sync with MSR_IA32_XFD
Commit 672365477a ("x86/fpu: Update XFD state where required") and
commit 8bf26758ca ("x86/fpu: Add XFD state to fpstate") introduced a
per CPU variable xfd_state to keep the MSR_IA32_XFD value cached, in
order to avoid unnecessary writes to the MSR.

On CPU hotplug MSR_IA32_XFD is reset to the init_fpstate.xfd, which
wipes out any stale state. But the per CPU cached xfd value is not
reset, which brings them out of sync.

As a consequence a subsequent xfd_update_state() might fail to update
the MSR which in turn can result in XRSTOR raising a #NM in kernel
space, which crashes the kernel.

To fix this, introduce xfd_set_state() to write xfd_state together
with MSR_IA32_XFD, and use it in all places that set MSR_IA32_XFD.

Fixes: 672365477a ("x86/fpu: Update XFD state where required")
Signed-off-by: Adamos Ttofari <attofari@amazon.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240322230439.456571-1-chang.seok.bae@intel.com

Closes: https://lore.kernel.org/lkml/20230511152818.13839-1-attofari@amazon.de
2024-03-24 04:03:54 +01:00
Tony Luck
a8ed59a3a8 Documentation/x86: Document that resctrl bandwidth control units are MiB
The memory bandwidth software controller uses 2^20 units rather than
10^6. See mbm_bw_count() which computes bandwidth using the "SZ_1M"
Linux define for 0x00100000.

Update the documentation to use MiB when describing this feature.
It's too late to fix the mount option "mba_MBps" as that is now an
established user interface.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240322182016.196544-1-tony.luck@intel.com
2024-03-24 03:58:43 +01:00
Linus Torvalds
70293240c5 Two regression fixes for the timer and timer migration code:
1) Prevent endless timer requeuing which is caused by two CPUs racing out
      of idle. This happens when the last CPU goes idle and therefore has to
      ensure to expire the pending global timers and some other CPU come out
      of idle at the same time and the other CPU wins the race and expires
      the global queue. This causes the last CPU to chase ghost timers
      forever and reprogramming it's clockevent device endlessly.
 
      Cure this by re-evaluating the wakeup time unconditionally.
 
   2) The split into local (pinned) and global timers in the timer wheel
      caused a regression for NOHZ full as it broke the idle tracking of
      global timers. On NOHZ full this prevents an self IPI being sent which
      in turn causes the timer to be not programmed and not being expired on
      time.
 
      Restore the idle tracking for the global timer base so that the self
      IPI condition for NOHZ full is working correctly again.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmX/Mn8THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoYX0D/93fKa9qp+qBs72Vvctj4bJuk6auel7
 GRWx3Vc//kk4VOJFCteQ2eykr1/fsLuibPb67iiKp41stvaWKeJPj0kD+RUuVf8E
 dPGPpPU+qY5ynhoiqJekZL7+5NSA48y4bDc00a4U31MPpEcJB5y94zFOCKMiCWtk
 6Tf6I6168bsSFYvKqb2LImVoowu/bf7bXLVUk1HcdNnSC7bfx+yN8nkQ1zSy3K2M
 IyE1CQqMyDmdfKW9Vs68ooTIpuA0n7bxOuXbVaFdJyiJ035v3Z3+m2vQmrHHLDdz
 MfqHbFDEomDC+zfiugFvuxyxLIi2Gf/NXPibu6OxLkVe2Pu1KUJkhFbgZVUR2W6A
 EU6SmZr77zMPAMZyqG8OJqTqlCPiJfJX2KMWDF+ezbXBt+sbMe6LfazPqj9TopnN
 /ECMCt77xl1POCEnP81hPWizKsqf8HCTDDZEi9UlqbIxT3TgrZFlTnKfmmciwWiP
 uGoUXgZmi8qJ+lSGNTVUgbTmIbazvUz43sKgfndUy2yxeCb/SBlx1/8Ys2ntszOy
 XDOF8QroPH0zXlBaYo0QVZbOdB4O0/En1qZuGScBmoUY7bRr0NRD/C3ObhvQHI7C
 iguAsnB+zirwwZSTDzwhQDhXAtWgSaqBB7pb4aCDxC0AvnKtL5HKdDNeIafpPJeA
 4Xh40iu44u/V9w==
 =lY5O
 -----END PGP SIGNATURE-----

Merge tag 'timers-urgent-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fixes from Thomas Gleixner:
 "Two regression fixes for the timer and timer migration code:

   - Prevent endless timer requeuing which is caused by two CPUs racing
     out of idle. This happens when the last CPU goes idle and therefore
     has to ensure to expire the pending global timers and some other
     CPU come out of idle at the same time and the other CPU wins the
     race and expires the global queue. This causes the last CPU to
     chase ghost timers forever and reprogramming it's clockevent device
     endlessly.

     Cure this by re-evaluating the wakeup time unconditionally.

   - The split into local (pinned) and global timers in the timer wheel
     caused a regression for NOHZ full as it broke the idle tracking of
     global timers. On NOHZ full this prevents an self IPI being sent
     which in turn causes the timer to be not programmed and not being
     expired on time.

     Restore the idle tracking for the global timer base so that the
     self IPI condition for NOHZ full is working correctly again"

* tag 'timers-urgent-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timers: Fix removed self-IPI on global timer's enqueue in nohz_full
  timers/migration: Fix endless timer requeue after idle interrupts
2024-03-23 14:49:25 -07:00
Linus Torvalds
00164f477f A set of updates for clocksource and clockevent drivers:
- A fix for the prescaler of the ARM global timer where the prescaler
     mask define only covered 4 bits while it is actully 8 bits wide. This
     restricted obviously the possible range of the prescaler adjustments.
 
   - A fix for the RISC-V timer which prevents a timer interrupt being
     raised while the timer is initialized.
 
   - A set of device tree updates to support new system on chips in various
     drivers.
 
   - Kernel-doc and other cleanups all over the place.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmX/MDcTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYofUcD/9hZjJp11tNJNibLGjJrZUIg1Zpiu7c
 avlrG/KT9BgElyp8D4WG3GTY7ng9vzJ7XNXG5eXyrkMClzlDYmXt5oeoUM3KRFDg
 G/kX0ee/+vcP5IKVsYR3w1MQ7QGe7VfE5FU4CxtR3OjG3T5Gueim1AKBtM26QazK
 wHe1/yp0vIPjZO1awlWQm9CJ+DSD2Mb1oPo8c9Bitd5KXXjPg8uR3bk2kOVYMdzC
 B6xYvQWF3Pl7NXcQKOnIazqsNlphsYiBGc5ZbKp5zjDggmp/ChBedt6ePCccU3DO
 VXE6D4eyZ5hmQbFzSEZUYsWcNzkeH8ZWnxkjeAoG60Y2vErLVsceytaIja061VOl
 BP2j8QbQ7ztwLOTtZc8KwJsVfxBh1xhwAUwe5Mpl171m7K2n+zquR8Clq4SqL2DK
 mjUcHKIZzyhZ4HX3rRwVOpJmfbJlRxSRjiLp+oPulJBB9pLMjarxwQ6Ij68yrPst
 kVow4Ur45abmXsDRj3lc3+bDcfMr8AM3l/HjD7AZamC2TAvkp1hkYPghhazdolx5
 qQA3swdQl0tq49SNvvLbVRjRiCiQmUdpOGpnevvqxEO/BESGquVfPrihOYqIQuwh
 +yurmba5L8uxpsBCPtGfiMse9fbE7GVQJkMqhr9KNNOfER30rEe8bmqr6r71+uZJ
 19Jp2U76zNfVXA==
 =M9Gu
 -----END PGP SIGNATURE-----

Merge tag 'timers-core-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull more clocksource updates from Thomas Gleixner:
 "A set of updates for clocksource and clockevent drivers:

   - A fix for the prescaler of the ARM global timer where the prescaler
     mask define only covered 4 bits while it is actully 8 bits wide.
     This obviously restricted the possible range of prescaler
     adjustments

   - A fix for the RISC-V timer which prevents a timer interrupt being
     raised while the timer is initialized

   - A set of device tree updates to support new system on chips in
     various drivers

   - Kernel-doc and other cleanups all over the place"

* tag 'timers-core-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource/drivers/timer-riscv: Clear timer interrupt on timer initialization
  dt-bindings: timer: Add support for cadence TTC PWM
  clocksource/drivers/arm_global_timer: Simplify prescaler register access
  clocksource/drivers/arm_global_timer: Guard against division by zero
  clocksource/drivers/arm_global_timer: Make gt_target_rate unsigned long
  dt-bindings: timer: add Ralink SoCs system tick counter
  clocksource: arm_global_timer: fix non-kernel-doc comment
  clocksource/drivers/arm_global_timer: Remove stray tab
  clocksource/drivers/arm_global_timer: Fix maximum prescaler value
  clocksource/drivers/imx-sysctr: Add i.MX95 support
  clocksource/drivers/imx-sysctr: Drop use global variables
  dt-bindings: timer: nxp,sysctr-timer: support i.MX95
  dt-bindings: timer: renesas: ostm: Document RZ/Five SoC
  dt-bindings: timer: renesas,tmu: Document input capture interrupt
  clocksource/drivers/ti-32K: Fix misuse of "/**" comment
  clocksource/drivers/stm32: Fix all kernel-doc warnings
  dt-bindings: timer: exynos4210-mct: Add google,gs101-mct compatible
  clocksource/drivers/imx: Fix -Wunused-but-set-variable warning
2024-03-23 14:42:45 -07:00
Linus Torvalds
1a39193137 A series of fixes for the Renesas RZG21 interrupt chip driver to prevent
spurious and misrouted interrupts.
 
   - Ensure that posted writes are flushed in the eoi() callback
 
   - Ensure that interrupts are masked at the chip level when the trigger
     type is changed
 
   - Clear the interrupt status register when setting up edge type trigger
     modes.
 
   - Ensure that the trigger type and routing information is set before the
     interrupt is enabled.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmX/LiETHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYofNND/95Mf0Gu9HOYLYaq3vO37B8oA+LcyGP
 SAm5j7VT5ymTLxMGZgVJk5gpaw5jysUVH6TfMG9tk861vGvQGAoAVd7hVgJaZSWz
 +c7WgQQY9YTGOjZqE726GnB/uHZUr/m8Ls7z+H7IEliB9T9Gruvu3pMkfrAZjg6t
 exPxDMdr1d0/j7t/7WVGbincqkz2lJUxW/6BdPFgoJ7Ez0CpePb/TkTGmO7+GXHz
 WQi3xBAVhaPcsTJ4PZnMxm++XG0dAZqyAr4WVjaMKCuDwrultBpDUgHhYCFKHJ/D
 eNNCyPbXzrpEujO9KUdcDSrQv//J44SXejXDNhgGamtRQSkAsPLUNH/zHRtKVOjL
 oJWF7UKX6a1ySdmTs3PadgO/g8ZVkla8IL0u/yVIaVttT7ix32hf66XW8rIpfBw1
 cCc9GXmyXBQs8WRtQAf+OjBcPeyfD3+ZI0m1GdA3ATpK455+TImJ5D2++6O05u8T
 3m62UtJH5KQvKlEUESflSxGEhJCZz/MoGgJwEsJ6lG8b0gQgbhVl2l9r4qOJ6f6y
 rFUTvJKO+UTaPRRkH5xWITpf3JdxmAAHeAMBUzSND3JoPVv60qkVhgGpx/D0Xiwo
 lJVo36g09uEpN1igNTJwi6C/ZnhHa6tIZAaGs5/BNqE3+W7e9p2U8NqCxrRfArq3
 TInh1wSXok2rsw==
 =zIKG
 -----END PGP SIGNATURE-----

Merge tag 'irq-urgent-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Thomas Gleixner:
 "A series of fixes for the Renesas RZG21 interrupt chip driver to
  prevent spurious and misrouted interrupts.

   - Ensure that posted writes are flushed in the eoi() callback

   - Ensure that interrupts are masked at the chip level when the
     trigger type is changed

   - Clear the interrupt status register when setting up edge type
     trigger modes

   - Ensure that the trigger type and routing information is set before
     the interrupt is enabled"

* tag 'irq-urgent-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/renesas-rzg2l: Do not set TIEN and TINT source at the same time
  irqchip/renesas-rzg2l: Prevent spurious interrupts when setting trigger type
  irqchip/renesas-rzg2l: Rename rzg2l_irq_eoi()
  irqchip/renesas-rzg2l: Rename rzg2l_tint_eoi()
  irqchip/renesas-rzg2l: Flush posted write in irq_eoi()
2024-03-23 14:30:38 -07:00
Linus Torvalds
976b029d06 A single fix for the generic entry code:
THe trace_sys_enter() tracepoint can modify the syscall number via
   kprobes or BPF in pt_regs, but that requires that the syscall number is
   re-evaluted from pt_regs after the tracepoint.
 
   A seccomp fix in that area removed the re-evaluation so the change does
   not take effect as the code just uses the locally cached number.
 
   Restore the original behaviour by re-evaluating the syscall number after
   the tracepoint.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmX/LJoTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYodkJEACFbJqxdlpoWvc8lpYd0htqP+EyN6li
 quMk+acE8IOtiqOetphOrmKGzBYdtpNu+7uWVeRTyCarm2Gb5lX1GnIMwyEK8R25
 R+fVbPaECldSUKypAbTze2RpJl/vp2jeKhCpOn3A5MpITd0vV9SEs+9hecBz3ucZ
 CXB4vPR7n8VxHX5ArbHqe05fF5TnSZ4CnpeWONQghGsEaSJ87XrljVfC/fxmb7mZ
 yB2KAIKeYqslvoFoIWSAJk6xvwcGC08vY8mxkM6n61yXmAZ33BWvpnafGHsstF/V
 gT9Kmvs7F6Tn6V5j0SUuC4/nUO/dtEDhvra793OSvR/rB3+AmZfECPQaDNIJqNaG
 0pG558E+hrnpV/YWGzRYJCtqUgtTS0kV5JJY9ffoJRQB3Nb7GCeyykDcs823Yt7j
 A5o3JUQU0kW4X08hGFMU6DSFppxp9nS0eXDoKMlVRmDB8l30bLK+tMPMaID6yGse
 0UafVMWRhtM85mqNVZIZKTsRuONvT+8PV2UH5cNryGmjjbykLmjSLFRvXBmk2Jl+
 uhcc0QaApQleA/H/fSNoyPFOPCqvDBEo4xM9e3ypc8TflOQegZfpzrjAITHzv9hY
 r4ci8faKXTR7jY70a6TiEgcGYYLYDmyvOL0I+TwYA9qHB7tEaJMF6YfU8UDsM9K2
 Gb4l6n8fNQubTg==
 =iW0L
 -----END PGP SIGNATURE-----

Merge tag 'core-entry-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull core entry fix from Thomas Gleixner:
 "A single fix for the generic entry code:

  The trace_sys_enter() tracepoint can modify the syscall number via
  kprobes or BPF in pt_regs, but that requires that the syscall number
  is re-evaluted from pt_regs after the tracepoint.

  A seccomp fix in that area removed the re-evaluation so the change
  does not take effect as the code just uses the locally cached number.

  Restore the original behaviour by re-evaluating the syscall number
  after the tracepoint"

* tag 'core-entry-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  entry: Respect changes to system call number by trace_sys_enter()
2024-03-23 14:17:37 -07:00
Linus Torvalds
484193fecd powerpc updates for 6.9 #2
- Handle errors in mark_rodata_ro() and mark_initmem_nx().
 
  - Make struct crash_mem available without CONFIG_CRASH_DUMP.
 
 Thanks to: Christophe Leroy, Hari Bathini.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmX+HgQTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgEGiEACC18EO6YUDVCblbRQsJLgZUZWdskqF
 0TTgrqhemXS1m7yucqp4HwM5N9YV7d3nXtaBTd0nxwrHRCpIef3XDFUyiLpFWbDQ
 pPt9AyUhTkkdH/6JQlYTC9S0/l9xAYPo15As8ZUnTPUNJk2pV6NxSpjYqZuQcmFm
 W5ln2rYDtG57XDI0WMMo/CSZ50YRgWSpINavQLUxn6MxTda2ZrXF39HaKi0FS1u+
 64bTT3uwKHvVWcf4/+KTiY3EOUgbgNeedZ7PjVGOx6VpQvig96/qtbu8zCfqMX0v
 eTz+p6IDtLTeXSu8Ak5YN5wBBNA9EsnR0osk/T99Ru48EEErkTl4+G6bkXlgRu7A
 KRbbyI66JZHXauwaKIwuhOswoYtEDrCrOadYMlYGCqjUsDs9zmxTUWamcZBSow60
 5S/Oo5SgOES+5P/p889hwW8XodC9uzLltMw7M+R36nRDPh9nbwu93Y1Kx2LQiery
 sxnXu+Pg7DTLq4zH75WggwpOH6nPoAP0NYT3QXnbgz8CFArePE7h1OfMzy26Xj1X
 C7wh1npAksdPo/00t2VEO7V38vq8nm41JqQ5lbPsRtoUuwFVqMp7+Agmq+eK4J/B
 zVEroBLIlGYexV5OLu824wIAVhT/PrYm9ONaTi7O7DFZrOFsnNyrRw1fifr7afgu
 xECv0qJ+5I9uAw==
 =edty
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull more powerpc updates from Michael Ellerman:

 - Handle errors in mark_rodata_ro() and mark_initmem_nx()

 - Make struct crash_mem available without CONFIG_CRASH_DUMP

Thanks to Christophe Leroy and Hari Bathini.

* tag 'powerpc-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/kdump: Split KEXEC_CORE and CRASH_DUMP dependency
  powerpc/kexec: split CONFIG_KEXEC_FILE and CONFIG_CRASH_DUMP
  kexec/kdump: make struct crash_mem available without CONFIG_CRASH_DUMP
  powerpc: Handle error in mark_rodata_ro() and mark_initmem_nx()
2024-03-23 09:21:26 -07:00
Linus Torvalds
02fb638bed ARM updates for v6.9-rc1
- remove a misuse of kernel-doc comment
 - use "Call trace:" for backtraces like other architectures
 - implement copy_from_kernel_nofault_allowed() to fix a LKDTM test
 - add a "cut here" line for prefetch aborts
 - remove unnecessary Kconfing entry for FRAME_POINTER
 - remove iwmmxy support for PJ4/PJ4B cores
 - use bitfield helpers in ptrace to improve readabililty
 - check if folio is reserved before flushing
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEuNNh8scc2k/wOAE+9OeQG+StrGQFAmX5u38ACgkQ9OeQG+St
 rGTB+xAAh0jUZeBEPUdCEooFPMXpYyTueOrYxth1zqs1UBQNCsfqwfFRY/e8qHgL
 8yrpshbgrQY62NCcpHV2Wso9ZUQO7c8JCrI2CpPS+1+oyg/lJN2hPv3i10WmiFgW
 D0kc4X7hDOn3lapKRRBnWea/Xi7FHl0TTKUfL/+0HrJRtwTW1wPrFk7ECp/vhdIZ
 KBflQyiAJ2QSlovRBvtr+Fbfdbwd55araArHUSlus43uGnDFQh/h6LYe9gifIO+P
 SFX4BAo2FfFIhGOJ4ghzPeVL2zfiMRQNButELiktY+KhRSUijOvumnCxRL2YzqJO
 0zNzKUWEkShV2NEq82X55zVezQ9wOiB9GYAZRSJ3qAZ3eT2+EqHIPbe2+RnLrKN+
 szjCY7S9kKWt4WU0r5P4Au58FjCxJ+gzehvuaE/BYOisGsbejzxoh1uKBf1PR7Xg
 3iS+zlYHGo2gfVTNgFpV4nmlgkPelTnyK3+yYAsTr/IAGkMbLHjb6d7qyJZ1Wsde
 YsRmkXwTkKcE2UlsQB7fGF4S9SrZ6MWqdCvWfCnu3INCMHCEP/8/siXkcX27jRUV
 o4N2JbioUZmeWPAfcBJnh/aDAzgku63yFev2QgB70awPE5YiOAWRQQIn+mIB1aDV
 ut9AB1gg/sNe7VnCVAnZEWzlL5DHXqByTXur2FoGjoVWcf1cqv4=
 =f4r2
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm

Pull ARM updates from Russell King:

 - remove a misuse of kernel-doc comment

 - use "Call trace:" for backtraces like other architectures

 - implement copy_from_kernel_nofault_allowed() to fix a LKDTM test

 - add a "cut here" line for prefetch aborts

 - remove unnecessary Kconfing entry for FRAME_POINTER

 - remove iwmmxy support for PJ4/PJ4B cores

 - use bitfield helpers in ptrace to improve readabililty

 - check if folio is reserved before flushing

* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: 9359/1: flush: check if the folio is reserved for no-mapping addresses
  ARM: 9354/1: ptrace: Use bitfield helpers
  ARM: 9352/1: iwmmxt: Remove support for PJ4/PJ4B cores
  ARM: 9353/1: remove unneeded entry for CONFIG_FRAME_POINTER
  ARM: 9351/1: fault: Add "cut here" line for prefetch aborts
  ARM: 9350/1: fault: Implement copy_from_kernel_nofault_allowed()
  ARM: 9349/1: unwind: Add missing "Call trace:" line
  ARM: 9334/1: mm: init: remove misuse of kernel-doc comment
2024-03-23 09:17:03 -07:00
Linus Torvalds
b71871395c hardening fixes for v6.9-rc1
- CONFIG_MEMCPY_SLOW_KUNIT_TEST is no longer needed (Guenter Roeck)
 
 - Fix needless UTF-8 character in arch/Kconfig (Liu Song)
 
 - Improve __counted_by warning message in LKDTM (Nathan Chancellor)
 
 - Refactor DEFINE_FLEX() for default use of __counted_by
 
 - Disable signed integer overflow sanitizer on GCC < 8
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmX+Gj8WHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJg1fD/4zrQ3MmMpK/GaNCziayLuCJiMu
 urS85FkRaGkmf3d3PDVVOBw5wFae408O+i4a1b10k2L/At79Z5oi/RqKYtPXthGr
 davVIJt+ZBuTC5l7jUHeK6XAJ/8/28Xsq4dMQjy7pUstYU14sl2a1z+IuV7I+jxm
 hlVqvj7Q0dsPijyF9YAuoWhvXIGs3/p6oaq/p4+p0sGpbbjNEpNka9qgR9vl1NFe
 WrByHrFMxl6gmLPTilNvgdD50id4x7OYEtROdDP8tXK+dx8YGaEXAuPzYkpzmuOb
 9UjhGitxRX6VfWBy7N/TWtDryV493Jq2VFVsrEc2zNr2HR34/vTYCjMyfLlJNXVu
 hD3vZ/tO2tDnDhBeMKigIsbTtev4qn60IElmc4BXudgVbWehjVsJk+kGbKEIkIy6
 ww7vF4OPWEKNTBVsDtFJwKNZixb/NMX3nS1xCALffzNgh3o8qP28gESOKImHeKhI
 U7ZhoWLob3ayEcjCVhCaBEC+fK9a5Eva6L5lTUoObPqzXwoysi2yGEQTqHlYwdfg
 unQuZUaiWEE9w1GG7CEx5xB9VmaY2U9bF7thUw6qMNmvCzqJr2goUcJ8E6y90s4w
 fR71YYkfYsXMuL1MnxCp/xOS5EbodYWskyLicNee7SlBAqInraaQG8syBGUFb+4S
 zvILMp7Zc1sBIxhATA==
 =1LW2
 -----END PGP SIGNATURE-----

Merge tag 'hardening-v6.9-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull more hardening updates from Kees Cook:

 - CONFIG_MEMCPY_SLOW_KUNIT_TEST is no longer needed (Guenter Roeck)

 - Fix needless UTF-8 character in arch/Kconfig (Liu Song)

 - Improve __counted_by warning message in LKDTM (Nathan Chancellor)

 - Refactor DEFINE_FLEX() for default use of __counted_by

 - Disable signed integer overflow sanitizer on GCC < 8

* tag 'hardening-v6.9-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  lkdtm/bugs: Improve warning message for compilers without counted_by support
  overflow: Change DEFINE_FLEX to take __counted_by member
  Revert "kunit: memcpy: Split slow memcpy tests into MEMCPY_SLOW_KUNIT_TEST"
  arch/Kconfig: eliminate needless UTF-8 character in Kconfig help
  ubsan: Disable signed integer overflow sanitizer on GCC < 8
2024-03-23 08:43:21 -07:00
Thomas Gleixner
f2208aa12c x86/mpparse: Register APIC address only once
The APIC address is registered twice. First during the early detection and
afterwards when actually scanning the table for APIC IDs. The APIC and
topology core warn about the second attempt.

Restrict it to the early detection call.

Fixes: 81287ad65d ("x86/apic: Sanitize APIC address setup")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20240322185305.297774848@linutronix.de
2024-03-23 12:41:48 +01:00
Thomas Gleixner
5e25eb25da x86/topology: Handle the !APIC case gracefully
If there is no local APIC enumerated and registered then the topology
bitmaps are empty. Therefore, topology_init_possible_cpus() will die with
a division by zero exception.

Prevent this by registering a fake APIC id to populate the topology
bitmap. This also allows to use all topology query interfaces
unconditionally. It does not affect the actual APIC code because either
the local APIC address was not registered or no local APIC could be
detected.

Fixes: f1f758a805 ("x86/topology: Add a mechanism to track topology via APIC IDs")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20240322185305.242709302@linutronix.de
2024-03-23 12:35:56 +01:00
Thomas Gleixner
7af541cee1 x86/topology: Don't evaluate logical IDs during early boot
The local APICs have not yet been enumerated so the logical ID evaluation
from the topology bitmaps does not work and would return an error code.

Skip the evaluation during the early boot CPUID evaluation and only apply
it on the final run.

Fixes: 380414be78 ("x86/cpu/topology: Use topology logical mapping mechanism")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20240322185305.186943142@linutronix.de
2024-03-23 12:28:06 +01:00
Thomas Gleixner
c90399fbd7 x86/cpu: Ensure that CPU info updates are propagated on UP
The boot sequence evaluates CPUID information twice:

  1) During early boot

  2) When finalizing the early setup right before
     mitigations are selected and alternatives are patched.

In both cases the evaluation is stored in boot_cpu_data, but on UP the
copying of boot_cpu_data to the per CPU info of the boot CPU happens
between #1 and #2. So any update which happens in #2 is never propagated to
the per CPU info instance.

Consolidate the whole logic and copy boot_cpu_data right before applying
alternatives as that's the point where boot_cpu_data is in it's final
state and not supposed to change anymore.

This also removes the voodoo mb() from smp_prepare_cpus_common() which
had absolutely no purpose.

Fixes: 71eb4893cf ("x86/percpu: Cure per CPU madness on UP")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20240322185305.127642785@linutronix.de
2024-03-23 12:22:04 +01:00
Nathan Chancellor
231dc3f0c9 lkdtm/bugs: Improve warning message for compilers without counted_by support
The current message for telling the user that their compiler does not
support the counted_by attribute in the FAM_BOUNDS test does not make
much sense either grammatically or semantically. Fix it to make it
correct in both aspects.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20240321-lkdtm-improve-lack-of-counted_by-msg-v1-1-0fbf7481a29c@kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
2024-03-22 16:25:31 -07:00
Kees Cook
d8e45f2929 overflow: Change DEFINE_FLEX to take __counted_by member
The norm should be flexible array structures with __counted_by
annotations, so DEFINE_FLEX() is updated to expect that. Rename
the non-annotated version to DEFINE_RAW_FLEX(), and update the
few existing users. Additionally add selftests for the macros.

Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20240306235128.it.933-kees@kernel.org
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
2024-03-22 16:25:31 -07:00
Linus Torvalds
bfa8f18691 SCSI misc on 20240322
The vfs has long had a write lifetime hint mechanism that gives the
 expected longevity on storage of the data being written.  f2fs was the
 original consumer of this and used the hint for flash data placement
 (mostly to avoid write amplification by placing objects with similar
 lifetimes in the same erase block).  More recently the SCSI based UFS
 (Universal Flash Storage) drivers have wanted to take advantage of
 this as well, for the same reasons as f2fs, necessitating plumbing the
 write hints through the block layer and then adding it to the SCSI
 core.  The vfs write_hints pull you've already taken plumbs this as
 far as block and this pull request completes the SCSI core enabling
 based on a recently agreed reuse of the old write command group
 number.  The additions to the scsi_debug driver are for emulating this
 property so we can run tests on it in the absence of an actual UFS
 device.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCZf3owyYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishdaeAQCdEb5j
 d7sJiKTpSpgH0znAJ7UTvwUh6UJu/dj47D4qngEA43IULG8+L4GXDHL9mwCfeHC5
 PU7UewCEqnRBSBTdiEw=
 =9Xr2
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull more SCSI updates from James Bottomley:
 "The vfs has long had a write lifetime hint mechanism that gives the
  expected longevity on storage of the data being written. f2fs was the
  original consumer of this and used the hint for flash data placement
  (mostly to avoid write amplification by placing objects with similar
  lifetimes in the same erase block).

  More recently the SCSI based UFS (Universal Flash Storage) drivers
  have wanted to take advantage of this as well, for the same reasons as
  f2fs, necessitating plumbing the write hints through the block layer
  and then adding it to the SCSI core.

  The vfs write_hints already taken plumbs this as far as block and this
  completes the SCSI core enabling based on a recently agreed reuse of
  the old write command group number. The additions to the scsi_debug
  driver are for emulating this property so we can run tests on it in
  the absence of an actual UFS device"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: scsi_debug: Maintain write statistics per group number
  scsi: scsi_debug: Implement GET STREAM STATUS
  scsi: scsi_debug: Implement the IO Advice Hints Grouping mode page
  scsi: scsi_debug: Allocate the MODE SENSE response from the heap
  scsi: scsi_debug: Rework subpage code error handling
  scsi: scsi_debug: Rework page code error handling
  scsi: scsi_debug: Support the block limits extension VPD page
  scsi: scsi_debug: Reduce code duplication
  scsi: sd: Translate data lifetime information
  scsi: scsi_proto: Add structures and constants related to I/O groups and streams
  scsi: core: Query the Block Limits Extension VPD page
2024-03-22 13:31:07 -07:00