Commit Graph

1309150 Commits

Author SHA1 Message Date
Thorsten Blum
04afb0a3c3 ksmbd: Use struct_size() to improve get_file_alternate_info()
Use struct_size() to calculate the output buffer length.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-10-01 14:50:51 -05:00
Hans de Goede
ac78288fe0 ACPI: video: Add backlight=native quirk for Dell OptiPlex 5480 AIO
Dell All In One (AIO) models released after 2017 may use a backlight
controller board connected to an UART.

In DSDT this uart port will be defined as:

   Name (_HID, "DELL0501")
   Name (_CID, EisaId ("PNP0501")

The Dell OptiPlex 5480 AIO has an ACPI device for one of its UARTs with
the above _HID + _CID. Loading the dell-uart-backlight driver fails with
the following errors:

[   18.261353] dell_uart_backlight serial0-0: Timed out waiting for response.
[   18.261356] dell_uart_backlight serial0-0: error -ETIMEDOUT: getting firmware version
[   18.261359] dell_uart_backlight serial0-0: probe with driver dell_uart_backlight failed with error -110

Indicating that there is no backlight controller board attached to
the UART, while the GPU's native backlight control method does work.

Add a quirk to use the GPU's native backlight control method on this model.

Fixes: cd8e468efb ("ACPI: video: Add Dell UART backlight controller detection")
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://patch.msgid.link/20240918153849.37221-1-hdegoede@redhat.com
[ rjw: Changelog edit ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-10-01 20:51:26 +02:00
Miquel Sabaté Solà
c0f02536ff cpufreq: Avoid a bad reference count on CPU node
In the parse_perf_domain function, if the call to
of_parse_phandle_with_args returns an error, then the reference to the
CPU device node that was acquired at the start of the function would not
be properly decremented.

Address this by declaring the variable with the __free(device_node)
cleanup attribute.

Signed-off-by: Miquel Sabaté Solà <mikisabate@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://patch.msgid.link/20240917134246.584026-1-mikisabate@gmail.com
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-10-01 20:43:44 +02:00
Uwe Kleine-König
8b4865cd90 cpufreq: intel_pstate: Make hwp_notify_lock a raw spinlock
notify_hwp_interrupt() is called via sysvec_thermal() ->
smp_thermal_vector() -> intel_thermal_interrupt() in hard irq context.
For this reason it must not use a simple spin_lock that sleeps with
PREEMPT_RT enabled. So convert it to a raw spinlock.

Reported-by: xiao sheng wen <atzlinux@sina.com>
Link: https://bugs.debian.org/1076483
Signed-off-by: Uwe Kleine-König <ukleinek@debian.org>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: xiao sheng wen <atzlinux@sina.com>
Link: https://patch.msgid.link/20240919081121.10784-2-ukleinek@debian.org
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-10-01 20:43:39 +02:00
Ben Dooks
68a16708d2
spi: s3c64xx: fix timeout counters in flush_fifo
In the s3c64xx_flush_fifo() code, the loops counter is post-decremented
in the do { } while(test && loops--) condition. This means the loops is
left at the unsigned equivalent of -1 if the loop times out. The test
after will never pass as if tests for loops == 0.

Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Fixes: 230d42d422 ("spi: Add s3c64xx SPI Controller driver")
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://patch.msgid.link/20240924134009.116247-2-ben.dooks@codethink.co.uk
Signed-off-by: Mark Brown <broonie@kernel.org>
2024-10-01 19:02:01 +01:00
Mark Brown
aafbb9af7c
ASoC: Intel: soc-acpi: Fix missing empty terminators
Merge series from Bard Liao <yung-chuan.liao@linux.intel.com>:

There is no links_num in struct snd_soc_acpi_mach {}, and we test
!link->num_adr as a condition to end the loop in hda_sdw_machine_select().
So an empty item in struct snd_soc_acpi_link_adr array is required.
2024-10-01 18:54:52 +01:00
Leo Martins
d6e7ac65d4 btrfs: disable rate limiting when debug enabled
Disable ratelimiting for btrfs_printk when CONFIG_BTRFS_DEBUG is
enabled. This allows for more verbose output which is often needed by
functions like btrfs_dump_space_info().

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Leo Martins <loemra.dev@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-10-01 19:29:41 +02:00
Filipe Manana
41fd1e9406 btrfs: wait for fixup workers before stopping cleaner kthread during umount
During unmount, at close_ctree(), we have the following steps in this order:

1) Park the cleaner kthread - this doesn't destroy the kthread, it basically
   halts its execution (wake ups against it work but do nothing);

2) We stop the cleaner kthread - this results in freeing the respective
   struct task_struct;

3) We call btrfs_stop_all_workers() which waits for any jobs running in all
   the work queues and then free the work queues.

Syzbot reported a case where a fixup worker resulted in a crash when doing
a delayed iput on its inode while attempting to wake up the cleaner at
btrfs_add_delayed_iput(), because the task_struct of the cleaner kthread
was already freed. This can happen during unmount because we don't wait
for any fixup workers still running before we call kthread_stop() against
the cleaner kthread, which stops and free all its resources.

Fix this by waiting for any fixup workers at close_ctree() before we call
kthread_stop() against the cleaner and run pending delayed iputs.

The stack traces reported by syzbot were the following:

  BUG: KASAN: slab-use-after-free in __lock_acquire+0x77/0x2050 kernel/locking/lockdep.c:5065
  Read of size 8 at addr ffff8880272a8a18 by task kworker/u8:3/52

  CPU: 1 UID: 0 PID: 52 Comm: kworker/u8:3 Not tainted 6.12.0-rc1-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
  Workqueue: btrfs-fixup btrfs_work_helper
  Call Trace:
   <TASK>
   __dump_stack lib/dump_stack.c:94 [inline]
   dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
   print_address_description mm/kasan/report.c:377 [inline]
   print_report+0x169/0x550 mm/kasan/report.c:488
   kasan_report+0x143/0x180 mm/kasan/report.c:601
   __lock_acquire+0x77/0x2050 kernel/locking/lockdep.c:5065
   lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5825
   __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
   _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
   class_raw_spinlock_irqsave_constructor include/linux/spinlock.h:551 [inline]
   try_to_wake_up+0xb0/0x1480 kernel/sched/core.c:4154
   btrfs_writepage_fixup_worker+0xc16/0xdf0 fs/btrfs/inode.c:2842
   btrfs_work_helper+0x390/0xc50 fs/btrfs/async-thread.c:314
   process_one_work kernel/workqueue.c:3229 [inline]
   process_scheduled_works+0xa63/0x1850 kernel/workqueue.c:3310
   worker_thread+0x870/0xd30 kernel/workqueue.c:3391
   kthread+0x2f0/0x390 kernel/kthread.c:389
   ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
   ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
   </TASK>

  Allocated by task 2:
   kasan_save_stack mm/kasan/common.c:47 [inline]
   kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
   unpoison_slab_object mm/kasan/common.c:319 [inline]
   __kasan_slab_alloc+0x66/0x80 mm/kasan/common.c:345
   kasan_slab_alloc include/linux/kasan.h:247 [inline]
   slab_post_alloc_hook mm/slub.c:4086 [inline]
   slab_alloc_node mm/slub.c:4135 [inline]
   kmem_cache_alloc_node_noprof+0x16b/0x320 mm/slub.c:4187
   alloc_task_struct_node kernel/fork.c:180 [inline]
   dup_task_struct+0x57/0x8c0 kernel/fork.c:1107
   copy_process+0x5d1/0x3d50 kernel/fork.c:2206
   kernel_clone+0x223/0x880 kernel/fork.c:2787
   kernel_thread+0x1bc/0x240 kernel/fork.c:2849
   create_kthread kernel/kthread.c:412 [inline]
   kthreadd+0x60d/0x810 kernel/kthread.c:765
   ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
   ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

  Freed by task 61:
   kasan_save_stack mm/kasan/common.c:47 [inline]
   kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
   kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:579
   poison_slab_object mm/kasan/common.c:247 [inline]
   __kasan_slab_free+0x59/0x70 mm/kasan/common.c:264
   kasan_slab_free include/linux/kasan.h:230 [inline]
   slab_free_hook mm/slub.c:2343 [inline]
   slab_free mm/slub.c:4580 [inline]
   kmem_cache_free+0x1a2/0x420 mm/slub.c:4682
   put_task_struct include/linux/sched/task.h:144 [inline]
   delayed_put_task_struct+0x125/0x300 kernel/exit.c:228
   rcu_do_batch kernel/rcu/tree.c:2567 [inline]
   rcu_core+0xaaa/0x17a0 kernel/rcu/tree.c:2823
   handle_softirqs+0x2c5/0x980 kernel/softirq.c:554
   __do_softirq kernel/softirq.c:588 [inline]
   invoke_softirq kernel/softirq.c:428 [inline]
   __irq_exit_rcu+0xf4/0x1c0 kernel/softirq.c:637
   irq_exit_rcu+0x9/0x30 kernel/softirq.c:649
   instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1037 [inline]
   sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1037
   asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702

  Last potentially related work creation:
   kasan_save_stack+0x3f/0x60 mm/kasan/common.c:47
   __kasan_record_aux_stack+0xac/0xc0 mm/kasan/generic.c:541
   __call_rcu_common kernel/rcu/tree.c:3086 [inline]
   call_rcu+0x167/0xa70 kernel/rcu/tree.c:3190
   context_switch kernel/sched/core.c:5318 [inline]
   __schedule+0x184b/0x4ae0 kernel/sched/core.c:6675
   schedule_idle+0x56/0x90 kernel/sched/core.c:6793
   do_idle+0x56a/0x5d0 kernel/sched/idle.c:354
   cpu_startup_entry+0x42/0x60 kernel/sched/idle.c:424
   start_secondary+0x102/0x110 arch/x86/kernel/smpboot.c:314
   common_startup_64+0x13e/0x147

  The buggy address belongs to the object at ffff8880272a8000
   which belongs to the cache task_struct of size 7424
  The buggy address is located 2584 bytes inside of
   freed 7424-byte region [ffff8880272a8000, ffff8880272a9d00)

  The buggy address belongs to the physical page:
  page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x272a8
  head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
  flags: 0xfff00000000040(head|node=0|zone=1|lastcpupid=0x7ff)
  page_type: f5(slab)
  raw: 00fff00000000040 ffff88801bafa500 dead000000000122 0000000000000000
  raw: 0000000000000000 0000000080040004 00000001f5000000 0000000000000000
  head: 00fff00000000040 ffff88801bafa500 dead000000000122 0000000000000000
  head: 0000000000000000 0000000080040004 00000001f5000000 0000000000000000
  head: 00fff00000000003 ffffea00009caa01 ffffffffffffffff 0000000000000000
  head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
  page dumped because: kasan: bad access detected
  page_owner tracks the page as allocated
  page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 2, tgid 2 (kthreadd), ts 71247381401, free_ts 71214998153
   set_page_owner include/linux/page_owner.h:32 [inline]
   post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1537
   prep_new_page mm/page_alloc.c:1545 [inline]
   get_page_from_freelist+0x3039/0x3180 mm/page_alloc.c:3457
   __alloc_pages_noprof+0x256/0x6c0 mm/page_alloc.c:4733
   alloc_pages_mpol_noprof+0x3e8/0x680 mm/mempolicy.c:2265
   alloc_slab_page+0x6a/0x120 mm/slub.c:2413
   allocate_slab+0x5a/0x2f0 mm/slub.c:2579
   new_slab mm/slub.c:2632 [inline]
   ___slab_alloc+0xcd1/0x14b0 mm/slub.c:3819
   __slab_alloc+0x58/0xa0 mm/slub.c:3909
   __slab_alloc_node mm/slub.c:3962 [inline]
   slab_alloc_node mm/slub.c:4123 [inline]
   kmem_cache_alloc_node_noprof+0x1fe/0x320 mm/slub.c:4187
   alloc_task_struct_node kernel/fork.c:180 [inline]
   dup_task_struct+0x57/0x8c0 kernel/fork.c:1107
   copy_process+0x5d1/0x3d50 kernel/fork.c:2206
   kernel_clone+0x223/0x880 kernel/fork.c:2787
   kernel_thread+0x1bc/0x240 kernel/fork.c:2849
   create_kthread kernel/kthread.c:412 [inline]
   kthreadd+0x60d/0x810 kernel/kthread.c:765
   ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
   ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
  page last free pid 5230 tgid 5230 stack trace:
   reset_page_owner include/linux/page_owner.h:25 [inline]
   free_pages_prepare mm/page_alloc.c:1108 [inline]
   free_unref_page+0xcd0/0xf00 mm/page_alloc.c:2638
   discard_slab mm/slub.c:2678 [inline]
   __put_partials+0xeb/0x130 mm/slub.c:3146
   put_cpu_partial+0x17c/0x250 mm/slub.c:3221
   __slab_free+0x2ea/0x3d0 mm/slub.c:4450
   qlink_free mm/kasan/quarantine.c:163 [inline]
   qlist_free_all+0x9a/0x140 mm/kasan/quarantine.c:179
   kasan_quarantine_reduce+0x14f/0x170 mm/kasan/quarantine.c:286
   __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:329
   kasan_slab_alloc include/linux/kasan.h:247 [inline]
   slab_post_alloc_hook mm/slub.c:4086 [inline]
   slab_alloc_node mm/slub.c:4135 [inline]
   kmem_cache_alloc_noprof+0x135/0x2a0 mm/slub.c:4142
   getname_flags+0xb7/0x540 fs/namei.c:139
   do_sys_openat2+0xd2/0x1d0 fs/open.c:1409
   do_sys_open fs/open.c:1430 [inline]
   __do_sys_openat fs/open.c:1446 [inline]
   __se_sys_openat fs/open.c:1441 [inline]
   __x64_sys_openat+0x247/0x2a0 fs/open.c:1441
   do_syscall_x64 arch/x86/entry/common.c:52 [inline]
   do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
   entry_SYSCALL_64_after_hwframe+0x77/0x7f

  Memory state around the buggy address:
   ffff8880272a8900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
   ffff8880272a8980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  >ffff8880272a8a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                              ^
   ffff8880272a8a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
   ffff8880272a8b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ==================================================================

Reported-by: syzbot+8aaf2df2ef0164ffe1fb@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/66fb36b1.050a0220.aab67.003b.GAE@google.com/
CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-10-01 19:29:33 +02:00
Qu Wenruo
c3b47f49e8 btrfs: fix a NULL pointer dereference when failed to start a new trasacntion
[BUG]
Syzbot reported a NULL pointer dereference with the following crash:

  FAULT_INJECTION: forcing a failure.
   start_transaction+0x830/0x1670 fs/btrfs/transaction.c:676
   prepare_to_relocate+0x31f/0x4c0 fs/btrfs/relocation.c:3642
   relocate_block_group+0x169/0xd20 fs/btrfs/relocation.c:3678
  ...
  BTRFS info (device loop0): balance: ended with status: -12
  Oops: general protection fault, probably for non-canonical address 0xdffffc00000000cc: 0000 [#1] PREEMPT SMP KASAN NOPTI
  KASAN: null-ptr-deref in range [0x0000000000000660-0x0000000000000667]
  RIP: 0010:btrfs_update_reloc_root+0x362/0xa80 fs/btrfs/relocation.c:926
  Call Trace:
   <TASK>
   commit_fs_roots+0x2ee/0x720 fs/btrfs/transaction.c:1496
   btrfs_commit_transaction+0xfaf/0x3740 fs/btrfs/transaction.c:2430
   del_balance_item fs/btrfs/volumes.c:3678 [inline]
   reset_balance_state+0x25e/0x3c0 fs/btrfs/volumes.c:3742
   btrfs_balance+0xead/0x10c0 fs/btrfs/volumes.c:4574
   btrfs_ioctl_balance+0x493/0x7c0 fs/btrfs/ioctl.c:3673
   vfs_ioctl fs/ioctl.c:51 [inline]
   __do_sys_ioctl fs/ioctl.c:907 [inline]
   __se_sys_ioctl+0xf9/0x170 fs/ioctl.c:893
   do_syscall_x64 arch/x86/entry/common.c:52 [inline]
   do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
   entry_SYSCALL_64_after_hwframe+0x77/0x7f

[CAUSE]
The allocation failure happens at the start_transaction() inside
prepare_to_relocate(), and during the error handling we call
unset_reloc_control(), which makes fs_info->balance_ctl to be NULL.

Then we continue the error path cleanup in btrfs_balance() by calling
reset_balance_state() which will call del_balance_item() to fully delete
the balance item in the root tree.

However during the small window between set_reloc_contrl() and
unset_reloc_control(), we can have a subvolume tree update and created a
reloc_root for that subvolume.

Then we go into the final btrfs_commit_transaction() of
del_balance_item(), and into btrfs_update_reloc_root() inside
commit_fs_roots().

That function checks if fs_info->reloc_ctl is in the merge_reloc_tree
stage, but since fs_info->reloc_ctl is NULL, it results a NULL pointer
dereference.

[FIX]
Just add extra check on fs_info->reloc_ctl inside
btrfs_update_reloc_root(), before checking
fs_info->reloc_ctl->merge_reloc_tree.

That DEAD_RELOC_TREE handling is to prevent further modification to the
reloc tree during merge stage, but since there is no reloc_ctl at all,
we do not need to bother that.

Reported-by: syzbot+283673dbc38527ef9f3d@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/66f6bfa7.050a0220.38ace9.0019.GAE@google.com/
CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-10-01 19:22:37 +02:00
Filipe Manana
fa630df665 btrfs: send: fix invalid clone operation for file that got its size decreased
During an incremental send we may end up sending an invalid clone
operation, for the last extent of a file which ends at an unaligned offset
that matches the final i_size of the file in the send snapshot, in case
the file had its initial size (the size in the parent snapshot) decreased
in the send snapshot. In this case the destination will fail to apply the
clone operation because its end offset is not sector size aligned and it
ends before the current size of the file.

Sending the truncate operation always happens when we finish processing an
inode, after we process all its extents (and xattrs, names, etc). So fix
this by ensuring the file has a valid size before we send a clone
operation for an unaligned extent that ends at the final i_size of the
file. The size we truncate to matches the start offset of the clone range
but it could be any value between that start offset and the final size of
the file since the clone operation will expand the i_size if the current
size is smaller than the end offset. The start offset of the range was
chosen because it's always sector size aligned and avoids a truncation
into the middle of a page, which results in dirtying the page due to
filling part of it with zeroes and then making the clone operation at the
receiver trigger IO.

The following test reproduces the issue:

  $ cat test.sh
  #!/bin/bash

  DEV=/dev/sdi
  MNT=/mnt/sdi

  mkfs.btrfs -f $DEV
  mount $DEV $MNT

  # Create a file with a size of 256K + 5 bytes, having two extents, one
  # with a size of 128K and another one with a size of 128K + 5 bytes.
  last_ext_size=$((128 * 1024 + 5))
  xfs_io -f -d -c "pwrite -S 0xab -b 128K 0 128K" \
         -c "pwrite -S 0xcd -b $last_ext_size 128K $last_ext_size" \
         $MNT/foo

  # Another file which we will later clone foo into, but initially with
  # a larger size than foo.
  xfs_io -f -c "pwrite -S 0xef 0 1M" $MNT/bar

  btrfs subvolume snapshot -r $MNT/ $MNT/snap1

  # Now resize bar and clone foo into it.
  xfs_io -c "truncate 0" \
         -c "reflink $MNT/foo" $MNT/bar

  btrfs subvolume snapshot -r $MNT/ $MNT/snap2

  rm -f /tmp/send-full /tmp/send-inc
  btrfs send -f /tmp/send-full $MNT/snap1
  btrfs send -p $MNT/snap1 -f /tmp/send-inc $MNT/snap2

  umount $MNT
  mkfs.btrfs -f $DEV
  mount $DEV $MNT

  btrfs receive -f /tmp/send-full $MNT
  btrfs receive -f /tmp/send-inc $MNT

  umount $MNT

Running it before this patch:

  $ ./test.sh
  (...)
  At subvol snap1
  At snapshot snap2
  ERROR: failed to clone extents to bar: Invalid argument

A test case for fstests will be sent soon.

Reported-by: Ben Millwood <thebenmachine@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CAJhrHS2z+WViO2h=ojYvBPDLsATwLbg+7JaNCyYomv0fUxEpQQ@mail.gmail.com/
Fixes: 46a6e10a1a ("btrfs: send: allow cloning non-aligned extent if it ends at i_size")
CC: stable@vger.kernel.org # 6.11
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-10-01 19:15:12 +02:00
Filipe Manana
50c6f6e680 btrfs: tracepoints: end assignment with semicolon at btrfs_qgroup_extent event class
While running checkpatch.pl against a patch that modifies the
btrfs_qgroup_extent event class, it complained about using a comma instead
of a semicolon:

  $ ./scripts/checkpatch.pl qgroups/0003-btrfs-qgroups-remove-bytenr-field-from-struct-btrfs_.patch
  WARNING: Possible comma where semicolon could be used
  #215: FILE: include/trace/events/btrfs.h:1720:
  +		__entry->bytenr		= bytenr,
		__entry->num_bytes	= rec->num_bytes;

  total: 0 errors, 1 warnings, 184 lines checked

So replace the comma with a semicolon to silence checkpatch and possibly
other tools. It also makes the code consistent with the rest.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-10-01 19:14:10 +02:00
Josef Bacik
db7e68b522 btrfs: drop the backref cache during relocation if we commit
Since the inception of relocation we have maintained the backref cache
across transaction commits, updating the backref cache with the new
bytenr whenever we COWed blocks that were in the cache, and then
updating their bytenr once we detected a transaction id change.

This works as long as we're only ever modifying blocks, not changing the
structure of the tree.

However relocation does in fact change the structure of the tree.  For
example, if we are relocating a data extent, we will look up all the
leaves that point to this data extent.  We will then call
do_relocation() on each of these leaves, which will COW down to the leaf
and then update the file extent location.

But, a key feature of do_relocation() is the pending list.  This is all
the pending nodes that we modified when we updated the file extent item.
We will then process all of these blocks via finish_pending_nodes, which
calls do_relocation() on all of the nodes that led up to that leaf.

The purpose of this is to make sure we don't break sharing unless we
absolutely have to.  Consider the case that we have 3 snapshots that all
point to this leaf through the same nodes, the initial COW would have
created a whole new path.  If we did this for all 3 snapshots we would
end up with 3x the number of nodes we had originally.  To avoid this we
will cycle through each of the snapshots that point to each of these
nodes and update their pointers to point at the new nodes.

Once we update the pointer to the new node we will drop the node we
removed the link for and all of its children via btrfs_drop_subtree().
This is essentially just btrfs_drop_snapshot(), but for an arbitrary
point in the snapshot.

The problem with this is that we will never reflect this in the backref
cache.  If we do this btrfs_drop_snapshot() for a node that is in the
backref tree, we will leave the node in the backref tree.  This becomes
a problem when we change the transid, as now the backref cache has
entire subtrees that no longer exist, but exist as if they still are
pointed to by the same roots.

In the best case scenario you end up with "adding refs to an existing
tree ref" errors from insert_inline_extent_backref(), where we attempt
to link in nodes on roots that are no longer valid.

Worst case you will double free some random block and re-use it when
there's still references to the block.

This is extremely subtle, and the consequences are quite bad.  There
isn't a way to make sure our backref cache is consistent between
transid's.

In order to fix this we need to simply evict the entire backref cache
anytime we cross transid's.  This reduces performance in that we have to
rebuild this backref cache every time we change transid's, but fixes the
bug.

This has existed since relocation was added, and is a pretty critical
bug.  There's a lot more cleanup that can be done now that this
functionality is going away, but this patch is as small as possible in
order to fix the problem and make it easy for us to backport it to all
the kernels it needs to be backported to.

Followup series will dismantle more of this code and simplify relocation
drastically to remove this functionality.

We have a reproducer that reproduced the corruption within a few minutes
of running.  With this patch it survives several iterations/hours of
running the reproducer.

Fixes: 3fd0a5585e ("Btrfs: Metadata ENOSPC handling for balance")
CC: stable@vger.kernel.org
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-10-01 19:10:26 +02:00
Johannes Thumshirn
97f9782276 btrfs: also add stripe entries for NOCOW writes
NOCOW writes do not generate stripe_extent entries in the RAID stripe
tree, as the RAID stripe-tree feature initially was designed with a
zoned filesystem in mind and on a zoned filesystem, we do not allow NOCOW
writes. But the RAID stripe-tree feature is independent from the zoned
feature, so we must also do NOCOW writes for RAID stripe-tree filesystems.

Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-10-01 19:09:04 +02:00
Filipe Manana
96c6ca7157 btrfs: send: fix buffer overflow detection when copying path to cache entry
Starting with commit c0247d289e ("btrfs: send: annotate struct
name_cache_entry with __counted_by()") we annotated the variable length
array "name" from the name_cache_entry structure with __counted_by() to
improve overflow detection. However that alone was not correct, because
the length of that array does not match the "name_len" field - it matches
that plus 1 to include the NUL string terminator, so that makes a
fortified kernel think there's an overflow and report a splat like this:

  strcpy: detected buffer overflow: 20 byte write of buffer size 19
  WARNING: CPU: 3 PID: 3310 at __fortify_report+0x45/0x50
  CPU: 3 UID: 0 PID: 3310 Comm: btrfs Not tainted 6.11.0-prnet #1
  Hardware name: CompuLab Ltd.  sbc-ihsw/Intense-PC2 (IPC2), BIOS IPC2_3.330.7 X64 03/15/2018
  RIP: 0010:__fortify_report+0x45/0x50
  Code: 48 8b 34 (...)
  RSP: 0018:ffff97ebc0d6f650 EFLAGS: 00010246
  RAX: 7749924ef60fa600 RBX: ffff8bf5446a521a RCX: 0000000000000027
  RDX: 00000000ffffdfff RSI: ffff97ebc0d6f548 RDI: ffff8bf84e7a1cc8
  RBP: ffff8bf548574080 R08: ffffffffa8c40e10 R09: 0000000000005ffd
  R10: 0000000000000004 R11: ffffffffa8c70e10 R12: ffff8bf551eef400
  R13: 0000000000000000 R14: 0000000000000013 R15: 00000000000003a8
  FS:  00007fae144de8c0(0000) GS:ffff8bf84e780000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fae14691690 CR3: 00000001027a2003 CR4: 00000000001706f0
  Call Trace:
   <TASK>
   ? __warn+0x12a/0x1d0
   ? __fortify_report+0x45/0x50
   ? report_bug+0x154/0x1c0
   ? handle_bug+0x42/0x70
   ? exc_invalid_op+0x1a/0x50
   ? asm_exc_invalid_op+0x1a/0x20
   ? __fortify_report+0x45/0x50
   __fortify_panic+0x9/0x10
  __get_cur_name_and_parent+0x3bc/0x3c0
   get_cur_path+0x207/0x3b0
   send_extent_data+0x709/0x10d0
   ? find_parent_nodes+0x22df/0x25d0
   ? mas_nomem+0x13/0x90
   ? mtree_insert_range+0xa5/0x110
   ? btrfs_lru_cache_store+0x5f/0x1e0
   ? iterate_extent_inodes+0x52d/0x5a0
   process_extent+0xa96/0x11a0
   ? __pfx_lookup_backref_cache+0x10/0x10
   ? __pfx_store_backref_cache+0x10/0x10
   ? __pfx_iterate_backrefs+0x10/0x10
   ? __pfx_check_extent_item+0x10/0x10
   changed_cb+0x6fa/0x930
   ? tree_advance+0x362/0x390
   ? memcmp_extent_buffer+0xd7/0x160
   send_subvol+0xf0a/0x1520
   btrfs_ioctl_send+0x106b/0x11d0
   ? __pfx___clone_root_cmp_sort+0x10/0x10
   _btrfs_ioctl_send+0x1ac/0x240
   btrfs_ioctl+0x75b/0x850
   __se_sys_ioctl+0xca/0x150
   do_syscall_64+0x85/0x160
   ? __count_memcg_events+0x69/0x100
   ? handle_mm_fault+0x1327/0x15c0
   ? __se_sys_rt_sigprocmask+0xf1/0x180
   ? syscall_exit_to_user_mode+0x75/0xa0
   ? do_syscall_64+0x91/0x160
   ? do_user_addr_fault+0x21d/0x630
  entry_SYSCALL_64_after_hwframe+0x76/0x7e
  RIP: 0033:0x7fae145eeb4f
  Code: 00 48 89 (...)
  RSP: 002b:00007ffdf1cb09b0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fae145eeb4f
  RDX: 00007ffdf1cb0ad0 RSI: 0000000040489426 RDI: 0000000000000004
  RBP: 00000000000078fe R08: 00007fae144006c0 R09: 00007ffdf1cb0927
  R10: 0000000000000008 R11: 0000000000000246 R12: 00007ffdf1cb1ce8
  R13: 0000000000000003 R14: 000055c499fab2e0 R15: 0000000000000004
   </TASK>

Fix this by not storing the NUL string terminator since we don't actually
need it for name cache entries, this way "name_len" corresponds to the
actual size of the "name" array. This requires marking the "name" array
field with __nonstring and using memcpy() instead of strcpy() as
recommended by the guidelines at:

   https://github.com/KSPP/linux/issues/90

Reported-by: David Arendt <admin@prnet.org>
Link: https://lore.kernel.org/linux-btrfs/cee4591a-3088-49ba-99b8-d86b4242b8bd@prnet.org/
Fixes: c0247d289e ("btrfs: send: annotate struct name_cache_entry with __counted_by()")
CC: stable@vger.kernel.org # 6.11
Tested-by: David Arendt <admin@prnet.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-10-01 19:06:30 +02:00
Boris Brezillon
f9e7ac6e2e drm/panthor: Don't add write fences to the shared BOs
The only user (the mesa gallium driver) is already assuming explicit
synchronization and doing the export/import dance on shared BOs. The
only reason we were registering ourselves as writers on external BOs
is because Xe, which was the reference back when we developed Panthor,
was doing so. Turns out Xe was wrong, and we really want bookkeep on
all registered fences, so userspace can explicitly upgrade those to
read/write when needed.

Fixes: 4bdca11507 ("drm/panthor: Add the driver frontend block")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Simona Vetter <simona.vetter@ffwll.ch>
Cc: <stable@vger.kernel.org>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240905070155.3254011-1-boris.brezillon@collabora.com
2024-10-01 18:41:02 +02:00
Boris Brezillon
7a1f30afe9 drm/panthor: Don't declare a queue blocked if deferred operations are pending
If deferred operations are pending, we want to wait for those to
land before declaring the queue blocked on a SYNC_WAIT. We need
this to deal with the case where the sync object is signalled through
a deferred SYNC_{ADD,SET} from the same queue. If we don't do that
and the group gets scheduled out before the deferred SYNC_{SET,ADD}
is executed, we'll end up with a timeout, because no external
SYNC_{SET,ADD} will make the scheduler reconsider the group for
execution.

Fixes: de85488138 ("drm/panthor: Add the scheduler logical block")
Cc: <stable@vger.kernel.org>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240905071914.3278599-1-boris.brezillon@collabora.com
2024-10-01 18:40:29 +02:00
Boris Brezillon
282864cc5d drm/panthor: Fix access to uninitialized variable in tick_ctx_cleanup()
The group variable can't be used to retrieve ptdev in our second loop,
because it points to the previously iterated list_head, not a valid
group. Get the ptdev object from the scheduler instead.

Cc: <stable@vger.kernel.org>
Fixes: d72f049087 ("drm/panthor: Allow driver compilation")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Julia Lawall <julia.lawall@inria.fr>
Closes: https://lore.kernel.org/r/202409302306.UDikqa03-lkp@intel.com/
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240930163742.87036-1-boris.brezillon@collabora.com
2024-10-01 18:37:14 +02:00
Boris Brezillon
fa998a9eac drm/panthor: Lock the VM resv before calling drm_gpuvm_bo_obtain_prealloc()
drm_gpuvm_bo_obtain_prealloc() will call drm_gpuvm_bo_put() on our
pre-allocated BO if the <BO,VM> association exists. Given we
only have one ref on preallocated_vm_bo, drm_gpuvm_bo_destroy() will
be called immediately, and we have to hold the VM resv lock when
calling this function.

Fixes: 647810ec24 ("drm/panthor: Add the MMU/VM logical block")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240913112722.492144-1-boris.brezillon@collabora.com
2024-10-01 18:37:13 +02:00
Liviu Dudau
2b55639a4e drm/panthor: Add FOP_UNSIGNED_OFFSET to fop_flags
Since commit 641bb4394f ("fs: move FMODE_UNSIGNED_OFFSET to fop_flags")
the FMODE_UNSIGNED_OFFSET flag has been moved to fop_flags and renamed,
but the patch failed to make the changes for the panthor driver.
When user space opens the render node the WARN() added by the patch
gets triggered.

Fixes: 641bb4394f ("fs: move FMODE_UNSIGNED_OFFSET to fop_flags")
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240920102802.2483367-1-liviu.dudau@arm.com
2024-10-01 18:37:13 +02:00
Uladzislau Rezki (Sony)
3c5d61ae91 rcu/kvfree: Refactor kvfree_rcu_queue_batch()
Improve readability of kvfree_rcu_queue_batch() function
in away that, after a first batch queuing, the loop is break
and success value is returned to a caller.

There is no reason to loop and check batches further as all
outstanding objects have already been picked and attached to
a certain batch to complete an offloading.

Fixes: 2b55d6a42d ("rcu/kvfree: Add kvfree_rcu_barrier() API")
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Closes: https://lore.kernel.org/lkml/ZvWUt2oyXRsvJRNc@pc636/T/
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2024-10-01 18:30:42 +02:00
Nilay Shroff
77ced98f0f mm, slab: fix use of SLAB_SUPPORTS_SYSFS in kmem_cache_release()
The fix implemented in commit 4ec10268ed ("mm, slab: unlink slabinfo,
sysfs and debugfs immediately") caused a subtle side effect due to which
while destroying the kmem cache, the code path would never get into
sysfs_slab_release() function even though SLAB_SUPPORTS_SYSFS is defined
and slab state is FULL. Due to this side effect, we would never release
kobject defined for kmem cache and leak the associated memory.

The issue here's with the use of __is_defined() macro in kmem_cache_
release(). The __is_defined() macro expands to __take_second_arg(
arg1_or_junk 1, 0). If "arg1_or_junk" is defined to 1 then it expands to
__take_second_arg(0, 1, 0) and returns 1. If "arg1_or_junk" is NOT defined
to any value then it expands to __take_second_arg(... 1, 0) and returns 0.

In this particular issue, SLAB_SUPPORTS_SYSFS is defined without any
associated value and that causes __is_defined(SLAB_SUPPORTS_SYSFS) to
always evaluate to 0 and hence it would never invoke sysfs_slab_release().

This patch helps fix this issue by defining SLAB_SUPPORTS_SYSFS to 1.

Fixes: 4ec10268ed ("mm, slab: unlink slabinfo, sysfs and debugfs immediately")
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Closes: https://lore.kernel.org/all/CAHj4cs9YCCcfmdxN43-9H3HnTYQsRtTYw1Kzq-L468GfLKAENA@mail.gmail.com/
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2024-10-01 18:26:53 +02:00
Maarten Lankhorst
2cd86f02c0 Merge remote-tracking branch 'drm/drm-fixes' into drm-misc-fixes
Required for a panthor fix that broke when
FOP_UNSIGNED_OFFSET was added in place of FMODE_UNSIGNED_OFFSET.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2024-10-01 18:09:41 +02:00
Mark Brown
76f972c2cf KVM: selftests: Fix build on architectures other than x86_64
The recent addition of support for testing with the x86 specific quirk
KVM_X86_QUIRK_SLOT_ZAP_ALL disabled in the generic memslot tests broke the
build of the KVM selftests for all other architectures:

In file included from include/kvm_util.h:8,
                 from include/memstress.h:13,
                 from memslot_modification_stress_test.c:21:
memslot_modification_stress_test.c: In function ‘main’:
memslot_modification_stress_test.c:176:38: error: ‘KVM_X86_QUIRK_SLOT_ZAP_ALL’ undeclared (first use in this function)
  176 |                                      KVM_X86_QUIRK_SLOT_ZAP_ALL);
      |                                      ^~~~~~~~~~~~~~~~~~~~~~~~~~

Add __x86_64__ guard defines to avoid building the relevant code on other
architectures.

Fixes: 61de4c34b5 ("KVM: selftests: Test memslot move in memslot_perf_test with quirk disabled")
Fixes: 218f641500 ("KVM: selftests: Allow slot modification stress test with quirk disabled")
Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Message-ID: <20240930-kvm-build-breakage-v1-1-866fad3cc164@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-01 11:07:26 -04:00
Christian Brauner
59d39b9259
Documentation: add missing folio_queue entry
Add missing folio_queue entry.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Link: https://lore.kernel.org/r/20241001133920.6e28637b@canb.auug.org.au
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-01 17:04:32 +02:00
Christian Brauner
f5c82730be
folio_queue: fix documentation
s/folioq_count/folioq_full/

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Link: https://lore.kernel.org/r/20241001134729.3f65ae78@canb.auug.org.au
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-01 17:01:40 +02:00
Nuno Sa
c684771630 Input: adp5589-keys - fix adp5589_gpio_get_value()
The adp5589 seems to have the same behavior as similar devices as
explained in commit 910a9f5636 ("Input: adp5588-keys - get value from
data out when dir is out").

Basically, when the gpio is set as output we need to get the value from
ADP5589_GPO_DATA_OUT_A register instead of ADP5589_GPI_STATUS_A.

Fixes: 9d2e173644 ("Input: ADP5589 - new driver for I2C Keypad Decoder and I/O Expander")
Signed-off-by: Nuno Sa <nuno.sa@analog.com>
Link: https://lore.kernel.org/r/20241001-b4-dev-adp5589-fw-conversion-v1-2-fca0149dfc47@analog.com
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2024-10-01 07:48:49 -07:00
Nuno Sa
fb5cc65f97 Input: adp5589-keys - fix NULL pointer dereference
We register a devm action to call adp5589_clear_config() and then pass
the i2c client as argument so that we can call i2c_get_clientdata() in
order to get our device object. However, i2c_set_clientdata() is only
being set at the end of the probe function which means that we'll get a
NULL pointer dereference in case the probe function fails early.

Fixes: 30df385e35 ("Input: adp5589-keys - use devm_add_action_or_reset() for register clear")
Signed-off-by: Nuno Sa <nuno.sa@analog.com>
Link: https://lore.kernel.org/r/20241001-b4-dev-adp5589-fw-conversion-v1-1-fca0149dfc47@analog.com
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2024-10-01 07:48:48 -07:00
Yun Lu
160c826b4d selftest: hid: add missing run-hid-tools-tests.sh
HID test cases run tests using the run-hid-tools-tests.sh script.
When installed with "make install", the run-hid-tools-tests.sh
script will not be copied over, resulting in the following error message.

  make -C tools/testing/selftests/ TARGETS=hid install \
  	  INSTALL_PATH=$KSFT_INSTALL_PATH

  cd $KSFT_INSTALL_PATH
  ./run_kselftest.sh -c hid

selftests: hid: hid-core.sh
bash: ./run-hid-tools-tests.sh: No such file or directory

Add the run-hid-tools-tests.sh script to the TEST_FILES in the Makefile
for it to be installed.

Fixes: ffb85d5c9e ("selftests: hid: import hid-tools hid-core tests")
Signed-off-by: Yun Lu <luyun@kylinos.cn>
Acked-by: Benjamin Tissoires <bentiss@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-10-01 08:39:13 -06:00
Marek Vasut
048bbbdbf8 i2c: stm32f7: Do not prepare/unprepare clock during runtime suspend/resume
In case there is any sort of clock controller attached to this I2C bus
controller, for example Versaclock or even an AIC32x4 I2C codec, then
an I2C transfer triggered from the clock controller clk_ops .prepare
callback may trigger a deadlock on drivers/clk/clk.c prepare_lock mutex.

This is because the clock controller first grabs the prepare_lock mutex
and then performs the prepare operation, including its I2C access. The
I2C access resumes this I2C bus controller via .runtime_resume callback,
which calls clk_prepare_enable(), which attempts to grab the prepare_lock
mutex again and deadlocks.

Since the clock are already prepared since probe() and unprepared in
remove(), use simple clk_enable()/clk_disable() calls to enable and
disable the clock on runtime suspend and resume, to avoid hitting the
prepare_lock mutex.

Acked-by: Alain Volmat <alain.volmat@foss.st.com>
Signed-off-by: Marek Vasut <marex@denx.de>
Fixes: 4e7bca6fc0 ("i2c: i2c-stm32f7: add PM Runtime support")
Cc: <stable@vger.kernel.org> # v5.0+
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2024-10-01 16:39:00 +02:00
Jason A. Donenfeld
a18c835779 selftests: vDSO: align getrandom states to cache line
This prevents false sharing, which makes a large difference on machines
with several NUMA nodes, such as on a dual socket Intel(R) Xeon(R) Gold
6338 CPU @ 2.00GHz, where the "bench-multi" test goes from 2.7s down to
1.9s. While this is just test code, it also forms the basis of how folks
will wind up implementing this in libraries, so we should implement this
simple cache alignment improvement here.

Suggested-by: Florian Weimer <fweimer@redhat.com>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-10-01 08:30:58 -06:00
Marc Zyngier
64a1d71661 KVM: arm64: Another reviewer reshuffle
It has been a while since James had any significant bandwidth to
review KVM/arm64 patches. But in the meantime, Joey has stepped up
and did a really good job reviewing some terrifying patch series.

Having talked with the interested parties, it appears that James
is unlikely to have time for KVM in the near future, and that Joey
is willing to take more responsibilities.

So let's appoint Joey as an official reviewer, and give James some
breathing space, as well as my personal thanks. I'm sure he will
be back one way or another!

Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Acked-by: Joey Gouly <joey.gouly@arm.com>
Acked-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/r/20240927104956.1223658-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-10-01 15:25:23 +01:00
Mark Brown
a9f41588a9 KVM: arm64: Constrain the host to the maximum shared SVE VL with pKVM
When pKVM saves and restores the host floating point state on a SVE system,
it programs the vector length in ZCR_EL2.LEN to be whatever the maximum VL
for the PE is. But it uses a buffer allocated with kvm_host_sve_max_vl, the
maximum VL shared by all PEs in the system. This means that if we run on a
system where the maximum VLs are not consistent, we will overflow the buffer
on PEs which support larger VLs.

Since the host will not currently attempt to make use of non-shared VLs, fix
this by explicitly setting the EL2 VL to be the maximum shared VL when we
save and restore. This will enforce the limit on host VL usage. Should we
wish to support asymmetric VLs, this code will need to be updated along with
the required changes for the host:

  https://lore.kernel.org/r/20240730-kvm-arm64-fix-pkvm-sve-vl-v6-0-cae8a2e0bd66@kernel.org

Fixes: b5b9955617 ("KVM: arm64: Eagerly restore host fpsimd/sve state in pKVM")
Signed-off-by: Mark Brown <broonie@kernel.org>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/r/20240912-kvm-arm64-limit-guest-vl-v2-1-dd2c29cb2ac9@kernel.org
[maz: added punctuation to the commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-10-01 15:25:23 +01:00
Vincent Donnefort
78fee4198b KVM: arm64: Fix __pkvm_init_vcpu cptr_el2 error path
On an error, hyp_vcpu will be accessed while this memory has already
been relinquished to the host and unmapped from the hypervisor. Protect
the CPTR assignment with an early return.

Fixes: b5b9955617 ("KVM: arm64: Eagerly restore host fpsimd/sve state in pKVM")
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20240919110500.2345927-1-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-10-01 15:25:23 +01:00
Chang Yu
f6023535b5
netfs: Fix a KMSAN uninit-value error in netfs_clear_buffer
Use folioq_count instead of folioq_nr_slots to fix a KMSAN uninit-value
error in netfs_clear_buffer

Signed-off-by: Chang Yu <marcus.yu.56@gmail.com>
Link: https://lore.kernel.org/r/ZvuXWC2bYpvQsWgS@gmail.com
Fixes: cd0277ed0c ("netfs: Use new folio_queue data type and iterator instead of xarray iter")
Acked-by: David Howells <dhowells@redhat.com>
Reported-by: syzbot+921873345a95f4dae7e9@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=921873345a95f4dae7e9
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-01 16:02:42 +02:00
Dan Carpenter
a04dae6fa4 ALSA: silence integer wrapping warning
This patch doesn't change runtime at all, it's just for kernel hardening.

The "count" here comes from the user and on 32bit systems, it leads to
integer wrapping when we pass it to compute_user_elem_size():

	alloc_size = compute_user_elem_size(private_size, count);

However, the integer over is harmless because later "count" is checked
when we pass it to snd_ctl_new():

	err = snd_ctl_new(&kctl, count, access, file);

These days as part of kernel hardening we're trying to avoid integer
overflows when they affect size_t type.  So to avoid the integer overflow
copy the check from snd_ctl_new() and do it at the start of the
snd_ctl_elem_add() function as well.

Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Jaroslav Kysela <perex@perex.cz>
Link: https://patch.msgid.link/5457e8c1-01ff-4dd9-b49c-15b817f65ee7@stanley.mountain
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-10-01 14:56:40 +02:00
Mark Rutland
b3d6121eae arm64: fix selection of HAVE_DYNAMIC_FTRACE_WITH_ARGS
The Kconfig logic to select HAVE_DYNAMIC_FTRACE_WITH_ARGS is incorrect,
and HAVE_DYNAMIC_FTRACE_WITH_ARGS may be selected when it is not
supported by the combination of clang and GNU LD, resulting in link-time
errors:

  aarch64-linux-gnu-ld: .init.data has both ordered [`__patchable_function_entries' in init/main.o] and unordered [`.meminit.data' in mm/sparse.o] sections
  aarch64-linux-gnu-ld: final link failed: bad value

... which can be seen when building with CC=clang using a binutils
version older than 2.36.

We originally fixed that in commit:

  45bd895180 ("arm64: Improve HAVE_DYNAMIC_FTRACE_WITH_REGS selection for clang")

... by splitting the "select HAVE_DYNAMIC_FTRACE_WITH_ARGS" statement
into separete CLANG_SUPPORTS_DYNAMIC_FTRACE_WITH_ARGS and
GCC_SUPPORTS_DYNAMIC_FTRACE_WITH_ARGS options which individually select
HAVE_DYNAMIC_FTRACE_WITH_ARGS.

Subsequently we accidentally re-introduced the common "select
HAVE_DYNAMIC_FTRACE_WITH_ARGS" statement in commit:

  26299b3f6b ("ftrace: arm64: move from REGS to ARGS")

... then we removed it again in commit:

  68a63a412d ("arm64: Fix build with CC=clang, CONFIG_FTRACE=y and CONFIG_STACK_TRACER=y")

... then we accidentally re-introduced it again in commit:

  2aa6ac0351 ("arm64: ftrace: Add direct call support")

Fix this for the third time by keeping the unified select statement and
making this depend onf either GCC_SUPPORTS_DYNAMIC_FTRACE_WITH_ARGS or
CLANG_SUPPORTS_DYNAMIC_FTRACE_WITH_ARGS. This is more consistent with
usual style and less likely to go wrong in future.

Fixes: 2aa6ac0351 ("arm64: ftrace: Add direct call support")
Cc: <stable@vger.kernel.org> # 6.4.x
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240930120448.3352564-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-01 12:47:19 +01:00
Mark Rutland
081eb7932c arm64: errata: Expand speculative SSBS workaround once more
A number of Arm Ltd CPUs suffer from errata whereby an MSR to the SSBS
special-purpose register does not affect subsequent speculative
instructions, permitting speculative store bypassing for a window of
time.

We worked around this for a number of CPUs in commits:

* 7187bb7d0b ("arm64: errata: Add workaround for Arm errata 3194386 and 3312417")
* 75b3c43eab ("arm64: errata: Expand speculative SSBS workaround")
* 145502cac7ea70b5 ("arm64: errata: Expand speculative SSBS workaround (again)")

Since then, a (hopefully final) batch of updates have been published,
with two more affected CPUs. For the affected CPUs the existing
mitigation is sufficient, as described in their respective Software
Developer Errata Notice (SDEN) documents:

* Cortex-A715 (MP148) SDEN v15.0, erratum 3456084
  https://developer.arm.com/documentation/SDEN-2148827/1500/

* Neoverse-N3 (MP195) SDEN v5.0, erratum 3456111
  https://developer.arm.com/documentation/SDEN-3050973/0500/

Enable the existing mitigation by adding the relevant MIDRs to
erratum_spec_ssbs_list, and update silicon-errata.rst and the
Kconfig text accordingly.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240930111705.3352047-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-01 12:46:55 +01:00
Mark Rutland
924725707d arm64: cputype: Add Neoverse-N3 definitions
Add cputype definitions for Neoverse-N3. These will be used for errata
detection in subsequent patches.

These values can be found in Table A-261 ("MIDR_EL1 bit descriptions")
in issue 02 of the Neoverse-N3 TRM, which can be found at:

  https://developer.arm.com/documentation/107997/0000/?lang=en

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240930111705.3352047-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-01 12:46:54 +01:00
Mark Rutland
9abe390e68 arm64: Force position-independent veneers
Certain portions of code always need to be position-independent
regardless of CONFIG_RELOCATABLE, including code which is executed in an
idmap or which is executed before relocations are applied. In some
kernel configurations the LLD linker generates position-dependent
veneers for such code, and when executed these result in early boot-time
failures.

Marc Zyngier encountered a boot failure resulting from this when
building a (particularly cursed) configuration with LLVM, as he reported
to the list:

  https://lore.kernel.org/linux-arm-kernel/86wmjwvatn.wl-maz@kernel.org/

In Marc's kernel configuration, the .head.text and .rodata.text sections
end up more than 128MiB apart, requiring a veneer to branch between the
two:

| [mark@lakrids:~/src/linux]% usekorg 14.1.0 aarch64-linux-objdump -t vmlinux | grep -w _text
| ffff800080000000 g       .head.text     0000000000000000 _text
| [mark@lakrids:~/src/linux]% usekorg 14.1.0 aarch64-linux-objdump -t vmlinux | grep -w primary_entry
| ffff8000889df0e0 g       .rodata.text   000000000000006c primary_entry,

... consequently, LLD inserts a position-dependent veneer for the branch
from _stext (in .head.text) to primary_entry (in .rodata.text):

| ffff800080000000 <_text>:
| ffff800080000000:       fa405a4d        ccmp    x18, #0x0, #0xd, pl     // pl = nfrst
| ffff800080000004:       14003fff        b       ffff800080010000 <__AArch64AbsLongThunk_primary_entry>
...
| ffff800080010000 <__AArch64AbsLongThunk_primary_entry>:
| ffff800080010000:       58000050        ldr     x16, ffff800080010008 <__AArch64AbsLongThunk_primary_entry+0x8>
| ffff800080010004:       d61f0200        br      x16
| ffff800080010008:       889df0e0        .word   0x889df0e0
| ffff80008001000c:       ffff8000        .word   0xffff8000

... and as this is executed early in boot before the kernel is mapped in
TTBR1 this results in a silent boot failure.

Fix this by passing '--pic-veneer' to the linker, which will cause the
linker to use position-independent veneers, e.g.

| ffff800080000000 <_text>:
| ffff800080000000:       fa405a4d        ccmp    x18, #0x0, #0xd, pl     // pl = nfrst
| ffff800080000004:       14003fff        b       ffff800080010000 <__AArch64ADRPThunk_primary_entry>
...
| ffff800080010000 <__AArch64ADRPThunk_primary_entry>:
| ffff800080010000:       f004e3f0        adrp    x16, ffff800089c8f000 <__idmap_text_start>
| ffff800080010004:       91038210        add     x16, x16, #0xe0
| ffff800080010008:       d61f0200        br      x16

I've opted to pass '--pic-veneer' unconditionally, as:

* In addition to solving the boot failure, these sequences are generally
  nicer as they require fewer instructions and don't need to perform
  data accesses.

* While the position-independent veneer sequences have a limited +/-2GiB
  range, this is not a new restriction. Even kernels built with
  CONFIG_RELOCATABLE=n are limited to 2GiB in size as we have several
  structues using 32-bit relative offsets and PPREL32 relocations, which
  are similarly limited to +/-2GiB in range. These include extable
  entries, jump table entries, and alt_instr entries.

* GNU LD defaults to using position-independent veneers, and supports
  the same '--pic-veneer' option, so this change is not expected to
  adversely affect GNU LD.

I've tested with GNU LD 2.30 to 2.42 inclusive and LLVM 13.0.1 to 19.1.0
inclusive, using the kernel.org binaries from:

* https://mirrors.edge.kernel.org/pub/tools/crosstool/
* https://mirrors.edge.kernel.org/pub/tools/llvm/

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Marc Zyngier <maz@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Will Deacon <will@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20240927101838.3061054-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-01 12:46:27 +01:00
Alexander Dahl
162d9b5d23
spi: atmel-quadspi: Fix wrong register value written to MR
aq->mr should go to MR, nothing else.

Fixes: 329ca3eed4 ("spi: atmel-quadspi: Avoid overwriting delay register settings")
Signed-off-by: Alexander Dahl <ada@thorsis.com>
Link: https://lore.kernel.org/linux-spi/20240926-macarena-wincing-7c4995487a29@thorsis.com/T/#u
Link: https://patch.msgid.link/20240926090356.105789-1-ada@thorsis.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2024-10-01 12:07:53 +01:00
Charles Keepax
cccb586f51
ASoC: Intel: soc-acpi: arl: Fix some missing empty terminators
Fixes: c052406765 ("ASoC: Intel: soc-acpi: arl: Add match entries for new cs42l43 laptops")
Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Link: https://patch.msgid.link/20241001061738.34854-3-yung-chuan.liao@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2024-10-01 12:07:47 +01:00
Bard Liao
5afc29ba44
ASoC: Intel: soc-acpi-intel-rpl-match: add missing empty item
There is no links_num in struct snd_soc_acpi_mach {}, and we test
!link->num_adr as a condition to end the loop in hda_sdw_machine_select().
So an empty item in struct snd_soc_acpi_link_adr array is required.

Fixes: 65ab45b906 ("ASoC: Intel: soc-acpi: Add match entries for some cs42l43 laptops")
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Reviewed-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://patch.msgid.link/20241001061738.34854-2-yung-chuan.liao@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2024-10-01 12:07:46 +01:00
Anton Danilov
c4a14f6d9d ipv4: ip_gre: Fix drops of small packets in ipgre_xmit
Regression Description:

Depending on the options specified for the GRE tunnel device, small
packets may be dropped. This occurs because the pskb_network_may_pull
function fails due to the packet's insufficient length.

For example, if only the okey option is specified for the tunnel device,
original (before encapsulation) packets smaller than 28 bytes (including
the IPv4 header) will be dropped. This happens because the required
length is calculated relative to the network header, not the skb->head.

Here is how the required length is computed and checked:

* The pull_len variable is set to 28 bytes, consisting of:
  * IPv4 header: 20 bytes
  * GRE header with Key field: 8 bytes

* The pskb_network_may_pull function adds the network offset, shifting
the checkable space further to the beginning of the network header and
extending it to the beginning of the packet. As a result, the end of
the checkable space occurs beyond the actual end of the packet.

Instead of ensuring that 28 bytes are present in skb->head, the function
is requesting these 28 bytes starting from the network header. For small
packets, this requested length exceeds the actual packet size, causing
the check to fail and the packets to be dropped.

This issue affects both locally originated and forwarded packets in
DMVPN-like setups.

How to reproduce (for local originated packets):

  ip link add dev gre1 type gre ikey 1.9.8.4 okey 1.9.8.4 \
          local <your-ip> remote 0.0.0.0

  ip link set mtu 1400 dev gre1
  ip link set up dev gre1
  ip address add 192.168.13.1/24 dev gre1
  ip neighbor add 192.168.13.2 lladdr <remote-ip> dev gre1
  ping -s 1374 -c 10 192.168.13.2
  tcpdump -vni gre1
  tcpdump -vni <your-ext-iface> 'ip proto 47'
  ip -s -s -d link show dev gre1

Solution:

Use the pskb_may_pull function instead the pskb_network_may_pull.

Fixes: 80d875cfc9 ("ipv4: ip_gre: Avoid skb_pull() failure in ipgre_xmit()")
Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240924235158.106062-1-littlesmilingcloud@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-01 13:04:03 +02:00
Dmitry Torokhov
a3f9a74d21 Revert "Input: Add driver for PixArt PS/2 touchpad"
This reverts commit 740ff03d72 because
current PixArt detection is too greedy and claims devices that are
not PixArt.

Reported-by: Benjamin Tissoires <bentiss@kernel.org>
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2314756
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2024-10-01 03:54:05 -07:00
Shenwei Wang
4c1b56671b net: stmmac: dwmac4: extend timeout for VLAN Tag register busy bit check
Increase the timeout for checking the busy bit of the VLAN Tag register
from 10µs to 500ms. This change is necessary to accommodate scenarios
where Energy Efficient Ethernet (EEE) is enabled.

Overnight testing revealed that when EEE is active, the busy bit can
remain set for up to approximately 300ms. The new 500ms timeout provides
a safety margin.

Fixes: ed64639bc1 ("net: stmmac: Add support for VLAN Rx filtering")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com>
Link: https://patch.msgid.link/20240924205424.573913-1-shenwei.wang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-01 12:35:55 +02:00
Omar Sandoval
0d24852bd7
iov_iter: fix advancing slot in iter_folioq_get_pages()
iter_folioq_get_pages() decides to advance to the next folioq slot when
it has reached the end of the current folio. However, it is checking
offset, which is the beginning of the current part, instead of
iov_offset, which is adjusted to the end of the current part, so it
doesn't advance the slot when it's supposed to. As a result, on the next
iteration, we'll use the same folio with an out-of-bounds offset and
return an unrelated page.

This manifested as various crashes and other failures in 9pfs in drgn's
VM testing setup and BPF CI.

Fixes: db0aa2e956 ("mm: Define struct folio_queue and ITER_FOLIOQ to handle a sequence of folios")
Link: https://lore.kernel.org/linux-fsdevel/20240923183432.1876750-1-chantr4@gmail.com/
Tested-by: Manu Bretelle <chantr4@gmail.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Link: https://lore.kernel.org/r/cbaf141ba6c0e2e209717d02746584072844841a.1727722269.git.osandov@fb.com
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Tested-by: Leon Romanovsky <leon@kernel.org>
Tested-by: Joey Gouly <joey.gouly@arm.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-01 11:49:57 +02:00
Pu Lehui
c625154993
drivers/perf: riscv: Align errno for unsupported perf event
RISC-V perf driver does not yet support PERF_TYPE_BREAKPOINT. It would
be more appropriate to return -EOPNOTSUPP or -ENOENT for this type in
pmu_sbi_event_map. Considering that other implementations return -ENOENT
for unsupported perf types, let's synchronize this behavior. Due to this
reason, a riscv bpf testcases perf_skip fail. Meanwhile, align that
behavior to the rest of proper place.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Fixes: 9b3e150e31 ("RISC-V: Add a simple platform driver for RISC-V legacy perf")
Fixes: 16d3b1af09 ("perf: RISC-V: Check standard event availability")
Fixes: e999143459 ("RISC-V: Add perf platform driver based on SBI PMU extension")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240831071520.1630360-1-pulehui@huaweicloud.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-10-01 02:47:39 -07:00
Paolo Abeni
23e19f2473 Merge branch 'net-two-fixes-for-qdisc_pkt_len_init'
Eric Dumazet says:

====================
net: two fixes for qdisc_pkt_len_init()

Inspired by one syzbot report.

At least one qdisc (fq_codel) depends on qdisc_skb_cb(skb)->pkt_len
having a sane value (not zero)

With the help of af_packet, syzbot was able to fool qdisc_pkt_len_init()
to precisely set qdisc_skb_cb(skb)->pkt_len to zero.

First patch fixes this issue.

Second one (a separate one to help future bisections) adds
more sanity check to SKB_GSO_DODGY users.
====================

Link: https://patch.msgid.link/20240924150257.1059524-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-01 11:47:08 +02:00
Eric Dumazet
ab9a9a9e96 net: add more sanity checks to qdisc_pkt_len_init()
One path takes care of SKB_GSO_DODGY, assuming
skb->len is bigger than hdr_len.

virtio_net_hdr_to_skb() does not fully dissect TCP headers,
it only make sure it is at least 20 bytes.

It is possible for an user to provide a malicious 'GSO' packet,
total length of 80 bytes.

- 20 bytes of IPv4 header
- 60 bytes TCP header
- a small gso_size like 8

virtio_net_hdr_to_skb() would declare this packet as a normal
GSO packet, because it would see 40 bytes of payload,
bigger than gso_size.

We need to make detect this case to not underflow
qdisc_skb_cb(skb)->pkt_len.

Fixes: 1def9238d4 ("net_sched: more precise pkt_len computation")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-01 11:47:05 +02:00
Eric Dumazet
c20029db28 net: avoid potential underflow in qdisc_pkt_len_init() with UFO
After commit 7c6d2ecbda ("net: be more gentle about silly gso
requests coming from user") virtio_net_hdr_to_skb() had sanity check
to detect malicious attempts from user space to cook a bad GSO packet.

Then commit cf9acc90c8 ("net: virtio_net_hdr_to_skb: count
transport header in UFO") while fixing one issue, allowed user space
to cook a GSO packet with the following characteristic :

IPv4 SKB_GSO_UDP, gso_size=3, skb->len = 28.

When this packet arrives in qdisc_pkt_len_init(), we end up
with hdr_len = 28 (IPv4 header + UDP header), matching skb->len

Then the following sets gso_segs to 0 :

gso_segs = DIV_ROUND_UP(skb->len - hdr_len,
                        shinfo->gso_size);

Then later we set qdisc_skb_cb(skb)->pkt_len to back to zero :/

qdisc_skb_cb(skb)->pkt_len += (gso_segs - 1) * hdr_len;

This leads to the following crash in fq_codel [1]

qdisc_pkt_len_init() is best effort, we only want an estimation
of the bytes sent on the wire, not crashing the kernel.

This patch is fixing this particular issue, a following one
adds more sanity checks for another potential bug.

[1]
[   70.724101] BUG: kernel NULL pointer dereference, address: 0000000000000000
[   70.724561] #PF: supervisor read access in kernel mode
[   70.724561] #PF: error_code(0x0000) - not-present page
[   70.724561] PGD 10ac61067 P4D 10ac61067 PUD 107ee2067 PMD 0
[   70.724561] Oops: Oops: 0000 [#1] SMP NOPTI
[   70.724561] CPU: 11 UID: 0 PID: 2163 Comm: b358537762 Not tainted 6.11.0-virtme #991
[   70.724561] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   70.724561] RIP: 0010:fq_codel_enqueue (net/sched/sch_fq_codel.c:120 net/sched/sch_fq_codel.c:168 net/sched/sch_fq_codel.c:230) sch_fq_codel
[ 70.724561] Code: 24 08 49 c1 e1 06 44 89 7c 24 18 45 31 ed 45 31 c0 31 ff 89 44 24 14 4c 03 8b 90 01 00 00 eb 04 39 ca 73 37 4d 8b 39 83 c7 01 <49> 8b 17 49 89 11 41 8b 57 28 45 8b 5f 34 49 c7 07 00 00 00 00 49
All code
========
   0:	24 08                	and    $0x8,%al
   2:	49 c1 e1 06          	shl    $0x6,%r9
   6:	44 89 7c 24 18       	mov    %r15d,0x18(%rsp)
   b:	45 31 ed             	xor    %r13d,%r13d
   e:	45 31 c0             	xor    %r8d,%r8d
  11:	31 ff                	xor    %edi,%edi
  13:	89 44 24 14          	mov    %eax,0x14(%rsp)
  17:	4c 03 8b 90 01 00 00 	add    0x190(%rbx),%r9
  1e:	eb 04                	jmp    0x24
  20:	39 ca                	cmp    %ecx,%edx
  22:	73 37                	jae    0x5b
  24:	4d 8b 39             	mov    (%r9),%r15
  27:	83 c7 01             	add    $0x1,%edi
  2a:*	49 8b 17             	mov    (%r15),%rdx		<-- trapping instruction
  2d:	49 89 11             	mov    %rdx,(%r9)
  30:	41 8b 57 28          	mov    0x28(%r15),%edx
  34:	45 8b 5f 34          	mov    0x34(%r15),%r11d
  38:	49 c7 07 00 00 00 00 	movq   $0x0,(%r15)
  3f:	49                   	rex.WB

Code starting with the faulting instruction
===========================================
   0:	49 8b 17             	mov    (%r15),%rdx
   3:	49 89 11             	mov    %rdx,(%r9)
   6:	41 8b 57 28          	mov    0x28(%r15),%edx
   a:	45 8b 5f 34          	mov    0x34(%r15),%r11d
   e:	49 c7 07 00 00 00 00 	movq   $0x0,(%r15)
  15:	49                   	rex.WB
[   70.724561] RSP: 0018:ffff95ae85e6fb90 EFLAGS: 00000202
[   70.724561] RAX: 0000000002000000 RBX: ffff95ae841de000 RCX: 0000000000000000
[   70.724561] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000001
[   70.724561] RBP: ffff95ae85e6fbf8 R08: 0000000000000000 R09: ffff95b710a30000
[   70.724561] R10: 0000000000000000 R11: bdf289445ce31881 R12: ffff95ae85e6fc58
[   70.724561] R13: 0000000000000000 R14: 0000000000000040 R15: 0000000000000000
[   70.724561] FS:  000000002c5c1380(0000) GS:ffff95bd7fcc0000(0000) knlGS:0000000000000000
[   70.724561] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   70.724561] CR2: 0000000000000000 CR3: 000000010c568000 CR4: 00000000000006f0
[   70.724561] Call Trace:
[   70.724561]  <TASK>
[   70.724561] ? __die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434)
[   70.724561] ? page_fault_oops (arch/x86/mm/fault.c:715)
[   70.724561] ? exc_page_fault (./arch/x86/include/asm/irqflags.h:26 ./arch/x86/include/asm/irqflags.h:87 ./arch/x86/include/asm/irqflags.h:147 arch/x86/mm/fault.c:1489 arch/x86/mm/fault.c:1539)
[   70.724561] ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:623)
[   70.724561] ? fq_codel_enqueue (net/sched/sch_fq_codel.c:120 net/sched/sch_fq_codel.c:168 net/sched/sch_fq_codel.c:230) sch_fq_codel
[   70.724561] dev_qdisc_enqueue (net/core/dev.c:3784)
[   70.724561] __dev_queue_xmit (net/core/dev.c:3880 (discriminator 2) net/core/dev.c:4390 (discriminator 2))
[   70.724561] ? irqentry_enter (kernel/entry/common.c:237)
[   70.724561] ? sysvec_apic_timer_interrupt (./arch/x86/include/asm/hardirq.h:74 (discriminator 2) arch/x86/kernel/apic/apic.c:1043 (discriminator 2) arch/x86/kernel/apic/apic.c:1043 (discriminator 2))
[   70.724561] ? trace_hardirqs_on (kernel/trace/trace_preemptirq.c:58 (discriminator 4))
[   70.724561] ? asm_sysvec_apic_timer_interrupt (./arch/x86/include/asm/idtentry.h:702)
[   70.724561] ? virtio_net_hdr_to_skb.constprop.0 (./include/linux/virtio_net.h:129 (discriminator 1))
[   70.724561] packet_sendmsg (net/packet/af_packet.c:3145 (discriminator 1) net/packet/af_packet.c:3177 (discriminator 1))
[   70.724561] ? _raw_spin_lock_bh (./arch/x86/include/asm/atomic.h:107 (discriminator 4) ./include/linux/atomic/atomic-arch-fallback.h:2170 (discriminator 4) ./include/linux/atomic/atomic-instrumented.h:1302 (discriminator 4) ./include/asm-generic/qspinlock.h:111 (discriminator 4) ./include/linux/spinlock.h:187 (discriminator 4) ./include/linux/spinlock_api_smp.h:127 (discriminator 4) kernel/locking/spinlock.c:178 (discriminator 4))
[   70.724561] ? netdev_name_node_lookup_rcu (net/core/dev.c:325 (discriminator 1))
[   70.724561] __sys_sendto (net/socket.c:730 (discriminator 1) net/socket.c:745 (discriminator 1) net/socket.c:2210 (discriminator 1))
[   70.724561] ? __sys_setsockopt (./include/linux/file.h:34 net/socket.c:2355)
[   70.724561] __x64_sys_sendto (net/socket.c:2222 (discriminator 1) net/socket.c:2218 (discriminator 1) net/socket.c:2218 (discriminator 1))
[   70.724561] do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1))
[   70.724561] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
[   70.724561] RIP: 0033:0x41ae09

Fixes: cf9acc90c8 ("net: virtio_net_hdr_to_skb: count transport header in UFO")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jonathan Davies <jonathan.davies@nutanix.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jonathan Davies <jonathan.davies@nutanix.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-01 11:47:05 +02:00