Fstests runs on my VMs have show several kmemleak reports like the following.
unreferenced object 0xffff88811ae59080 (size 64):
comm "xfs_io", pid 12124, jiffies 4294987392 (age 6.368s)
hex dump (first 32 bytes):
00 c0 1c 00 00 00 00 00 ff cf 1c 00 00 00 00 00 ................
90 97 e5 1a 81 88 ff ff 90 97 e5 1a 81 88 ff ff ................
backtrace:
[<00000000ac0176d2>] ulist_add_merge+0x60/0x150 [btrfs]
[<0000000076e9f312>] set_state_bits+0x86/0xc0 [btrfs]
[<0000000014fe73d6>] set_extent_bit+0x270/0x690 [btrfs]
[<000000004f675208>] set_record_extent_bits+0x19/0x20 [btrfs]
[<00000000b96137b1>] qgroup_reserve_data+0x274/0x310 [btrfs]
[<0000000057e9dcbb>] btrfs_check_data_free_space+0x5c/0xa0 [btrfs]
[<0000000019c4511d>] btrfs_delalloc_reserve_space+0x1b/0xa0 [btrfs]
[<000000006d37e007>] btrfs_dio_iomap_begin+0x415/0x970 [btrfs]
[<00000000fb8a74b8>] iomap_iter+0x161/0x1e0
[<0000000071dff6ff>] __iomap_dio_rw+0x1df/0x700
[<000000002567ba53>] iomap_dio_rw+0x5/0x20
[<0000000072e555f8>] btrfs_file_write_iter+0x290/0x530 [btrfs]
[<000000005eb3d845>] new_sync_write+0x106/0x180
[<000000003fb505bf>] vfs_write+0x24d/0x2f0
[<000000009bb57d37>] __x64_sys_pwrite64+0x69/0xa0
[<000000003eba3fdf>] do_syscall_64+0x43/0x90
In case brtfs_qgroup_reserve_data() or btrfs_delalloc_reserve_metadata()
fail the allocated extent_changeset will not be freed.
So in btrfs_check_data_free_space() and btrfs_delalloc_reserve_space()
free the allocated extent_changeset to get rid of the allocated memory.
The issue currently only happens in the direct IO write path, but only
after 65b3c08606e5 ("btrfs: fix ENOSPC failure when attempting direct IO
write into NOCOW range"), and also at defrag_one_locked_target(). Every
other place is always calling extent_changeset_free() even if its call
to btrfs_delalloc_reserve_space() or btrfs_check_data_free_space() has
failed.
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There is a report of a transaction abort of -EAGAIN with the following
script.
#!/bin/sh
for d in sda sdb; do
mkfs.btrfs -d single -m single -f /dev/\${d}
done
mount /dev/sda /mnt/test
mount /dev/sdb /mnt/scratch
for dir in test scratch; do
echo 3 >/proc/sys/vm/drop_caches
fio --directory=/mnt/\${dir} --name=fio.\${dir} --rw=read --size=50G --bs=64m \
--numjobs=$(nproc) --time_based --ramp_time=5 --runtime=480 \
--group_reporting |& tee /dev/shm/fio.\${dir}
echo 3 >/proc/sys/vm/drop_caches
done
for d in sda sdb; do
umount /dev/\${d}
done
The stack trace is shown in below.
[3310.967991] BTRFS: error (device sda) in btrfs_commit_transaction:2341: errno=-11 unknown (Error while writing out transaction)
[3310.968060] BTRFS info (device sda): forced readonly
[3310.968064] BTRFS warning (device sda): Skipping commit of aborted transaction.
[3310.968065] ------------[ cut here ]------------
[3310.968066] BTRFS: Transaction aborted (error -11)
[3310.968074] WARNING: CPU: 14 PID: 1684 at fs/btrfs/transaction.c:1946 btrfs_commit_transaction.cold+0x209/0x2c8
[3310.968131] CPU: 14 PID: 1684 Comm: fio Not tainted 5.14.10-300.fc35.x86_64 #1
[3310.968135] Hardware name: DIAWAY Tartu/Tartu, BIOS V2.01.B10 04/08/2021
[3310.968137] RIP: 0010:btrfs_commit_transaction.cold+0x209/0x2c8
[3310.968144] RSP: 0018:ffffb284ce393e10 EFLAGS: 00010282
[3310.968147] RAX: 0000000000000026 RBX: ffff973f147b0f60 RCX: 0000000000000027
[3310.968149] RDX: ffff974ecf098a08 RSI: 0000000000000001 RDI: ffff974ecf098a00
[3310.968150] RBP: ffff973f147b0f08 R08: 0000000000000000 R09: ffffb284ce393c48
[3310.968151] R10: ffffb284ce393c40 R11: ffffffff84f47468 R12: ffff973f101bfc00
[3310.968153] R13: ffff971f20cf2000 R14: 00000000fffffff5 R15: ffff973f147b0e58
[3310.968154] FS: 00007efe65468740(0000) GS:ffff974ecf080000(0000) knlGS:0000000000000000
[3310.968157] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[3310.968158] CR2: 000055691bcbe260 CR3: 000000105cfa4001 CR4: 0000000000770ee0
[3310.968160] PKRU: 55555554
[3310.968161] Call Trace:
[3310.968167] ? dput+0xd4/0x300
[3310.968174] btrfs_sync_file+0x3f1/0x490
[3310.968180] __x64_sys_fsync+0x33/0x60
[3310.968185] do_syscall_64+0x3b/0x90
[3310.968190] entry_SYSCALL_64_after_hwframe+0x44/0xae
[3310.968194] RIP: 0033:0x7efe6557329b
[3310.968200] RSP: 002b:00007ffe0236ebc0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
[3310.968203] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007efe6557329b
[3310.968204] RDX: 0000000000000000 RSI: 00007efe58d77010 RDI: 0000000000000006
[3310.968205] RBP: 0000000004000000 R08: 0000000000000000 R09: 00007efe58d77010
[3310.968207] R10: 0000000016cacc0c R11: 0000000000000293 R12: 00007efe5ce95980
[3310.968208] R13: 0000000000000000 R14: 00007efe6447c790 R15: 0000000c80000000
[3310.968212] ---[ end trace 1a346f4d3c0d96ba ]---
[3310.968214] BTRFS: error (device sda) in cleanup_transaction:1946: errno=-11 unknown
The abort occurs because of a write hole while writing out freeing tree
nodes of a tree-log tree. For zoned btrfs, we re-dirty a freed tree
node to ensure btrfs can write the region and does not leave a hole on
write on a zoned device. The current code fails to re-dirty a node
when the tree-log tree's depth is greater or equal to 2. That leads to
a transaction abort with -EAGAIN.
Fix the issue by properly re-dirtying a node on walking up the tree.
Fixes: d3575156f6 ("btrfs: zoned: redirty released extent buffers")
CC: stable@vger.kernel.org # 5.12+
Link: https://github.com/kdave/btrfs-progs/issues/415
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
generic/484 fails sometimes with compression on because the write ends
up small enough that it goes into the btree. This means that we never
call mapping_set_error() on the inode itself, because the page gets
marked as fine when we inline it into the metadata. When the metadata
writeback happens we see it and abort the transaction properly and mark
the fs as readonly, however we don't do the mapping_set_error() on
anything. In syncfs() we will simply return 0 if the sb is marked
read-only, so we can't check for this in our syncfs callback. The only
way the error gets returned if we called mapping_set_error() on
something. Fix this by calling mapping_set_error() on the btree inode
mapping. This allows us to properly return an error on syncfs and pass
generic/484 with compression on.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
I got dmesg errors on generic/281 on our overnight fstests. Looking at
the history this happens occasionally, with errors like this
WARNING: CPU: 0 PID: 673217 at fs/btrfs/extent_io.c:6848 assert_eb_page_uptodate+0x3f/0x50
CPU: 0 PID: 673217 Comm: kworker/u4:13 Tainted: G W 5.16.0-rc2+ #469
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
Workqueue: btrfs-cache btrfs_work_helper
RIP: 0010:assert_eb_page_uptodate+0x3f/0x50
RSP: 0018:ffffae598230bc60 EFLAGS: 00010246
RAX: 0017ffffc0002112 RBX: ffffebaec4100900 RCX: 0000000000001000
RDX: ffffebaec45733c7 RSI: ffffebaec4100900 RDI: ffff9fd98919f340
RBP: 0000000000000d56 R08: ffff9fd98e300000 R09: 0000000000000000
R10: 0001207370a91c50 R11: 0000000000000000 R12: 00000000000007b0
R13: ffff9fd98919f340 R14: 0000000001500000 R15: 0000000001cb0000
FS: 0000000000000000(0000) GS:ffff9fd9fbc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f549fcf8940 CR3: 0000000114908004 CR4: 0000000000370ef0
Call Trace:
extent_buffer_test_bit+0x3f/0x70
free_space_test_bit+0xa6/0xc0
load_free_space_tree+0x1d6/0x430
caching_thread+0x454/0x630
? rcu_read_lock_sched_held+0x12/0x60
? rcu_read_lock_sched_held+0x12/0x60
? rcu_read_lock_sched_held+0x12/0x60
? lock_release+0x1f0/0x2d0
btrfs_work_helper+0xf2/0x3e0
? lock_release+0x1f0/0x2d0
? finish_task_switch.isra.0+0xf9/0x3a0
process_one_work+0x270/0x5a0
worker_thread+0x55/0x3c0
? process_one_work+0x5a0/0x5a0
kthread+0x174/0x1a0
? set_kthread_struct+0x40/0x40
ret_from_fork+0x1f/0x30
This happens because we're trying to read from a extent buffer page that
is !PageUptodate. This happens because we will clear the page uptodate
when we have an IO error, but we don't clear the extent buffer uptodate.
If we do a read later and find this extent buffer we'll think its valid
and not return an error, and then trip over this warning.
Fix this by also clearing uptodate on the extent buffer when this
happens, so that we get an error when we do a btrfs_search_slot() and
find this block later.
CC: stable@vger.kernel.org # 5.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This commit adds BPF verifier selftests that cover all corner cases by
packet boundary checks. Specifically, 8-byte packet reads are tested at
the beginning of data and at the beginning of data_meta, using all kinds
of boundary checks (all comparison operators: <, >, <=, >=; both
permutations of operands: data + length compared to end, end compared to
data + length). For each case there are three tests:
1. Length is just enough for an 8-byte read. Length is either 7 or 8,
depending on the comparison.
2. Length is increased by 1 - should still pass the verifier. These
cases are useful, because they failed before commit 2fa7d94afc
("bpf: Fix the off-by-two error in range markings").
3. Length is decreased by 1 - should be rejected by the verifier.
Some existing tests are just renamed to avoid duplication.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20211207081521.41923-1-maximmi@nvidia.com
We've always been failing generic/260 because it's testing things we
actually don't care about and thus won't fail for. However we probably
should fail for fstrim_range->start == U64_MAX since we clearly can't
trim anything past that. This in combination with an update to
generic/260 will allow us to pass this test properly.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
If memdup_user() fails the error handing will crash when it tries
to kfree() an error pointer. Just return directly because there is
no cleanup required.
Fixes: 1a15eb724a ("btrfs: use btrfs_get_dev_args_from_path in dev removal ioctls")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The VCoRefLow CPU FIVR register definition for Tiger Lake is incorrect.
Current implementation reads it from MMIO offset 0x5A18 and bit
offset [12:14], but the actual correct register definition is from
bit offset [11:13].
Update to fix the bit offset.
Fixes: 473be51142 ("thermal: int340x: processor_thermal: Add RFIM driver")
Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Cc: 5.14+ <stable@vger.kernel.org> # 5.14+
[ rjw: New subject, changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The kerneldoc comment of pm_runtime_active() does not reflect the
behavior of the function, so update it accordingly.
Fixes: 403d2d116e ("PM: runtime: Add kerneldoc comments to multiple helpers")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Compiling the ACPI tools when output directory parameter is specified,
but the output directory is not present, triggers the following error:
make O=/data/test/tmp/ -C tools/power/acpi/
make: Entering directory '/data/src/kernel/linux/tools/power/acpi'
DESCEND tools/acpidbg
make[1]: Entering directory '/data/src/kernel/linux/tools/power/acpi/tools/acpidbg'
MKDIR include
CP include
CC tools/acpidbg/acpidbg.o
Assembler messages:
Fatal error: can't create /data/test/tmp/tools/power/acpi/tools/acpidbg/acpidbg.o: No such file or directory
make[1]: *** [../../Makefile.rules:24: /data/test/tmp/tools/power/acpi/tools/acpidbg/acpidbg.o] Error 1
make[1]: Leaving directory '/data/src/kernel/linux/tools/power/acpi/tools/acpidbg'
make: *** [Makefile:18: acpidbg] Error 2
make: Leaving directory '/data/src/kernel/linux/tools/power/acpi'
which occurs because the output directory has not been created yet.
Fix this issue by creating the output directory before compiling.
Reported-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
[ rjw: New subject, changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
As people have been asking to allow non-root processes to have access to
the tracefs directory, it was considered best to only allow groups to have
access to the directory, where it is easier to just set the tracefs file
system to a specific group (as other would be too dangerous), and that way
the admins could pick which processes would have access to tracefs.
Unfortunately, this broke tooling on Android that expected the other bit
to be set. For some special cases, for non-root tools to trace the system,
tracefs would be mounted and change the permissions of the top level
directory which gave access to all running tasks permission to the
tracing directory. Even though this would be dangerous to do in a
production environment, for testing environments this can be useful.
Now with the new changes to not allow other (which is still the proper
thing to do), it breaks the testing tooling. Now more code needs to be
loaded on the system to change ownership of the tracing directory.
The real solution is to have tracefs honor the gid=xxx option when
mounting. That is,
(tracing group tracing has value 1003)
mount -t tracefs -o gid=1003 tracefs /sys/kernel/tracing
should have it that all files in the tracing directory should be of the
given group.
Copy the logic from d_walk() from dcache.c and simplify it for the mount
case of tracefs if gid is set. All the files in tracefs will be walked and
their group will be set to the value passed in.
Link: https://lkml.kernel.org/r/20211207171729.2a54e1b3@gandalf.local.home
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reported-by: Kalesh Singh <kaleshsingh@google.com>
Reported-by: Yabin Cui <yabinc@google.com>
Fixes: 49d67e4457 ("tracefs: Have tracefs directories not set OTH permission bits by default")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
If directories in tracefs have their ownership changed, then any new files
and directories that are created under those directories should inherit
the ownership of the director they are created in.
Link: https://lkml.kernel.org/r/20211208075720.4855d180@gandalf.local.home
Cc: Kees Cook <keescook@chromium.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Yabin Cui <yabinc@google.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: stable@vger.kernel.org
Fixes: 4282d60689 ("tracefs: Add new tracefs file system")
Reported-by: Kalesh Singh <kaleshsingh@google.com>
Reported: https://lore.kernel.org/all/CAC_TJve8MMAv+H_NdLSJXZUSoxOEq2zB_pVaJ9p=7H6Bu3X76g@mail.gmail.com/
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Note 1st set were before the merge window.
Biggest set in here fix what happens when things go wrong in
the interrupt handlers for an IIO trigger.
Otherwise normal mix of recent and ancient bugs.
trigger core
- Fix reference counting bug that was preventing the iio_trig
structures from being released.
adxrs290
- Correctly sign extend the rate and temperature data.
at91-sama5d2
- Fix sign extension from the wrong bit and use the scan_type
values to avoid it being open coded in two places (which were
out of sync)
axp20x_adc
- Fix current reporting bit depth.
dln2-adc
- Fix a lock ordering issue and lockdep complaint that results.
- Add error handling for failure to register the trigger.
imx8qxp
- Wrong config dependency
kxcjk-1013
- Potential leak due to wrong guard on cleanup.
ltr501, kxsd9, stk3310, itg3200, ad7768
- Don't return error codes from interrupt handler and call
iio_trigger_notify_done() on all paths to avoid leaving
trigger disabled on an intermittent fault.
mma8452
- Fix missing iio_trigger_get() that could lead to use after free.
stm32
- Fix a current leak.
- Avoid null pointer derefence on defer_probe error due to wrong
struct device being passed.
stm32-timer
- Drop space in MODULE_ALIAS.
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEEbilms4eEBlKRJoGxVIU0mcT0FogFAmGwgs0RHGppYzIzQGtl
cm5lbC5vcmcACgkQVIU0mcT0Foiovw//bGSpzC3Jyr+rAw47/i+2lem2kjfiaJAg
OqET3KCjk5auKv6rAA8BtyQiaq4uqhpykC6Niz7klUhwA1gMUqswqFHQtOocnz2T
YzyRN3ew4IKOiTUykOUhr4HqyHVSKJhEgXB/1E86EXA5hjqXypaKE3ObWNE46SkO
1vw8Kyx4cH/HaOnF7/piZxMmTAIUQSKDjRkcpUJtCx6IVlqnGGsj6bF2npY0MoWJ
jAcB/u4XTYMH8oKEJpKvRmJ6l8BJWtbd/3e4fb0lP0pZGMyRWhfbuLWUkUOr6+8/
0p5ZSuKqOOjogNsrj/ENvv2UJskWkrLtLLaS6C2mo+4nD2mn3ZVo2vSpnSOifR/J
TPSOAHH4+Vfb+OjlN791PB9vJE9D9/j340EtEGRDkbqA3uYTbYsMYr3+1ySG8g4E
VJTkkeCz6cNQjLvPL+rBd4v7PepCEtx8risBoClcWuRmAJ4ffVjeApl2e817ZJNe
bDTrQun6wTY1Jscwhzb8y5tKnlQvZx//tUdINLTiICLSQRikkmuiMTu1/M4UD209
k1+2jNX3pxlgBdqGzn57G3JQgT69X92l4PzHtX7rn28P5IY+j3j9Vp0tpc17VSTV
Z32HQPkB+cxHcKPTmddWOm9C7rA7s8OfqGQ/kD40Cj7vVO5epWAaMc4lglALPxeW
wqAzgugeyjQ=
=V2My
-----END PGP SIGNATURE-----
Merge tag 'iio-fixes-for-5.16b' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-next
Jonathan writes:
2nd set of IIO fixes for 5.16
Note 1st set were before the merge window.
Biggest set in here fix what happens when things go wrong in
the interrupt handlers for an IIO trigger.
Otherwise normal mix of recent and ancient bugs.
trigger core
- Fix reference counting bug that was preventing the iio_trig
structures from being released.
adxrs290
- Correctly sign extend the rate and temperature data.
at91-sama5d2
- Fix sign extension from the wrong bit and use the scan_type
values to avoid it being open coded in two places (which were
out of sync)
axp20x_adc
- Fix current reporting bit depth.
dln2-adc
- Fix a lock ordering issue and lockdep complaint that results.
- Add error handling for failure to register the trigger.
imx8qxp
- Wrong config dependency
kxcjk-1013
- Potential leak due to wrong guard on cleanup.
ltr501, kxsd9, stk3310, itg3200, ad7768
- Don't return error codes from interrupt handler and call
iio_trigger_notify_done() on all paths to avoid leaving
trigger disabled on an intermittent fault.
mma8452
- Fix missing iio_trigger_get() that could lead to use after free.
stm32
- Fix a current leak.
- Avoid null pointer derefence on defer_probe error due to wrong
struct device being passed.
stm32-timer
- Drop space in MODULE_ALIAS.
* tag 'iio-fixes-for-5.16b' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio:
iio: trigger: stm32-timer: fix MODULE_ALIAS
iio: adc: stm32: fix null pointer on defer_probe error
iio: at91-sama5d2: Fix incorrect sign extension
iio: adc: axp20x_adc: fix charging current reporting on AXP22x
iio: gyro: adxrs290: fix data signedness
iio: ad7768-1: Call iio_trigger_notify_done() on error
iio: itg3200: Call iio_trigger_notify_done() on error
iio: imx8qxp-adc: fix dependency to the intended ARCH_MXC config
iio: dln2: Check return value of devm_iio_trigger_register()
iio: trigger: Fix reference counting
iio: dln2-adc: Fix lockdep complaint
iio: adc: stm32: fix a current leak by resetting pcsel before disabling vdda
iio: mma8452: Fix trigger reference couting
iio: stk3310: Don't return error code in interrupt handler
iio: kxsd9: Don't return error code in trigger handler
iio: ltr501: Don't return error code in trigger handler
iio: accel: kxcjk-1013: Fix possible memory leak in probe and remove
INVALL CMD specifies that the ITS must ensure any caching associated with
the interrupt collection defined by ICID is consistent with the LPI
configuration tables held in memory for all Redistributors. SYNC is
required to ensure that INVALL is executed.
Currently, LPI configuration data may be inconsistent with that in the
memory within a short period of time after the INVALL command is executed.
Signed-off-by: Wudi Wang <wangwudi@hisilicon.com>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Fixes: cc2d3216f5 ("irqchip: GICv3: ITS command queue")
Link: https://lore.kernel.org/r/20211208015429.5007-1-zhangshaokun@hisilicon.com
When KVM runs as a nested hypervisor on top of Hyper-V it uses Enlightened
VMCS and enables Enlightened MSR Bitmap feature for its L1s and L2s (which
are actually L2s and L3s from Hyper-V's perspective). When MSR bitmap is
updated, KVM has to reset HV_VMX_ENLIGHTENED_CLEAN_FIELD_MSR_BITMAP from
clean fields to make Hyper-V aware of the change. For KVM's L1s, this is
done in vmx_disable_intercept_for_msr()/vmx_enable_intercept_for_msr().
MSR bitmap for L2 is build in nested_vmx_prepare_msr_bitmap() by blending
MSR bitmap for L1 and L1's idea of MSR bitmap for L2. KVM, however, doesn't
check if the resulting bitmap is different and never cleans
HV_VMX_ENLIGHTENED_CLEAN_FIELD_MSR_BITMAP in eVMCS02. This is incorrect and
may result in Hyper-V missing the update.
The issue could've been solved by calling evmcs_touch_msr_bitmap() for
eVMCS02 from nested_vmx_prepare_msr_bitmap() unconditionally but doing so
would not give any performance benefits (compared to not using Enlightened
MSR Bitmap at all). 3-level nesting is also not a very common setup
nowadays.
Don't enable 'Enlightened MSR Bitmap' feature for KVM's L2s (real L3s) for
now.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211129094704.326635-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Fix typo which will cause fpe and privilege exception error.
Signed-off-by: Kelly Devilliv <kelly.devilliv@gmail.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Background:
We have a customer is running a Profinet stack on the 8MM which receives and
responds PNIO packets every 4ms and PNIO-CM packets every 40ms. However, from
time to time the received PNIO-CM package is "stock" and is only handled when
receiving a new PNIO-CM or DCERPC-Ping packet (tcpdump shows the PNIO-CM and
the DCERPC-Ping packet at the same time but the PNIO-CM HW timestamp is from
the expected 40 ms and not the 2s delay of the DCERPC-Ping).
After debugging, we noticed PNIO, PNIO-CM and DCERPC-Ping packets would
be handled by different RX queues.
The root cause should be driver ack all queues' interrupt when handle a
specific queue in fec_enet_rx_queue(). The blamed patch is introduced to
receive as much packets as possible once to avoid interrupt flooding.
But it's unreasonable to clear other queues'interrupt when handling one
queue, this patch tries to fix it.
Fixes: ed63f1dcd5 (net: fec: clear receive interrupts before processing a packet)
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Reported-by: Nicolas Diaz <nicolas.diaz@nxp.com>
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Link: https://lore.kernel.org/r/20211206135457.15946-1-qiangqing.zhang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2021-12-06
This series contains updates to iavf and i40e drivers.
Mitch adds restoration of MSI state during reset for iavf.
Michal fixes checking and reporting of descriptor count changes to
communicate changes and/or issues for iavf.
Karen resolves an issue with failed handling of VF requests while a VF
reset is occurring for i40e.
Mateusz removes clearing of VF requested queue count when configuring
VF ADQ for i40e.
Norbert fixes a NULL pointer dereference that can occur when getting VSI
descriptors for i40e.
====================
Link: https://lore.kernel.org/r/20211206183519.2733180-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Avoid passing NULL skb to __skb_put() function call if
napi_alloc_skb() returns NULL.
Fixes: 37149e9374 ("gve: Implement packet continuation for RX.")
Signed-off-by: Ameer Hamza <amhamza.mgc@gmail.com>
Link: https://lore.kernel.org/r/20211205183810.8299-1-amhamza.mgc@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The exit function is wrongly placed in the __init section and this leads
to a crash when the module is unloaded. Just remove both the init and
exit functions since this module does not need them.
Fixes: 71c0286324 ("cifs: fork arc4 and create a separate module...")
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Acked-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Cc: stable@vger.kernel.org # 5.15
Signed-off-by: Steve French <stfrench@microsoft.com>
Jiri has moved on and will not carry out the mlxsw maintainership duty any
longer. Add myself as a co-maintainer instead.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/45b54312cdebaf65c5d110b15a5dd2df795bf2be.1638807297.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vadim Fedorenko says:
====================
net: tls: cover all ciphers with tests
Recent patches to Kernel TLS showed that some ciphers are not covered
with tests. Let's cover missed.
====================
Link: https://lore.kernel.org/r/20211206213932.7508-1-vfedorenko@novek.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add tests for TLSv1.2 and TLSv1.3 with AES256-GCM cipher
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add tests for TLSv1.2 and TLSv1.3 with AES-CCM cipher.
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Correct indentation to 4 spaces, same as the other JSON files.
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Andrew Kilroy <andrew.kilroy@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lore.kernel.org/lkml/20211203123525.31127-2-andrew.kilroy@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In previous patch, we have supported the syntax which enables
the event on a specified pmu, such as:
cpu_core/<event>/
cpu_atom/<event>/
While this syntax is not very easy for applying on a set of
events or applying on a group. In following example, we have to
explicitly assign the pmu prefix.
# ./perf stat -e '{cpu_core/cycles/,cpu_core/instructions/}' -- sleep 1
Performance counter stats for 'sleep 1':
1,158,545 cpu_core/cycles/
1,003,113 cpu_core/instructions/
1.002428712 seconds time elapsed
A much easier way is:
# ./perf stat --cputype core -e '{cycles,instructions}' -- sleep 1
Performance counter stats for 'sleep 1':
1,101,071 cpu_core/cycles/
939,892 cpu_core/instructions/
1.002363142 seconds time elapsed
For this example, the '--cputype' enables the events from specified
pmu (cpu_core).
If '--cputype' conflicts with pmu prefix, '--cputype' is ignored.
# ./perf stat --cputype core -e cycles,cpu_atom/instructions/ -a -- sleep 1
Performance counter stats for 'system wide':
21,003,407 cpu_core/cycles/
367,886 cpu_atom/instructions/
1.002203520 seconds time elapsed
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210909062215.10278-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It's possible to link against libopencsd_c_api without having
libstdc++.so available, only libstdc++.so.6.0.28 (or whatever version is
in use) needs to be available. The same holds true for libopencsd.so.
When -lstdc++ (or -lopencsd) is explicitly passed to the linker however
the .so file must be available.
So wrap adding the dependencies into a check for static linking that
actually requires adding them all. The same construct is already used
for some other tests in the same file to reduce dependencies in the
dynamic linking case.
Fixes: 573cf5c9a1 ("perf build: Add missing -lstdc++ when linking with libopencsd")
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Uwe Kleine-König <uwe@kleine-koenig.org>
Cc: Adrian Bunk <bunk@debian.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Branislav Rankov <branislav.rankov@arm.com>
Cc: Diederik de Haas <didi.debian@cknow.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/all/20211203210544.1137935-1-uwe@kleine-koenig.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently topdown events must appear after a slots event:
$ perf stat -e '{slots,topdown-fe-bound}' /bin/true
Performance counter stats for '/bin/true':
3,183,090 slots
986,133 topdown-fe-bound
Reversing the events yields:
$ perf stat -e '{topdown-fe-bound,slots}' /bin/true
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (topdown-fe-bound).
For metrics the order of events is determined by iterating over a
hashmap, and so slots isn't guaranteed to be first which can yield this
error.
Change the set_leader in parse-events, called when a group is closed, so
that rather than always making the first event the leader, if the slots
event exists then it is made the leader. It is then moved to the head of
the evlist otherwise it won't be opened in the correct order.
The result is:
$ perf stat -e '{topdown-fe-bound,slots}' /bin/true
Performance counter stats for '/bin/true':
3,274,795 slots
1,001,702 topdown-fe-bound
A problem with this approach is the slots event is identified by name,
names can be overwritten like 'cpu/slots,name=foo/' and this causes the
leader change to fail.
The change also modifies and fixes mixed groups like, with the change:
$ perf stat -e '{instructions,slots,topdown-fe-bound}' -a -- sleep 2
Performance counter stats for 'system wide':
5574985410 slots
971981616 instructions
1348461887 topdown-fe-bound
2.001263120 seconds time elapsed
Without the change:
$ perf stat -e '{instructions,slots,topdown-fe-bound}' -a -- sleep 2
Performance counter stats for 'system wide':
<not counted> instructions
<not counted> slots
<not supported> topdown-fe-bound
2.006247990 seconds time elapsed
Something that may be undesirable here is that the events are reordered
in the output.
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vineet Singh <vineet.singh@intel.com>
Link: http://lore.kernel.org/lkml/20211130174945.247604-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The leader of a group is the first, but allow it to be an arbitrary list
member so that for Intel topdown events slots may always be the group
leader.
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vineet Singh <vineet.singh@intel.com>
Link: http://lore.kernel.org/lkml/20211130174945.247604-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It is common to use the same counters with and without duration_time.
The ID sharing code treats duration_time as if it were a hardware event
placed in the same group. This causes unnecessary multiplexing such as
in the following example where l3_cache_access isn't shared:
$ perf stat -M l3 -a sleep 1
Performance counter stats for 'system wide':
3,117,007 l3_cache_miss # 199.5 MB/s l3_rd_bw
# 43.6 % l3_hits
# 56.4 % l3_miss (50.00%)
5,526,447 l3_cache_access (50.00%)
5,392,435 l3_cache_access # 5389191.2 access/s l3_access_rate (50.00%)
1,000,601,901 ns duration_time
1.000601901 seconds time elapsed
Fix this by placing duration_time in all groups unless metric
sharing has been disabled on the command line:
$ perf stat -M l3 -a sleep 1
Performance counter stats for 'system wide':
3,597,972 l3_cache_miss # 230.3 MB/s l3_rd_bw
# 48.0 % l3_hits
# 52.0 % l3_miss
6,914,459 l3_cache_access # 6909935.9 access/s l3_access_rate
1,000,654,579 ns duration_time
1.000654579 seconds time elapsed
$ perf stat --metric-no-merge -M l3 -a sleep 1
Performance counter stats for 'system wide':
3,501,834 l3_cache_miss # 53.5 % l3_miss (24.99%)
6,548,173 l3_cache_access (24.99%)
3,417,622 l3_cache_miss # 45.7 % l3_hits (25.04%)
6,294,062 l3_cache_access (25.04%)
5,923,238 l3_cache_access # 5919688.1 access/s l3_access_rate (24.99%)
1,000,599,683 ns duration_time
3,607,486 l3_cache_miss # 230.9 MB/s l3_rd_bw (49.97%)
1.000599683 seconds time elapsed
v2. Doesn't count duration_time in the metric_list_cmp function that
sorts larger metrics first. Without this a metric with duration_time
and an event is sorted the same as a metric with two events,
possibly not allowing the first metric to share with the second.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20211124015226.3317994-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This updates the link to documentation on AMD processors. The new link
points to a page where users can find the Processor Programming
Reference (PPR) documents for the family and model codes corresponding
to processors they are using.
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Robert Richter <rrichter@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Link: https://lore.kernel.org/r/20211123084613.243792-2-sandipan.das@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
AMD processors have events with event select codes and unit masks larger
than a byte. The core PMU, for example, uses 12-bit event select codes
split between bits 0-7 and 32-35 of the PERF_CTL MSRs as can be seen
from /sys/bus/event_sources/devices/cpu/format/*.
The Processor Programming Reference (PPR) lists the event codes as
unified 12-bit hexadecimal values instead and the split between the bits
is not apparent to someone who is not aware of the layout of the
PERF_CTL MSRs.
8-bit event select codes continue to work as the layout matches that of
the PERF_CTL MSRs i.e. bits 0-7 for event select and 8-15 for unit mask.
This adds more details in the perf man pages about using
/sys/bus/event_sources/devices/*/format/* for determining the correct
raw event encoding scheme.
E.g. the "op_cache_hit_miss.op_cache_hit" event with code 0x28f and
umask 0x03 can be programmed using its symbolic name as:
$ sudo perf --debug perf-event-open stat -e op_cache_hit_miss.op_cache_hit sleep 1
------------------------------------------------------------
perf_event_attr:
type 4
size 128
config 0x20000038f
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
exclude_guest 1
------------------------------------------------------------
[...]
One might use a simple eventsel+umask combination based on what the
current man pages say and incorrectly program the event as:
$ sudo perf --debug perf-event-open stat -e r0328f sleep 1
------------------------------------------------------------
perf_event_attr:
type 4
size 128
config 0x328f
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
exclude_guest 1
------------------------------------------------------------
[...]
When it should have been based on the format from sysfs:
$ cat /sys/bus/event_source/devices/cpu/format/event
config:0-7,32-35
$ sudo perf --debug perf-event-open stat -e r20000038f sleep 1
------------------------------------------------------------
perf_event_attr:
type 4
size 128
config 0x20000038f
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
exclude_guest 1
------------------------------------------------------------
[...]
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Robert Richter <rrichter@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Link: https://lore.kernel.org/r/20211123084613.243792-1-sandipan.das@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Remove the scaling process from perf_mmap__read_self(), and unify the
counters that can be obtained from perf_evsel__read() to "no scaling".
Signed-off-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20211109085831.3770594-3-nakamura.shun@fujitsu.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Move perf_counts_values__scale() from tools/perf/util to tools/lib/perf
so that it can be used with libperf.
Committer notes:
As noted by Jiri, use __s8 instead of s8 on the exported function.
Signed-off-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20211109085831.3770594-2-nakamura.shun@fujitsu.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The tools build system uses KBUILD_HOSTCFLAGS symbol for obvious purposes.
However this is not set for anything under tools/
As such, host tools apps built have no compiler warnings enabled.
Declare HOSTCFLAGS for perf tools build, and also use that symbol in
declaration of host_c_flags. HOSTCFLAGS comes from EXTRA_WARNINGS, which
is independent of target platform/arch warning flags.
Suggested-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Laura Abbott <labbott@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/1635525041-151876-1-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Helps a bit the user figuring out why it is failing:
Before:
$ perf test sigtrap
73: Sigtrap : FAILED!
$ perf test -v sigtrap
73: Sigtrap :
--- start ---
test child forked, pid 3816772
FAILED sys_perf_event_open()
test child finished with -1
---- end ----
Sigtrap: FAILED!
$
After:
$ perf test sigtrap
73: Sigtrap : FAILED!
$ perf test -v sigtrap
73: Sigtrap :
--- start ---
test child forked, pid 3816772
FAILED sys_perf_event_open(): Permission denied
test child finished with -1
---- end ----
Sigtrap: FAILED!
$
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Fabian Hemmer <copy@copy.sh>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Marco Elver <elver@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kasan-dev@googlegroups.com
Link: http://lore.kernel.org/lkml/YZOpSVOCXe0zWeRs@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add basic stress test for sigtrap handling as a perf tool built-in test.
This allows sanity checking the basic sigtrap functionality from within
the perf tool.
Committer notes:
Reported that !root was getting -EPERM, applied a fixup from Marco to
set .exclude_{hv,kernel} that made it work.
Signed-off-by: Marco Elver <elver@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Fabian Hemmer <copy@copy.sh>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kasan-dev@googlegroups.com
Link: http://lore.kernel.org/lkml/20211115112822.4077224-1-elver@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
centos9 has nmap-ncat which doesn't like the '-q' option, use socat.
While at it, mark test skipped if needed tools are missing.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The existing net,mac test didn't cover the issue recently reported
by Nikita Yushchenko, where MAC addresses wouldn't match if given
as first field of a concatenated set with AVX2 and 8-bit groups,
because there's a different code path covering the lookup of six
8-bit groups (MAC addresses) if that's the first field.
Add a similar mac,net test, with MAC address and IPv4 address
swapped in the set specification.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The sixth byte of packet data has to be looked up in the sixth group,
not in the seventh one, even if we load the bucket data into ymm6
(and not ymm5, for convenience of tracking stalls).
Without this fix, matching on a MAC address as first field of a set,
if 8-bit groups are selected (due to a small set size) would fail,
that is, the given MAC address would never match.
Reported-by: Nikita Yushchenko <nikita.yushchenko@virtuozzo.com>
Cc: <stable@vger.kernel.org> # 5.6.x
Fixes: 7400b06396 ("nft_set_pipapo: Introduce AVX2-based lookup implementation")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Tested-By: Nikita Yushchenko <nikita.yushchenko@virtuozzo.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
After the below patch, the conntrack attached to skb is set to "notrack" in
the context of vrf device, for locally generated packets.
But this is true only when the default qdisc is set to the vrf device. When
changing the qdisc, notrack is not set anymore.
In fact, there is a shortcut in the vrf driver, when the default qdisc is
set, see commit dcdd43c41e ("net: vrf: performance improvements for
IPv4") for more details.
This patch ensures that the behavior is always the same, whatever the qdisc
is.
To demonstrate the difference, a new test is added in conntrack_vrf.sh.
Fixes: 8c9c296adf ("vrf: run conntrack only in context of lower/physdev for locally generated packets")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Florian Westphal <fw@strlen.de>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
- Fix SMT detection fast read path on sysfs.
- Fix memory leaks when processing feature headers in perf.data files.
- Fix 'Simple expression parser' 'perf test' on arch without CPU die topology
info, such as s/390.
- Fix building perf with BUILD_BPF_SKEL=1.
- Fix 'perf bench' by reverting "perf bench: Fix two memory leaks detected with ASan".
- Fix itrace space allowed for new attributes in 'perf script'.
- Fix the build feature detection fast path, that was always failing on systems
with python3 development packages, speeding up the build.
- Reset shadow counts before loading, fixing metrics using duration_time.
- Sync more kernel headers changed by the new futex_waitv syscall: s390 and powerpc.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYa9kXwAKCRCyPKLppCJ+
J1pSAQDAk1vQa82x5bqq8yHZTeym4Y6QtKivveTmqCagFT7FOwEAmzwOsdv/v3cQ
GA0u97zPUsTiN1ma/nh+YZTA9IK1nw0=
=YPeN
-----END PGP SIGNATURE-----
Merge tag 'perf-tools-fixes-for-v5.16-2021-12-07' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix SMT detection fast read path on sysfs.
- Fix memory leaks when processing feature headers in perf.data files.
- Fix 'Simple expression parser' 'perf test' on arch without CPU die
topology info, such as s/390.
- Fix building perf with BUILD_BPF_SKEL=1.
- Fix 'perf bench' by reverting "perf bench: Fix two memory leaks
detected with ASan".
- Fix itrace space allowed for new attributes in 'perf script'.
- Fix the build feature detection fast path, that was always failing on
systems with python3 development packages, speeding up the build.
- Reset shadow counts before loading, fixing metrics using
duration_time.
- Sync more kernel headers changed by the new futex_waitv syscall: s390
and powerpc.
* tag 'perf-tools-fixes-for-v5.16-2021-12-07' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf bpf_skel: Do not use typedef to avoid error on old clang
perf bpf: Fix building perf with BUILD_BPF_SKEL=1 by default in more distros
perf header: Fix memory leaks when processing feature headers
perf test: Reset shadow counts before loading
perf test: Fix 'Simple expression parser' test on arch without CPU die topology info
tools build: Remove needless libpython-version feature check that breaks test-all fast path
perf tools: Fix SMT detection fast read path
tools headers UAPI: Sync powerpc syscall table file changed by new futex_waitv syscall
perf inject: Fix itrace space allowed for new attributes
tools headers UAPI: Sync s390 syscall table file changed by new futex_waitv syscall
Revert "perf bench: Fix two memory leaks detected with ASan"
BUG: KASAN: use-after-free in io_submit_one+0x496/0x2fe0 fs/aio.c:1882
CPU: 2 PID: 15100 Comm: syz-executor873 Not tainted 5.16.0-rc1-syzk #1
Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7860+a7792d29
04/01/2014
Call Trace:
[...]
refcount_dec_and_test include/linux/refcount.h:333 [inline]
iocb_put fs/aio.c:1161 [inline]
io_submit_one+0x496/0x2fe0 fs/aio.c:1882
__do_sys_io_submit fs/aio.c:1938 [inline]
__se_sys_io_submit fs/aio.c:1908 [inline]
__x64_sys_io_submit+0x1c7/0x4a0 fs/aio.c:1908
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3a/0x80 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
__blkdev_direct_IO_async() returns errors from bio_iov_iter_get_pages()
directly, in which case upper layers won't be expecting ->ki_complete
to be called by the block layer and will terminate the request. However,
there is also bio_endio() leading to a second ->ki_complete and a double
free.
Fixes: 54a88eb838 ("block: add single bio async direct IO helper")
Reported-by: George Kennedy <george.kennedy@oracle.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c9eb786f6cef041e159e6287de131bec0719ad5c.1638907997.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>