WQ_HIGHPRI increases throughput and decreases disk latency when using
dm-verity. This is important in Android for camera startup speed.
The following tests were run by doing 60 seconds of random reads using
a dm-verity device backed by two ramdisks.
Without WQ_HIGHPRI
lat (usec): min=13, max=3947, avg=69.53, stdev=50.55
READ: bw=51.1MiB/s (53.6MB/s), 51.1MiB/s-51.1MiB/s (53.6MB/s-53.6MB/s)
With WQ_HIGHPRI:
lat (usec): min=13, max=7854, avg=31.15, stdev=30.42
READ: bw=116MiB/s (121MB/s), 116MiB/s-116MiB/s (121MB/s-121MB/s)
Further testing was done by measuring how long it takes to open a
camera on an Android device.
Without WQ_HIGHPRI
Total verity work queue wait times (ms):
880.960, 789.517, 898.852
With WQ_HIGHPRI:
Total verity work queue wait times (ms):
528.824, 439.191, 433.300
The average time to open the camera is reduced by 350ms (or 40-50%).
Signed-off-by: Nathan Huckleberry <nhuck@google.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Change DMWARN to DMERR in cases when there is an unrecoverable error.
Change DMWARN to DMCRIT when handling of a case is unimplemented.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmNNTxsACgkQxWXV+ddt
WDs7QA//WaEPFWO/086pWBhlJF8k+QCNwM/EEKQL5x4hxC6pEGbHk04q63IY3XKh
B9WEoxTfkxpHz8p+p9wDVRJl1Fdby/UFc1l2xTLcU273wL4Iweqf00N4WyEYmqQ3
DtZcYmh4r7gZ9cTuHk8Ex+llhqAUiN7mY3FoEU8naZCE+Fdn/h3T8DT79V2XLgzv
4f46ci0ao74o30EE7vc/Yw3gr1ouJJ4Ajw/UCEXUVC9tWOLcDNE6501AshT/ozDp
m2tljY630QIayaMjtR+HCJHmdmB5bNGdE01Cssqc8M+M7AtQKvf+A/nQNTiI0UfK
6ODdukvteTEEVKL2XHHkW6RWzR1rfhT1JOrl3YRKZwAKYsURXI/t+2UIjZtVstY9
GRb3YGBDVggtbjyXxC04i4WyF3RoHRehGiF/G303BBGFMXfgZ17rvSp7DfL9KLcc
VNycW17CcQLVZXueNWJrNSu2dQ0X8Lx0X+OTcsxRkNCJW+JQHffDl/TwMt0GtRoO
Vhwjp8vUKuJDZbjvGXg0ZKrmk0T12+L8ubt5o5fQtMiFf+RGq77xzI1112ZIwsL0
OtGOD3ShgKDvz24HoxSAVTbHq+/s+bmhIL/xU4QAeol3sOVPfx6b+KqcmTyG9E9u
+gbqB9js/2vbDFNtmhOV8Fv1HbGT8bwtMCIlq5CzsiX+aT5rT88=
=aPaQ
-----END PGP SIGNATURE-----
Merge tag 'for-6.1-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fiemap fixes:
- add missing path cache update
- fix processing of delayed data and tree refs during backref
walking, this could lead to reporting incorrect extent sharing
- fix extent range locking under heavy contention to avoid deadlocks
- make it possible to test send v3 in debugging mode
- update links in MAINTAINERS
* tag 'for-6.1-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
MAINTAINERS: update btrfs website links and files
btrfs: ignore fiemap path cache if we have multiple leaves for a data extent
btrfs: fix processing of delayed tree block refs during backref walking
btrfs: fix processing of delayed data refs during backref walking
btrfs: delete stale comments after merge conflict resolution
btrfs: unlock locked extent area if we have contention
btrfs: send: update command for protocol version check
btrfs: send: allow protocol version 3 with CONFIG_BTRFS_DEBUG
btrfs: add missing path cache update during fiemap
AMD systems support zero CBM (capacity bit mask) for cache allocation.
That is reflected in rdt_init_res_defs_amd() by:
r->cache.arch_has_empty_bitmaps = true;
However given the unified code in cbm_validate(), checking for:
val == 0 && !arch_has_empty_bitmaps
is not enough because of another check in cbm_validate():
if ((zero_bit - first_bit) < r->cache.min_cbm_bits)
The default value of r->cache.min_cbm_bits = 1.
Leading to:
$ cd /sys/fs/resctrl
$ mkdir foo
$ cd foo
$ echo L3:0=0 > schemata
-bash: echo: write error: Invalid argument
$ cat /sys/fs/resctrl/info/last_cmd_status
Need at least 1 bits in the mask
Initialize the min_cbm_bits to 0 for AMD. Also, remove the default
setting of min_cbm_bits and initialize it separately.
After the fix:
$ cd /sys/fs/resctrl
$ mkdir foo
$ cd foo
$ echo L3:0=0 > schemata
$ cat /sys/fs/resctrl/info/last_cmd_status
ok
Fixes: 316e7f901f ("x86/resctrl: Add struct rdt_cache::arch_has_{sparse, empty}_bitmaps")
Co-developed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: James Morse <james.morse@arm.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/lkml/20220517001234.3137157-1-eranian@google.com
- Fix illegal unmapped accesses when initializing compressed inodes;
- Fix up very rare hung on page lock after enabling compressed data
deduplication;
- Fix up inplace decompression success rate;
- Take s_inode_list_lock to protect sb->s_inodes for fscache shared
domain.
-----BEGIN PGP SIGNATURE-----
iIcEABYIAC8WIQThPAmQN9sSA0DVxtI5NzHcH7XmBAUCY05u9xEceGlhbmdAa2Vy
bmVsLm9yZwAKCRA5NzHcH7XmBHp1AQCXNLts5Af2PdRZUcG8RP0IAM0cZbXM9oZE
e520B85VRQD8CuHE4xemQ6pN7iTcBRS/i60p5UF3pCr5PJ0WGcf5gwU=
=/mdV
-----END PGP SIGNATURE-----
Merge tag 'erofs-for-6.1-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
- Fix invalid unmapped accesses when initializing compressed inodes
- Fix up very rare hung on page lock after enabling compressed data
deduplication
- Fix up inplace decompression success rate
- Take s_inode_list_lock to protect sb->s_inodes for fscache shared
domain
* tag 'erofs-for-6.1-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: protect s_inodes with s_inode_list_lock for fscache
erofs: fix up inplace decompression success rate
erofs: shouldn't churn the mapping page for duplicated copies
erofs: fix illegal unmapped accesses in z_erofs_fill_inode_lazy()
The function test_bit doesn't provide any memory barrier. It may be
possible that the read requests that follow test_bit(B_READING, &b->state)
are reordered before the test, reading invalid data that existed before
B_READING was cleared.
Fix this bug by changing test_bit to test_bit_acquire. This is
particularly important on arches with weak(er) memory ordering
(e.g. arm64).
Depends-On: 8238b45798 ("wait_on_bit: add an acquire memory barrier")
Depends-On: d6ffe6067a ("provide arch_test_bit_acquire for architectures that define test_bit")
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
We already set rc to this return code further down in the function but
we can set it earlier in order to suppress a smash warning.
Also fix a false positive for Coverity. The reason this is a false positive is
that this happens during umount after all files and directories have been closed
but mosetting on ->on_list to suppress the warning.
Reported-by: Dan carpenter <dan.carpenter@oracle.com>
Reported-by: coverity-bot <keescook+coverity-bot@chromium.org>
Addresses-Coverity-ID: 1525256 ("Concurrent data access violations")
Fixes: a350d6e73f5e ("cifs: enable caching of directories for which a lease is held")
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
list_head can be initialized automatically with LIST_HEAD()
instead of calling INIT_LIST_HEAD().
Using list_move() instead of list_del() and list_add().
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
If stardup the symlink target failed, should free the xid,
otherwise the xid will be leaked.
Fixes: 76894f3e2f ("cifs: improve symlink handling for smb2+")
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Before return, should free the xid, otherwise, the
xid will be leaked.
Fixes: d70e9fa558 ("cifs: try opening channels after mounting")
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
If not flock, before return -ENOLCK, should free the xid,
otherwise, the xid will be leaked.
Fixes: d0677992d2 ("cifs: add support for flock")
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
If the file is used by swap, before return -EOPNOTSUPP, should
free the xid, otherwise, the xid will be leaked.
Fixes: 4e8aea30f7 ("smb3: enable swap on SMB3 mounts")
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
If the cifs already shutdown, we should free the xid before return,
otherwise, the xid will be leaked.
Fixes: 087f757b01 ("cifs: add shutdown support")
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
- Fix module loading in Tegra124 driver (Jon Hunter).
- Fix memory leak and update to read-only region in qcom driver (Fabien
Parent).
- Miscellaneous minor cleanups to cpufreq drivers (Fabien Parent and
Yang Yingliang).
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEx73Crsp7f6M6scA70rkcPK6BEhwFAmNOhnoACgkQ0rkcPK6B
Ehy8Cg/9HejtmwyW0S7mgBdG8+mleDDwOMSsWxnQ2dHBPeip4f1ivJbSeyst+/k0
Y4wVwggAFKf6295s93+aBuyomtJR+18PwJe/HrGemj2SiicL4gQ53n5WjO2Xz4ce
VMVQY8lZ9u97t6TxF8SarG2pFK/bHF+wk541NKJ+3EVIGYltlFN+rDEwIEPHqq1S
EjF7+qXAxDDMKLlNWD7gqFxylPU0o+SHHsmgb4TY72hjpL9i1KeHY5BT+FLgCto0
d0ZEENdHhLR1zILJziFDWaDMTG0W3mgztMJK/4HYtLCNYn14iaEf/Q2Ry/5t8f1u
N1Kzy3msnIA0B8YfcFyUaOBpGW9nnVHHwZaqO3gbvEGAPoXz2wiKg4MbC9x+cAho
vk2JrGqG80j5lEDLPuQtj0x6NBl8+4Vo6e/0uTnYpupNfqTE7LXkOWoiYLBxYbDX
7J4fP4PRvZHubOdKovBx6Um8FVioWKW41X7zgiNoVL7wCh7Y/Op8ecWGOXa/hOcZ
MQV2TWMTKZweTleB4hclHk5nrmLgPt+qU+5mN2cPhFBkF8Jsg8vIcGJ1lozYnJM9
+KK1ukISaqMlDzWPcX7Gt5Mk1tZMv/jXhtSFdEvZNkk8PM2Zyqwv7bLZ4vuv479r
rIlMO3k8Bxjd30WsMtQmeR0mDJ3l5I2W965XQp1c2zFVk9GHe3o=
=1R1A
-----END PGP SIGNATURE-----
Merge tag 'cpufreq-arm-fixes-6.1-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm
Pull cpufreq ARM fixes / cleanups for 6.1-rc from Viresh Kumar:
"- Fix module loading in Tegra124 driver (Jon Hunter).
- Fix memory leak and update to read-only region in qcom driver (Fabien
Parent).
- Miscellaneous minor cleanups to cpufreq drivers (Fabien Parent and
Yang Yingliang)."
* tag 'cpufreq-arm-fixes-6.1-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
cpufreq: sun50i: Switch to use dev_err_probe() helper
cpufreq: qcom-nvmem: Switch to use dev_err_probe() helper
cpufreq: imx6q: Switch to use dev_err_probe() helper
cpufreq: dt: Switch to use dev_err_probe() helper
cpufreq: qcom: remove unused parameter in function definition
cpufreq: qcom: fix writes in read-only memory region
cpufreq: qcom: fix memory leak in error path
cpufreq: tegra194: Fix module loading
Don't populate the read-only array tp10ubkbd_led on the stack but instead
make it static const. Also makes the object code a little smaller.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
The MadCatz variant of the MMO7 mouse has the ID 0738:1713 and the same
quirks as the Saitek variant.
Signed-off-by: Samuel Bailey <samuel.bailey1@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
In the probe path, convert pr_err() to dev_err_probe() which will
check if error code is -EPROBE_DEFER and prints the error name.
It also sets the defer probe reason which can be checked later
through debugfs. It's more simple in error path.
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
In the probe path, dev_err() can be replaced with dev_err_probe()
which will check if error code is -EPROBE_DEFER and prints the
error name. It also sets the defer probe reason which can be
checked later through debugfs. It's more simple in error path.
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
In the probe path, dev_err() can be replaced with dev_err_probe()
which will check if error code is -EPROBE_DEFER and prints the
error name. It also sets the defer probe reason which can be
checked later through debugfs. It's more simple in error path.
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
In the probe path, dev_err() can be replaced with dev_err_probe()
which will check if error code is -EPROBE_DEFER and prints the
error name. It also sets the defer probe reason which can be
checked later through debugfs. It's more simple in error path.
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The speedbin_nvmem parameter is not used for
get_krait_bin_format_{a,b}. Let's remove the parameter to make the code
cleaner.
Signed-off-by: Fabien Parent <fabien.parent@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
This commit fixes a kernel oops because of a write in some read-only memory:
[ 9.068287] Unable to handle kernel write to read-only memory at virtual address ffff800009240ad8
..snip..
[ 9.138790] Internal error: Oops: 9600004f [#1] PREEMPT SMP
..snip..
[ 9.269161] Call trace:
[ 9.276271] __memcpy+0x5c/0x230
[ 9.278531] snprintf+0x58/0x80
[ 9.282002] qcom_cpufreq_msm8939_name_version+0xb4/0x190
[ 9.284869] qcom_cpufreq_probe+0xc8/0x39c
..snip..
The following line defines a pointer that point to a char buffer stored
in read-only memory:
char *pvs_name = "speedXX-pvsXX-vXX";
This pointer is meant to hold a template "speedXX-pvsXX-vXX" where the
XX values get overridden by the qcom_cpufreq_krait_name_version function. Since
the template is actually stored in read-only memory, when the function
executes the following call we get an oops:
snprintf(*pvs_name, sizeof("speedXX-pvsXX-vXX"), "speed%d-pvs%d-v%d",
speed, pvs, pvs_ver);
To fix this issue, we instead store the template name onto the stack by
using the following syntax:
char pvs_name_buffer[] = "speedXX-pvsXX-vXX";
Because the `pvs_name` needs to be able to be assigned to NULL, the
template buffer is stored in the pvs_name_buffer and not under the
pvs_name variable.
Cc: v5.7+ <stable@vger.kernel.org> # v5.7+
Fixes: a8811ec764 ("cpufreq: qcom: Add support for krait based socs")
Signed-off-by: Fabien Parent <fabien.parent@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
If for some reason the speedbin length is incorrect, then there is a
memory leak in the error path because we never free the speedbin buffer.
This commit fixes the error path to always free the speedbin buffer.
Cc: v5.7+ <stable@vger.kernel.org> # v5.7+
Fixes: a8811ec764 ("cpufreq: qcom: Add support for krait based socs")
Signed-off-by: Fabien Parent <fabien.parent@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
When the Tegra194 CPUFREQ driver is built as a module it is not
automatically loaded as expected on Tegra194 devices. Populate the
MODULE_DEVICE_TABLE to fix this.
Cc: v5.9+ <stable@vger.kernel.org> # v5.9+
Fixes: df320f8935 ("cpufreq: Add Tegra194 cpufreq driver")
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Currently, the patch application logic checks whether the revision
needs to be applied on each logical CPU (SMT thread). Therefore, on SMT
designs where the microcode engine is shared between the two threads,
the application happens only on one of them as that is enough to update
the shared microcode engine.
However, there are microcode patches which do per-thread modification,
see Link tag below.
Therefore, drop the revision check and try applying on each thread. This
is what the BIOS does too so this method is very much tested.
Btw, change only the early paths. On the late loading paths, there's no
point in doing per-thread modification because if is it some case like
in the bugzilla below - removing a CPUID flag - the kernel cannot go and
un-use features it has detected are there early. For that, one should
use early loading anyway.
[ bp: Fixes does not contain the oldest commit which did check for
equality but that is good enough. ]
Fixes: 8801b3fcb5 ("x86/microcode/AMD: Rework container parsing")
Reported-by: Ștefan Talpalaru <stefantalpalaru@yahoo.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Ștefan Talpalaru <stefantalpalaru@yahoo.com>
Cc: <stable@vger.kernel.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216211
When we call connect() for a UDP socket in a reuseport group, we have
to update sk->sk_reuseport_cb->has_conns to 1. Otherwise, the kernel
could select a unconnected socket wrongly for packets sent to the
connected socket.
However, the current way to set has_conns is illegal and possible to
trigger that problem. reuseport_has_conns() changes has_conns under
rcu_read_lock(), which upgrades the RCU reader to the updater. Then,
it must do the update under the updater's lock, reuseport_lock, but
it doesn't for now.
For this reason, there is a race below where we fail to set has_conns
resulting in the wrong socket selection. To avoid the race, let's split
the reader and updater with proper locking.
cpu1 cpu2
+----+ +----+
__ip[46]_datagram_connect() reuseport_grow()
. .
|- reuseport_has_conns(sk, true) |- more_reuse = __reuseport_alloc(more_socks_size)
| . |
| |- rcu_read_lock()
| |- reuse = rcu_dereference(sk->sk_reuseport_cb)
| |
| | | /* reuse->has_conns == 0 here */
| | |- more_reuse->has_conns = reuse->has_conns
| |- reuse->has_conns = 1 | /* more_reuse->has_conns SHOULD BE 1 HERE */
| | |
| | |- rcu_assign_pointer(reuse->socks[i]->sk_reuseport_cb,
| | | more_reuse)
| `- rcu_read_unlock() `- kfree_rcu(reuse, rcu)
|
|- sk->sk_state = TCP_ESTABLISHED
Note the likely(reuse) in reuseport_has_conns_set() is always true,
but we put the test there for ease of review. [0]
For the record, usually, sk_reuseport_cb is changed under lock_sock().
The only exception is reuseport_grow() & TCP reqsk migration case.
1) shutdown() TCP listener, which is moved into the latter part of
reuse->socks[] to migrate reqsk.
2) New listen() overflows reuse->socks[] and call reuseport_grow().
3) reuse->max_socks overflows u16 with the new listener.
4) reuseport_grow() pops the old shutdown()ed listener from the array
and update its sk->sk_reuseport_cb as NULL without lock_sock().
shutdown()ed TCP sk->sk_reuseport_cb can be changed without lock_sock(),
but, reuseport_has_conns_set() is called only for UDP under lock_sock(),
so likely(reuse) never be false in reuseport_has_conns_set().
[0]: https://lore.kernel.org/netdev/CANn89iLja=eQHbsM_Ta2sQF0tOGU8vAGrh_izRuuHjuO1ouUag@mail.gmail.com/
Fixes: acdcecc612 ("udp: correct reuseport selection with connected sockets")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20221014182625.89913-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Commit 5e633302ac ("scsi: lpfc: vmid: Add support for VMID in mailbox
command") introduced allocations for the VMID resources in
lpfc_create_port() after the call to scsi_host_alloc(). Upon failure on the
VMID allocations, the new code would branch to the 'out' label, which
returns NULL without unwinding anything, thus skipping the call to
scsi_host_put().
Fix the problem by creating a separate label 'out_free_vmid' to unwind the
VMID resources and make the 'out_put_shost' label call only
scsi_host_put(), as was done before the introduction of allocations for
VMID.
Fixes: 5e633302ac ("scsi: lpfc: vmid: Add support for VMID in mailbox command")
Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com>
Link: https://lore.kernel.org/r/20220916035908.712799-1-rafaelmendsr@gmail.com
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Userspace can currently write to sysfs to transition sdev_state to RUNNING
or OFFLINE from any source state. This causes issues because proper
transitioning out of some states involves steps besides just changing
sdev_state, so allowing userspace to change sdev_state regardless of the
source state can result in inconsistencies; e.g. with ISCSI we can end up
with sdev_state == SDEV_RUNNING while the device queue is quiesced. Any
task attempting I/O on the device will then hang, and in more recent
kernels, iscsid will hang as well.
More detail about this bug is provided in my first attempt:
https://groups.google.com/g/open-iscsi/c/PNKca4HgPDs/m/CXaDkntOAQAJ
Link: https://lore.kernel.org/r/20220924000241.2967323-1-ushankar@purestorage.com
Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Suggested-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* Fix a recent regression where a sleeping kernfs function is called with
css_set_lock (spinlock) held.
* Revert the commit to enable cgroup1 support for cgroup_get_from_fd/file().
Multiple users assume that the lookup only works for cgroup2 and breaks
when fed a cgroup1 file. Instead, introduce a separate set of functions to
lookup both v1 and v2 and use them where the user explicitly wants to
support both versions.
* Compat update for tools/perf/util/bpf_skel/bperf_cgroup.bpf.c.
* Add Josef Bacik as a blkcg maintainer.
-----BEGIN PGP SIGNATURE-----
iIQEABYIACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCY03MlA4cdGpAa2VybmVs
Lm9yZwAKCRCxYfJx3gVYGTkUAQD7fNcSPuc2m/BvW+gySKQkp9PZMA6E6yOIqirc
QKmIgwEAwWECW7RR1alhOGD50RtYkuYVsLD1+6Ka4eMHe+EhwA4=
=kGLI
-----END PGP SIGNATURE-----
Merge tag 'cgroup-for-6.1-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
- Fix a recent regression where a sleeping kernfs function is called
with css_set_lock (spinlock) held
- Revert the commit to enable cgroup1 support for cgroup_get_from_fd/file()
Multiple users assume that the lookup only works for cgroup2 and
breaks when fed a cgroup1 file. Instead, introduce a separate set of
functions to lookup both v1 and v2 and use them where the user
explicitly wants to support both versions.
- Compat update for tools/perf/util/bpf_skel/bperf_cgroup.bpf.c.
- Add Josef Bacik as a blkcg maintainer.
* tag 'cgroup-for-6.1-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
blkcg: Update MAINTAINERS entry
mm: cgroup: fix comments for get from fd/file helpers
perf stat: Support old kernels for bperf cgroup counting
bpf: cgroup_iter: support cgroup1 using cgroup fd
cgroup: add cgroup_v1v2_get_from_[fd/file]()
Revert "cgroup: enable cgroup_get_from_file() on cgroup1"
cgroup: Reorganize css_set_lock and kernfs path processing
When compiling with clang and W=1, the following warning is generated:
drivers/ata/ahci_qoriq.c:283:22: error: cast to smaller integer type
'enum ahci_qoriq_type' from 'const void *'
[-Werror,-Wvoid-pointer-to-enum-cast]
qoriq_priv->type = (enum ahci_qoriq_type)of_id->data;
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fix this by using a cast to unsigned long to match the "void *" type
size of of_id->data.
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
When compiling with clang and W=1, the following warning is generated:
drivers/ata/ahci_imx.c:1070:18: error: cast to smaller integer type
'enum ahci_imx_type' from 'const void *'
[-Werror,-Wvoid-pointer-to-enum-cast]
imxpriv->type = (enum ahci_imx_type)of_id->data;
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fix this by using a cast to unsigned long to match the "void *" type
size of of_id->data.
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
When compiling with clang and W=1, the following warning is generated:
drivers/ata/ahci_xgene.c:788:14: error: cast to smaller integer type
'enum xgene_ahci_version' from 'const void *'
[-Werror,-Wvoid-pointer-to-enum-cast]
version = (enum xgene_ahci_version) of_devid->data;
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fix this by using a cast to unsigned long to match the "void *" type
size of of_devid->data.
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
When compiling with clang and W=1, the following warning is generated:
drivers/ata/ahci_brcm.c:451:18: error: cast to smaller integer type
'enum brcm_ahci_version' from 'const void *'
[-Werror,-Wvoid-pointer-to-enum-cast]
priv->version = (enum brcm_ahci_version)of_id->data;
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fix this by using a cast to unsigned long to match the "void *" type
size of of_id->data.
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
When compiling with clang and W=1, the following warning is generated:
drivers/ata/sata_rcar.c:878:15: error: cast to smaller integer type
'enum sata_rcar_type' from 'const void *'
[-Werror,-Wvoid-pointer-to-enum-cast]
priv->type = (enum sata_rcar_type)of_device_get_match_data(dev);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fix this by using a cast to unsigned long to match the "void *" type
size returned by of_device_get_match_data().
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Josef wrote iolatency and iocost is missing from the files list. Let's add
Josef as a maintainer and add blk-iocost.c to the files list.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Jens Axboe <axboe@kernel.dk>
Today, core ID is assumed to be unique within each package.
But an AlderLake-N platform adds a Module level between core and package,
Linux excludes the unknown modules bits from the core ID, resulting in
duplicate core ID's.
To keep core ID unique within a package, Linux must include all APIC-ID
bits for known or unknown levels above the core and below the package
in the core ID.
It is important to understand that core ID's have always come directly
from the APIC-ID encoding, which comes from the BIOS. Thus there is no
guarantee that they start at 0, or that they are contiguous.
As such, naively using them for array indexes can be problematic.
[ dhansen: un-known -> unknown ]
Fixes: 7745f03eb3 ("x86/topology: Add CPUID.1F multi-die/package support")
Suggested-by: Len Brown <len.brown@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20221014090147.1836-5-rui.zhang@intel.com
CPUID.1F/B does not enumerate Package level explicitly, instead, all the
APIC-ID bits above the enumerated levels are assumed to be package ID
bits.
Current code gets package ID by shifting out all the APIC-ID bits that
Linux supports, rather than shifting out all the APIC-ID bits that
CPUID.1F enumerates. This introduces problems when CPUID.1F enumerates a
level that Linux does not support.
For example, on a single package AlderLake-N, there are 2 Ecore Modules
with 4 atom cores in each module. Linux does not support the Module
level and interprets the Module ID bits as package ID and erroneously
reports a multi module system as a multi-package system.
Fix this by using APIC-ID bits above all the CPUID.1F enumerated levels
as package ID.
[ dhansen: spelling fix ]
Fixes: 7745f03eb3 ("x86/topology: Add CPUID.1F multi-die/package support")
Suggested-by: Len Brown <len.brown@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20221014090147.1836-4-rui.zhang@intel.com
The coretemp driver supports up to a hard-coded limit of 128 cores.
Today, the driver can not support a core with an ID above that limit.
Yet, the encoding of core ID's is arbitrary (BIOS APIC-ID) and so they
may be sparse and they may be large.
Update the driver to map arbitrary core ID numbers into appropriate
array indexes so that 128 cores can be supported, no matter the encoding
of core ID's.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Len Brown <len.brown@intel.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20221014090147.1836-3-rui.zhang@intel.com
untrusted sources into /dev/random.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmNJQmQACgkQxycdCkmx
i6e/khAAtGyr94H0+41qHPuiNitGErKmAhcoMoVqxcer0kZLVo5ls+8Ftf6GTYcN
+RQdqfD/GlW2Li5nsQ//xgr7UX95mZP6F2OQ8KyMLOCtMQSj+0pOaVf59m5TZvdY
2RD0Q7RBYg3In5kGTNbeVkRY6Mmk6oYil9BhYDaSc3okhzyFUT67y3czxKNWj7In
iP5T51fQDz8wCUnS1nhHpJJypzYgQ0JI0uyf6uJMgQ8V/eC6xGojEOO5TdVAi8Mb
5FRcF4p6IZEqMOFksKyfFNxPgZOanVR5F3PIxsH8PYw0U0svELhpZmkixMyWAPt5
WlqlcW5V+o/tyMUZOfF6Rb9qkOAG5+OVYRxOLk06SOURUFT9v+bEGjYJujXIZW7N
ncPN9eM0XujTgVFqVPpKkcFyBRubjJOfXI/mbQTRu7m7Fd9txSIMT3Jnn8yeZADq
sKP+1EzZGgxNjN2vVVp/23whOpnPJoiEs+zF6RN6+SEx4G+Fixe9NvfvindOskhS
clVMx4/XBpyITN2o9T1Wo13LPBl/O5w481dOhZ3vqKBaW16EaTWRwmFRyolRwaP2
l7iNwZcRCieiidhiK+3+IqnDmbqqNvmRMQsfVarI5dhwDtYWATY9IO0H906A1d/c
J9QcO5UdoKNFOcvOFQsczXl4xqrlOI01nrRuweZGs3kVqOT4j0Y=
=qugm
-----END PGP SIGNATURE-----
Merge tag 'v6.1-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"This fixes an issue exposed by the recent change to feed untrusted
sources into /dev/random"
* tag 'v6.1-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
hwrng: bcm2835 - use hwrng_msleep() instead of cpu_relax()
A recent change in LLVM made CONFIG_EFI_STUB unselectable because it no
longer pretends to support -mabi=ms, breaking the dependency in
Kconfig. Lack of CONFIG_EFI_STUB can prevent kernels from booting via
EFI in certain circumstances.
This check was added by
8f24f8c2fc ("efi/libstub: Annotate firmware routines as __efiapi")
to ensure that __attribute__((ms_abi)) was available, as -mabi=ms is
not actually used in any cflags.
According to the GCC documentation, this attribute has been supported
since GCC 4.4.7. The kernel currently requires GCC 5.1 so this check is
not necessary; even when that change landed in 5.6, the kernel required
GCC 4.9 so it was unnecessary then as well.
Clang supports __attribute__((ms_abi)) for all versions that are
supported for building the kernel so no additional check is needed.
Remove the 'depends on' line altogether to allow CONFIG_EFI_STUB to be
selected when CONFIG_EFI is enabled, regardless of compiler.
Fixes: 8f24f8c2fc ("efi/libstub: Annotate firmware routines as __efiapi")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Cc: stable@vger.kernel.org
Link: d1ad006a8f
This reverts commit 8bb7ff12a9.
Commit 8bb7ff12a9 ("PCI: tegra: Use PCI_CONF1_EXT_ADDRESS() macro")
updated the Tegra PCI driver to use the macro PCI_CONF1_EXT_ADDRESS()
instead of a local function in the Tegra PCI driver. This broke PCI for
some Tegra platforms because, when calculating the offset value, the mask
applied to the lower 8-bits changed from 0xff to 0xfc.
For now, fix this by reverting this commit.
Fixes: 8bb7ff12a9 ("PCI: tegra: Use PCI_CONF1_EXT_ADDRESS() macro")
Link: https://lore.kernel.org/r/20221017084006.11770-1-jonathanh@nvidia.com
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Acked-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Introduce distinct struct balance_callback instead of performing function
pointer casting which will trip CFI. Avoids warnings as found by Clang's
future -Wcast-function-type-strict option:
In file included from kernel/sched/core.c:84:
kernel/sched/sched.h:1755:15: warning: cast from 'void (*)(struct rq *)' to 'void (*)(struct callback_head *)' converts to incompatible function type [-Wcast-function-type-strict]
head->func = (void (*)(struct callback_head *))func;
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
No binary differences result from this change.
This patch is a cleanup based on Brad Spengler/PaX Team's modifications
to sched code in their last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code
are mine and don't reflect the original grsecurity/PaX code.
Reported-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://github.com/ClangBuiltLinux/linux/issues/1724
Link: https://lkml.kernel.org/r/20221008000758.2957718-1-keescook@chromium.org
In commit 97886d9dcd ("sched: Migration changes for core scheduling"),
sched_group_cookie_match() was added to help determine if a cookie
matches the core state.
However, while it iterates the SMT group, it fails to actually use the
RQ for each of the CPUs iterated, use cpu_rq(cpu) instead of rq to fix
things.
Fixes: 97886d9dcd ("sched: Migration changes for core scheduling")
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221008022709.642-1-linshengwang1@huawei.com
* Raw data is also filled by bpf_perf_event_output.
* Add sample_flags to indicate raw data.
* This eliminates the segfaults as shown below:
Run ./samples/bpf/trace_output
BUG pid 9 cookie 1001000000004 sized 4
BUG pid 9 cookie 1001000000004 sized 4
BUG pid 9 cookie 1001000000004 sized 4
Segmentation fault (core dumped)
Fixes: 838d9bb62d ("perf: Use sample_flags for raw_data")
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/r/20221007081327.1047552-1-sumanthk@linux.ibm.com
Add a SIGTRAP stress test that exercises repeatedly enabling/disabling
an event while it concurrently keeps firing.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/Y0E3uG7jOywn7vy3@elver.google.com/
Marco reported:
Due to the implementation of how SIGTRAP are delivered if
perf_event_attr::sigtrap is set, we've noticed 3 issues:
1. Missing SIGTRAP due to a race with event_sched_out() (more
details below).
2. Hardware PMU events being disabled due to returning 1 from
perf_event_overflow(). The only way to re-enable the event is
for user space to first "properly" disable the event and then
re-enable it.
3. The inability to automatically disable an event after a
specified number of overflows via PERF_EVENT_IOC_REFRESH.
The worst of the 3 issues is problem (1), which occurs when a
pending_disable is "consumed" by a racing event_sched_out(), observed
as follows:
CPU0 | CPU1
--------------------------------+---------------------------
__perf_event_overflow() |
perf_event_disable_inatomic() |
pending_disable = CPU0 | ...
| _perf_event_enable()
| event_function_call()
| task_function_call()
| /* sends IPI to CPU0 */
<IPI> | ...
__perf_event_enable() +---------------------------
ctx_resched()
task_ctx_sched_out()
ctx_sched_out()
group_sched_out()
event_sched_out()
pending_disable = -1
</IPI>
<IRQ-work>
perf_pending_event()
perf_pending_event_disable()
/* Fails to send SIGTRAP because no pending_disable! */
</IRQ-work>
In the above case, not only is that particular SIGTRAP missed, but also
all future SIGTRAPs because 'event_limit' is not reset back to 1.
To fix, rework pending delivery of SIGTRAP via IRQ-work by introduction
of a separate 'pending_sigtrap', no longer using 'event_limit' and
'pending_disable' for its delivery.
Additionally; and different to Marco's proposed patch:
- recognise that pending_disable effectively duplicates oncpu for
the case where it is set. As such, change the irq_work handler to
use ->oncpu to target the event and use pending_* as boolean toggles.
- observe that SIGTRAP targets the ctx->task, so the context switch
optimization that carries contexts between tasks is invalid. If
the irq_work were delayed enough to hit after a context switch the
SIGTRAP would be delivered to the wrong task.
- observe that if the event gets scheduled out
(rotation/migration/context-switch/...) the irq-work would be
insufficient to deliver the SIGTRAP when the event gets scheduled
back in (the irq-work might still be pending on the old CPU).
Therefore have event_sched_out() convert the pending sigtrap into a
task_work which will deliver the signal at return_to_user.
Fixes: 97ba62b278 ("perf: Add support for SIGTRAP on perf events")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Debugged-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Marco Elver <elver@google.com>
Debugged-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Marco Elver <elver@google.com>
Tested-by: Marco Elver <elver@google.com>