Always try to do cancellation in __io_uring_task_cancel() at least once,
so it actually goes and cleans its sqpoll tasks (i.e. via
io_sqpoll_cancel_sync()), otherwise sqpoll task may submit new requests
after cancellation and it's racy for many reasons.
Fixes: 521d6a737a ("io_uring: cancel sqpoll via task_work")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0a21bd6d794bb1629bc906dd57a57b2c2985a8ac.1616839147.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Since #DB for bus lock detect changes the split_lock_detect parameter,
update the documentation for the changes.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20210322135325.682257-4-fenghua.yu@intel.com
Bus locks degrade performance for the whole system, not just for the CPU
that requested the bus lock. Two CPU features "#AC for split lock" and
"#DB for bus lock" provide hooks so that the operating system may choose
one of several mitigation strategies.
#AC for split lock is already implemented. Add code to use the #DB for
bus lock feature to cover additional situations with new options to
mitigate.
split_lock_detect=
#AC for split lock #DB for bus lock
off Do nothing Do nothing
warn Kernel OOPs Warn once per task and
Warn once per task and and continues to run.
disable future checking
When both features are
supported, warn in #AC
fatal Kernel OOPs Send SIGBUS to user.
Send SIGBUS to user
When both features are
supported, fatal in #AC
ratelimit:N Do nothing Limit bus lock rate to
N per second in the
current non-root user.
Default option is "warn".
Hardware only generates #DB for bus lock detect when CPL>0 to avoid
nested #DB from multiple bus locks while the first #DB is being handled.
So no need to handle #DB for bus lock detected in the kernel.
#DB for bus lock is enabled by bus lock detection bit 2 in DEBUGCTL MSR
while #AC for split lock is enabled by split lock detection bit 29 in
TEST_CTRL MSR.
Both breakpoint and bus lock in the same instruction can trigger one #DB.
The bus lock is handled before the breakpoint in the #DB handler.
Delivery of #DB for bus lock in userspace clears DR6[11], which is set by
the #DB handler right after reading DR6.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20210322135325.682257-3-fenghua.yu@intel.com
A bus lock is acquired through either a split locked access to writeback
(WB) memory or any locked access to non-WB memory. This is typically >1000
cycles slower than an atomic operation within a cache line. It also
disrupts performance on other cores.
Some CPUs have the ability to notify the kernel by a #DB trap after a user
instruction acquires a bus lock and is executed. This allows the kernel to
enforce user application throttling or mitigation. Both breakpoint and bus
lock can trigger the #DB trap in the same instruction and the ordering of
handling them is the kernel #DB handler's choice.
The CPU feature flag to be shown in /proc/cpuinfo will be "bus_lock_detect".
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20210322135325.682257-2-fenghua.yu@intel.com
cpu_current_top_of_stack is currently stored in TSS.sp1. TSS is exposed
through the cpu_entry_area which is visible with user CR3 when PTI is
enabled and active.
This makes it a coveted fruit for attackers. An attacker can fetch the
kernel stack top from it and continue next steps of actions based on the
kernel stack.
But it is actualy not necessary to be stored in the TSS. It is only
accessed after the entry code switched to kernel CR3 and kernel GS_BASE
which means it can be in any regular percpu variable.
The reason why it is in TSS is historical (pre PTI) because TSS is also
used as scratch space in SYSCALL_64 and therefore cache hot.
A syscall also needs the per CPU variable current_task and eventually
__preempt_count, so placing cpu_current_top_of_stack next to them makes it
likely that they end up in the same cache line which should avoid
performance regressions. This is not enforced as the compiler is free to
place these variables, so these entry relevant variables should move into
a data structure to make this enforceable.
The seccomp_benchmark doesn't show any performance loss in the "getpid
native" test result. Actually, the result changes from 93ns before to 92ns
with this change when KPTI is disabled. The test is very stable and
although the test doesn't show a higher degree of precision it gives enough
confidence that moving cpu_current_top_of_stack does not cause a
regression.
[ tglx: Removed unneeded export. Massaged changelog ]
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210125173444.22696-2-jiangshanlai@gmail.com
- Avoid write of uninitialized memory when generating PERF_RECORD_MMAP* records.
- Fix 'perf top' BPF support related crash with perf_event_paranoid=3 + kptr_restrict.
- Validate raw event with sysfs exported format bits.
- Fix waipid on SIGCHLD delivery bugs in 'perf daemon'.
- Change to use bash for daemon test on Debian, where the default is dash and
thus fails for use of bashisms in this test.
- Fix memory leak in vDSO found using ASAN.
- Remove now useless (due to the fact taht BPF now supports static vars)
failing sub test "BPF relocation checker".
- Fix auxtrace queue conflict.
- Sync linux/kvm.h with the kernel sources.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYGDHswAKCRCyPKLppCJ+
J765AQDJNsaxs3RximgY503X6pguLSrMY/uBq9XB2uYCeM3xRgEA1h4nIFCOE06V
s/kOphsfgbUu2UTD1rtCM2olZRzS1AU=
=kpj/
-----END PGP SIGNATURE-----
Merge tag 'perf-tools-fixes-for-v5.12-2020-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tooling fixes from Arnaldo Carvalho de Melo:
- Avoid write of uninitialized memory when generating PERF_RECORD_MMAP*
records.
- Fix 'perf top' BPF support related crash with perf_event_paranoid=3 +
kptr_restrict.
- Validate raw event with sysfs exported format bits.
- Fix waipid on SIGCHLD delivery bugs in 'perf daemon'.
- Change to use bash for daemon test on Debian, where the default is
dash and thus fails for use of bashisms in this test.
- Fix memory leak in vDSO found using ASAN.
- Remove now useless (due to the fact that BPF now supports static
vars) failing sub test "BPF relocation checker".
- Fix auxtrace queue conflict.
- Sync linux/kvm.h with the kernel sources.
* tag 'perf-tools-fixes-for-v5.12-2020-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf test: Change to use bash for daemon test
perf record: Fix memory leak in vDSO found using ASAN
perf test: Remove now useless failing sub test "BPF relocation checker"
perf daemon: Return from kill functions
perf daemon: Force waipid for all session on SIGCHLD delivery
perf top: Fix BPF support related crash with perf_event_paranoid=3 + kptr_restrict
perf pmu: Validate raw event with sysfs exported format bits
perf synthetic events: Avoid write of uninitialized memory when generating PERF_RECORD_MMAP* records
tools headers UAPI: Sync linux/kvm.h with the kernel sources
perf synthetic-events: Fix uninitialized 'kernel_thread' variable
perf auxtrace: Fix auxtrace queue conflict
- Fix build failure on Ubuntu with new GCC packages that turn on -fcf-protection
- Fix SME memory encryption PTE encoding bug - AFAICT the code worked on
4K page sizes (level 1) but had the wrong shift at higher page level orders
(level 2 and higher).
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmBgXdERHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1gJWxAAgAOWwGY3yq3kUEtIExosXZPlHCFjal3N
UXoVpQde4aBeZ9A4flMjkSZmTF5PVEN2npMz8ltnxU8NUJg4QR68UYiIE8BReARg
+JuyNXGdAu1XyT+dWdTFqL9xgA9t8dG13o4WbBqGDZagnLNuvjYhzJtsgw9FbNWZ
a1abBbcxpoZvSyQBHyqtuwoiWeeeFJiQZ02wZwxtonYHWVbBXEN5WhFL9Tc2kDJc
/Ic09O+FDhpe3I/PvCiMrkpVJuBnaDdve5zDPDzR+FRMeAj4AhNLIJiMFj17bJWD
eR6vCDoFz3EsbSdJz0XvHIZOSZnaiiC0ybTEv5nJTiRgDk+s6JDXWwDcJG+3yKJR
Fm5TLlnaU++E9lYLpyCbgrWkrv0F2u3GmnieFnOOyzRv8NlkZqrThApf3xGsavy+
qJZnXe5ftWp+mmIDw4DZDBVsJ8rBIflvURQxfG3SHkUc0iVsyUCrAK2eKYewk/dN
eC6FVPkCdx4Ys50wb+OR9Enhq3yKFyRuZ2zIeguUX30sqoapJL85M1vglS5DFoX/
pHcigRzBzFQOZhOh8Kq3VREOx0F+ioUfcZzmYdzjWSfXfpvqWFcLAIFgOv1hDfms
XQ60X/voG0tWd0ODKXqyx6oa0rqamigPjLJp/gtDKpQHORFaabvnTJTLwN6n8N1Q
syTWRiHMhi0=
=tM9n
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2021-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Two fixes:
- Fix build failure on Ubuntu with new GCC packages that turn
on -fcf-protection
- Fix SME memory encryption PTE encoding bug - AFAICT the code
worked on 4K page sizes (level 1) but had the wrong shift at
higher page level orders (level 2 and higher)"
* tag 'x86-urgent-2021-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/build: Turn off -fcf-protection for realmode targets
x86/mem_encrypt: Correct physical address calculation in __set_clr_pte_enc()
Right now nothing uses this API explicitly, but this is an accident waiting to happen.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmBgWhwRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1gNOg/+PRiR4Zs/Wmt9r6iFG5xFCQ+xuUzsmcLO
XEwQiUsLGn2yRuLn4wY8+1F4fZpQhcX38jsuMpaY43NMMoFLEKMd2D7yJ24/y5zf
66U2xg7+KMxTKjhx1D75Ln9HpvnS6aB7s5HAKTWO7i9dK9CVmYrO4rMDCK1msIMp
vETjCiKEN7s4BwW/j31L+2gegYXBsxyS14xUT9jX+6lL4yNlRXPKz/adkirQ9BJP
TXY/4K4qhrGjFgpLHQB8haD7P6NN1OaQLX16P+eV//LxQWz49Z8uf60nZKcmDxdk
rbRZD6VD0LqwsKnQW/bnC3Cug59T+Ybz4mgLrf3YsocDbafb3sQYhWTp4+5nlBls
cY6Wq4OrqPli6auFZyK/Ih9hPfI3AFcY/+ieFqenumrl6SPUQTlvX4MfL/oVxszD
aEX5tbWBlCf0fKlr3kcjAaOhjAOZ5SazCJ5y2gY/r7skKDFmuU4eIBzCb+AttpRT
pX/aeXVZdeBUMdMM8W0XfVvscxDkFlO/cEw8hpTFqmYmfwy5CGQDxRvVSrbKEZcg
KKvPUgrotTGOktTKO9tdiX29Su/SSQERC8x3dPV0VohxiRhwILK7/mn8RZJpfkS6
fDFEa22C3z2Xdq3w2c4OweCjT8Kb5Q2udg9EmDru7wrwANlSQRFAJYDtu2TbV5/3
IZ62AGNczXs=
=NgW4
-----END PGP SIGNATURE-----
Merge tag 'locking-urgent-2021-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fix from Ingo Molnar:
"Fix the non-debug mutex_lock_io_nested() method to map to
mutex_lock_io() instead of mutex_lock().
Right now nothing uses this API explicitly, but this is an
accident waiting to happen"
* tag 'locking-urgent-2021-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/mutex: Fix non debug version of mutex_lock_io_nested()
When the TSC frequency is known because it is retrieved from the
hypervisor, skip TSC refined calibration by setting X86_FEATURE_TSC_KNOWN_FREQ.
Signed-off-by: Alexey Makhalov <amakhalov@vmware.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210105004752.131069-1-amakhalov@vmware.com
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmBfx9sACgkQiiy9cAdy
T1F/igv8DsHxOJLw9kc5pBrbmUEgsUpQbdRMESqEROyqKte80jga2P3wsvJQYqQY
JwHPxh477eiRSpSkEWSFDMmELsVtoQIYv3aqgPe79668eCd97mHRM2ItSV++5x9M
iJ0N8GuiVARSyKmndrZ9gvbPoJb4TKkPX6X44pDSgAkgskvTTFKTywZaY5IEiqKe
9zBWghZbNnWhtYG+2On2M2tzy8/Fo8aveLxhFhJstZ0IP6Px+Rg9GMdzRAfDJqK+
QQMwcqmRKjVo4/Z6yji/s9OI1+eQyIAKLa6cyB0Yd+AqnvDYv1dagkRAjRCHl/Ri
28loxGatXeXjXJGYU58EjNkKdoBUh09idJJolcMGwPSteL2j1DQDV9utbZLhAWPq
yNugiIkzbQj3Z55UQ3n3u79pztK31GZ2TOcwJbIqQs3tctJ5aqUIWjQibLVpaNBR
7C5Yug9aC5gpr3LPIUD3AGZIUAenCzsVN5Y9br4SPx0/zHmbynyyF27w14shX8O/
3uQr6xhl
=NxLV
-----END PGP SIGNATURE-----
Merge tag '5.12-rc4-smb3' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Five cifs/smb3 fixes, two for stable.
Includes an important fix for encryption and an ACL fix, as well as a
fix for possible reflink data corruption"
* tag '5.12-rc4-smb3' of git://git.samba.org/sfrench/cifs-2.6:
smb3: fix cached file size problems in duplicate extents (reflink)
cifs: Silently ignore unknown oplock break handle
cifs: revalidate mapping when we open files for SMB1 POSIX
cifs: Fix chmod with modefromsid when an older ACE already exists.
cifs: Adjust key sizes and key generation routines for AES256 encryption
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmBf1KAQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpjVSD/0f1HdekXnIE6aSRQ7YEV8ux2t5wUeDyP8U
cdcZ8fBW9PvKZLdODSI4sw8UYV5OYEBcfImFe3nRVHR+RIVQo72UTYvuHqeUYNct
w3drgF2GEMIxJFZR6zf9LDrQVduPqXvbEJLui6TN+eX/5E99ZlUWMLwkX1k+vDju
QfaGZjz2736GTn1MPc7jdyZKoK7eCi5xtNFPash5wGck7aYl5TGXnG/8bRYsv2Tw
eCYKbvv4x0s8OFcYVQMooDfbIMCyyfTwt6YatFHQEtM/RM+M66gndvv3jfkeJQju
hz0I8qOJ8X5lf0VucncWs5J8b9Whr5YZV+k9461xalBbV9ed2vzIIikP8DpCxtYz
yKbsdDm0+3hwfuZOz+d7ooEXKsphJ1PnSsEeuNZXtKDXVtphksUbbq4H2NLINcsQ
m6dwaRPSEA0EymngGY2e+8+CU0euiE4mqoMpw4D9m9Irs+BAaWYGk9xCWr0BGem0
auZOMqvV2xktdBlGx1BJCLts1sHHxy8IM3u0852R/1AfcKOkXwNVPt62I8e9ceIA
wc731aWHwJfS25m430xFDPJKJpUZoZgste4qwVym70CmRziuamgYyIfrfRg1ZjsD
ZBa9Z4hPiT4e0eDqlYjcMpl9FORgYQXVXy5ofd/eZg5xkU8X+i6TVZkaQNkZyqV/
4ogBZYUolg==
=mwLC
-----END PGP SIGNATURE-----
Merge tag 'io_uring-5.12-2021-03-27' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
- Use thread info versions of flag testing, as discussed last week.
- The series enabling PF_IO_WORKER to just take signals, instead of
needing to special case that they do not in a bunch of places. Ends
up being pretty trivial to do, and then we can revert all the special
casing we're currently doing.
- Kill dead pointer assignment
- Fix hashed part of async work queue trace
- Fix sign extension issue for IORING_OP_PROVIDE_BUFFERS
- Fix a link completion ordering regression in this merge window
- Cancellation fixes
* tag 'io_uring-5.12-2021-03-27' of git://git.kernel.dk/linux-block:
io_uring: remove unsued assignment to pointer io
io_uring: don't cancel extra on files match
io_uring: don't cancel-track common timeouts
io_uring: do post-completion chore on t-out cancel
io_uring: fix timeout cancel return code
Revert "signal: don't allow STOP on PF_IO_WORKER threads"
Revert "kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing"
Revert "kernel: treat PF_IO_WORKER like PF_KTHREAD for ptrace/signals"
Revert "signal: don't allow sending any signals to PF_IO_WORKER threads"
kernel: stop masking signals in create_io_thread()
io_uring: handle signals for IO threads like a normal thread
kernel: don't call do_exit() for PF_IO_WORKER threads
io_uring: maintain CQE order of a failed link
io-wq: fix race around pending work on teardown
io_uring: do ctx sqd ejection in a clear context
io_uring: fix provide_buffers sign extension
io_uring: don't skip file_end_write() on reissue
io_uring: correct io_queue_async_work() traces
io_uring: don't use {test,clear}_tsk_thread_flag() for current
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmBf1YoQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgprivEADZx//LFwziicjD3Nd5XcLfMeE1su6+CULD
SkGh8MALlB3/smeYa+gG5tb5U8l+7Xk62pDWXsRZj+Ckw/FDGql4qve6uSDqAIBz
6W6PKjHLY81E8nVe/WHcvhQrxE8E9yZg/Hrg4FWLpcLbmJTt709Cm+FciHP8BAsR
iv3gkBreMRrt9Xlfimn4XCsGaqbXg2Xx8AhaJBshhhjIXvirvB8ctNZvguNX4KFl
ob+KTO1p26mTFxHLiaJt1fNJzj21XdMrT27FMPqylBF5s1Xr4U9plZHgTX6KMx3o
BZx1QFTGiskgdKhR01AgzM4ASIWZAUDfpRgABfyWdqHTwqeJyHbcJ+emRpiGCyER
Og3ar2m75WUA8+Pfgl9TusnNTCiRVYBAcMZGpGEbGKZt+cyCq2Ed161e2I7NPOxR
c60/j4KHq3uBXh1FhNRX1Y9ZUiK031RqGhBCABeM0bnxImyEo96L3VXJ72RZOvjZ
1lo9U35q7B6AaFlAesYH4/WaPIExy3RObVHUVtXokzcm4RFh9eycuxPdGc+HDZ04
h8t6KaAKTtBadIIMWvz34SNykqM4Q0xcHrt8Wz+1C3FZfgc7rkQpVBZLjhk5fx8h
33KeuMrATAFGvv9d0tbARbIXqXaFGwcc7Z0sSfVnzRfFM/aPa5xnIfGmbxoT5gH8
v/6ySA3EWA==
=ZaB3
-----END PGP SIGNATURE-----
Merge tag 'block-5.12-2021-03-27' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- Fix regression from this merge window with the xarray partition
change, which allowed partition counts that overflow the u8 that
holds the partition number (Ming)
- Fix zone append warning (Johannes)
- Segmentation count fix for multipage bvecs (David)
- Partition scan fix (Chris)
* tag 'block-5.12-2021-03-27' of git://git.kernel.dk/linux-block:
block: don't create too many partitions
block: support zone append bvecs
block: recalculate segment count for multi-segment discards correctly
block: clear GD_NEED_PART_SCAN later in bdev_disk_changed
Seven fixes, all in drivers (qla2xxx, mkt3sas, qedi, target,
ibmvscsi). The most serious are the target pscsi oom and the qla2xxx
revert which can otherwise cause a use after free.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCYF/V2yYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pisha+AAQCSO7Wv
Wshi7H9N8SQUJO9EcmkrSApGDtjHUfZYlnse0AD/UhbpCSzSEPzI23lfduMB3QTa
o5wmwHEIG8ULneB57vE=
=f44O
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Seven fixes, all in drivers (qla2xxx, mkt3sas, qedi, target,
ibmvscsi).
The most serious are the target pscsi oom and the qla2xxx revert which
can otherwise cause a use after free"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: target: pscsi: Clean up after failure in pscsi_map_sg()
scsi: target: pscsi: Avoid OOM in pscsi_map_sg()
scsi: mpt3sas: Fix error return code of mpt3sas_base_attach()
scsi: qedi: Fix error return code of qedi_alloc_global_queues()
scsi: Revert "qla2xxx: Make sure that aborted commands are freed"
scsi: ibmvfc: Make ibmvfc_wait_for_ops() MQ aware
scsi: ibmvfc: Fix potential race in ibmvfc_wait_for_ops()
Currently, building the bpf-next source with the CONFIG_BPF_SYSCALL
enabled is causing a compilation error:
"net/ipv4/bpf_tcp_ca.c:209:28: error: expected identifier or '(' before
',' token"
Fix this by removing an unnecessary comma.
Fixes: e78aea8b21 ("bpf: tcp: Put some tcp cong functions in allowlist for bpf-tcp-cc")
Reported-by: syzbot+0b74d8ec3bf0cc4e4209@syzkaller.appspotmail.com
Signed-off-by: Atul Gopinathan <atulgopinathan@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210328120515.113895-1-atulgopinathan@gmail.com
In commit dee60c0dbc ("s390/pci: add zpci_event_hard_deconfigured()")
we added a zdev_enabled() check to what was previously an uncoditional
call to zpci_disable_device(). There are two problems with that. Firstly
zpci_had_deconfigured() is only called on event 0x0304 for which the
device is always already disabled by the platform so it is always false.
Secondly zpci_disable_device() not only disables the device but also
calls zpci_dma_exit_device() which is thus not called and we leak the
DMA tables.
Fix this by calling zpci_disable_device() unconditionally to perform
Linux side cleanup including the freeing of DMA tables.
Fixes: dee60c0dbc ("s390/pci: add zpci_event_hard_deconfigured()")
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Acked-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Add a backup for s390 vfio-pci, an additional backup for vfio-ccw
and replace the backup for vfio-ap as Pierre is focusing on other
areas.
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Acked-by: Eric Farman <farman@linux.ibm.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Acked-by: Pierre Morel <pmorel@linux.ibm.com>
Acked-by: Jason J. Herne <jjherne@linux.ibm.com>
Link: https://lore.kernel.org/r/1616679712-7139-1-git-send-email-mjrosato@linux.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The blocks containing the bad block table can become bad as well. So
make sure to skip any blocks that are marked bad when searching for the
bad block table.
Otherwise in very rare cases where two BBT blocks wear out it might
happen that an obsolete BBT is used instead of a newer available
version.
Signed-off-by: Stefan Riedmueller <s.riedmueller@phytec.de>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20210325102337.481172-1-s.riedmueller@phytec.de
This allows extending ofpart parser with support for Linksys Northstar
devices. That support uses recently added quirks mechanism.
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20210312134919.7767-2-zajec5@gmail.com
Document nvmem-cells compatible used to treat mtd partitions as a
nvmem provider.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20210312062830.20548-3-ansuelsmth@gmail.com
Drop $nodename restriction as now mtd partition can also be used as
nvmem provider.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20210312062830.20548-2-ansuelsmth@gmail.com
Partitions that contains the nvmem-cells compatible will register
their direct subonodes as nvmem cells and the node will be treated as a
nvmem provider.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Tested-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20210312062830.20548-1-ansuelsmth@gmail.com
This may sound like a contradiction but some SPI-NOR flashes really
support erasing their OTP region until it is finally locked. Having the
possibility to erase an OTP region might come in handy during
development.
The ioctl argument follows the OTPLOCK style.
Signed-off-by: Michael Walle <michael@walle.cc>
Acked-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20210303201819.2752-1-michael@walle.cc
In __cpu_setup we conditionally manipulate the TCR_EL1 value in x10
after previously using x10 as a scratch register for unrelated temporary
variables.
To make this a bit clearer, let's move the TCR_EL1 value into a named
register `tcr`. To simplify the register allocation, this is placed in
the highest available caller-saved scratch register, tcr.
Following the example of `mair`, we initialise the register with the
default value prior to any feature discovery, and write it to MAIR_EL1
after all feature discovery is complete, which allows us to simplify the
featuere discovery code.
The existing `mte_tcr` register is no longer needed, and is replaced by
the use of x10 as a temporary, matching the rest of the MTE feature
discovery assembly in __cpu_setup. As x20 is no longer used, the
function is now AAPCS compliant, as we've generally aimed for in our
assembly functions.
There should be no functional change as as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210326180137.43119-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In __cpu_setup we conditionally manipulate the MAIR_EL1 value in x5
before later reusing x5 as a scratch register for unrelated temporary
variables.
To make this a bit clearer, let's move the MAIR_EL1 value into a named
register `mair`. To simplify the register allocation, this is placed in
the highest available caller-saved scratch register, x17. As it is no
longer clobbered by other usage, we can write the value to MAIR_EL1 at
the end of the function as we do for TCR_EL1 rather than part-way though
feature discovery.
There should be no functional change as as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210326180137.43119-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
MEMLOCK, MEMUNLOCK and OTPLOCK modify protection bits. Thus require
write permission. Depending on the hardware MEMLOCK might even be
write-once, e.g. for SPI-NOR flashes with their WP# tied to GND. OTPLOCK
is always write-once.
MEMSETBADBLOCK modifies the bad block table.
Fixes: f7e6b19bc7 ("mtd: properly check all write ioctls for permissions")
Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Rafał Miłecki <rafal@milecki.pl>
Acked-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20210303155735.25887-1-michael@walle.cc
The module misses MODULE_DEVICE_TABLE() for both SPI and OF ID tables
and thus never autoloads on ID matches.
Add the missing declarations.
Present since day-0 of spinand framework introduction.
Fixes: 7529df4652 ("mtd: nand: Add core infrastructure to support SPI NANDs")
Cc: stable@vger.kernel.org # 4.19+
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20210323173714.317884-1-alobakin@pm.me
Suppresses the following coccinelle warning:
drivers/mtd/nand/raw/rockchip-nand-controller.c:162:4-8: WARNING use flexible-array member instead
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20210323131137.45552-1-zou_wei@huawei.com
Currently start_backtrace() is a static inline function in the header.
Since it really shouldn't be sufficiently performance critical that we
actually need to have it inlined move it into a C file, this will save
anyone else scratching their head about why it is defined in the header.
As far as I can see it's only there because it was factored out of the
various callers.
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20210319174022.33051-1-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This patch supports the DFL drivers be written in userspace. This is
realized by exposing the userspace I/O device interfaces.
The driver now only binds the ether group feature, which has no irq. So
the irq support is not implemented yet.
Reviewed-by: Tom Rix <trix@redhat.com>
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
Link: https://lore.kernel.org/r/1615168776-8553-2-git-send-email-yilun.xu@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add support for pvpanic PCI device added in qemu [1]. At probe time, obtain the
address where to read/write pvpanic events and pass it to the generic handling
code. Will follow the same logic as pvpanic MMIO device driver. At remove time,
unmap base address and disable PCI device.
[1] 9df52f58e7
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Link: https://lore.kernel.org/r/1616597356-20696-4-git-send-email-mihai.carabas@oracle.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Create the mecahism that allows multiple pvpanic instances to call
pvpanic_probe and receive panic events. A global list will retain all the
mapped addresses where to write panic events.
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Link: https://lore.kernel.org/r/1616597356-20696-3-git-send-email-mihai.carabas@oracle.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Split-up generic and platform dependent code in order to be able to re-use
generic event handling code in pvpanic PCI device driver in the next patches.
The code from pvpanic.c was split in two new files:
- pvpanic.c: generic code that handles pvpanic events
- pvpanic-mmio.c: platform/bus dependent code
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Link: https://lore.kernel.org/r/1616597356-20696-2-git-send-email-mihai.carabas@oracle.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When device_create_file() fails and returns a non-zero value,
no error return code of driver_sysfs_add() is assigned.
To fix this bug, ret is assigned with the return value of
device_create_file(), and then ret is checked.
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Link: https://lore.kernel.org/r/20210324023405.12465-1-baijiaju1990@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Deferred probe usually runs only on pinned kworkers, which might take
longer time if a device contains multiple sub-devices. One such case
is of sound card on mobile devices, where we have good number of
mixers and controls per mixer.
We observed boot up improvement - deferred probes take ~600ms when bound
to little core kworker and ~200ms when deferred probe is queued on
unbound wq. This is due to scheduler moving the worker running deferred
probe work to big CPUs. Without this change, we see the worker is running
on LITTLE CPU due to affinity.
Since kworker runs deferred probe of several devices, the locality may
not be important. Also, init thread executing driver initcalls, can
potentially migrate as it has cpu affinity set to all cpus.In addition
to this, async probes use unbounded workqueue. So, using unbounded wq for
deferred probes looks to be similar to these w.r.t. scheduling behavior.
Signed-off-by: Yogesh Lal <ylal@codeaurora.org>
Link: https://lore.kernel.org/r/1616583698-6398-1-git-send-email-ylal@codeaurora.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When cmd > 6 or copy_to_user() fail, The variable 'ret' would not be
returned back. Fix the 'ret' set but not used.
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Xu Jia <xujia39@huawei.com>
Link: https://lore.kernel.org/r/20210324072031.941791-1-xujia39@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>