mutex_trylock_recursive() has been removed from the tree; there is no
need to check for it.
Remove traces of mutex_trylock_recursive()'s existence.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210210085248.219210-3-bigeasy@linutronix.de
There are no users of mutex_trylock_recursive() in the tree as of
v5.11-rc7.
Remove it.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210210085248.219210-2-bigeasy@linutronix.de
Commit 997acaf6b4 ("lockdep: report broken irq restoration") makes
compiling s390 fail because the irq enable/disable functions are now
no longer fully contained in header files.
Fixes: 997acaf6b4 ("lockdep: report broken irq restoration")
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
vmlinux.o: warning: objtool: lock_is_held_type()+0x107: call to warn_bogus_irq_restore() leaves .noinstr.text section
As per the general rule that WARNs are allowed to violate noinstr to
get out, annotate it away.
Fixes: 997acaf6b4 ("lockdep: report broken irq restoration")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Link: https://lkml.kernel.org/r/YCKyYg53mMp4E7YI@hirez.programming.kicks-ass.net
Commit f6f48e1804 ("lockdep: Teach lockdep about "USED" <- "IN-NMI"
inversions") overlooked that print_usage_bug() releases the graph_lock
and called it without the graph lock held.
Fixes: f6f48e1804 ("lockdep: Teach lockdep about "USED" <- "IN-NMI" inversions")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <longman@redhat.com>
Link: https://lkml.kernel.org/r/YBfkuyIfB1+VRxXP@hirez.programming.kicks-ass.net
This is a leftover from 7f26482a87 ("locking/percpu-rwsem: Remove the embedded rwsem")
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/20210126101721.976027-1-nborisov@suse.com
To fix the following issues:
kernel/locking/rtmutex.c:1612: warning: Function parameter or member
'lock' not described in '__rt_mutex_futex_unlock'
kernel/locking/rtmutex.c:1612: warning: Function parameter or member
'wake_q' not described in '__rt_mutex_futex_unlock'
kernel/locking/rtmutex.c:1675: warning: Function parameter or member
'name' not described in '__rt_mutex_init'
kernel/locking/rtmutex.c:1675: warning: Function parameter or member
'key' not described in '__rt_mutex_init'
[ tglx: Change rt lock to rt_mutex for consistency sake ]
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/1605257895-5536-2-git-send-email-alex.shi@linux.alibaba.com
futex(2) says that 'utime' is a pointer to 'const'. The implementation
doesn't use 'const'; however, it _never_ modifies the contents of utime.
- futex() either uses 'utime' as a pointer to struct or as a 'u32'.
- In case it's used as a 'u32', it makes a copy of it, and of course it is
not dereferenced.
- In case it's used as a 'struct __kernel_timespec __user *', the pointer
is not dereferenced inside the futex() definition, and it is only passed
to a function: get_timespec64(), which accepts a 'const struct
__kernel_timespec __user *'.
[ tglx: Make the same change to the compat syscall and fixup the prototypes. ]
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201128123945.4592-1-alx.manpages@gmail.com
We generally expect local_irq_save() and local_irq_restore() to be
paired and sanely nested, and so local_irq_restore() expects to be
called with irqs disabled. Thus, within local_irq_restore() we only
trace irq flag changes when unmasking irqs.
This means that a sequence such as:
| local_irq_disable();
| local_irq_save(flags);
| local_irq_enable();
| local_irq_restore(flags);
... is liable to break things, as the local_irq_restore() would mask
irqs without tracing this change. Similar problems may exist for
architectures whose arch_irq_restore() function depends on being called
with irqs disabled.
We don't consider such sequences to be a good idea, so let's define
those as forbidden, and add tooling to detect such broken cases.
This patch adds debug code to WARN() when raw_local_irq_restore() is
called with irqs enabled. As raw_local_irq_restore() is expected to pair
with raw_local_irq_save(), it should never be called with irqs enabled.
To avoid the possibility of circular header dependencies between
irqflags.h and bug.h, the warning is handled in a separate C file.
The new code is all conditional on a new CONFIG_DEBUG_IRQFLAGS symbol
which is independent of CONFIG_TRACE_IRQFLAGS. As noted above such cases
will confuse lockdep, so CONFIG_DEBUG_LOCKDEP now selects
CONFIG_DEBUG_IRQFLAGS.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210111153707.10071-1-mark.rutland@arm.com
While running my branch profiler that checks for incorrect "likely" and
"unlikely" annotations around the kernel, I found a large number of them
reported as incorrect because they are static_branches.
Static branches are rather special: they are likely or unlikely for
different reasons than normal annotations are used for, so there is no
reason to profile them.
Expose "unlikely_notrace" and "likely_notrace" so that static_branch can
use them and have them ignored by the branch profilers.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201211163754.585174b9@gandalf.local.home
Spread the love..
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Waiman Long <longman@redhat.com>
Acked-by: Boqun Feng <boqun.feng@gmail.com>
The purpose of local_lock_t is to abstract: preempt_disable() /
local_bh_disable() / local_irq_disable(). These are the traditional
means of gaining access to per-cpu data, but are fundamentally
non-preemptible.
local_lock_t provides a per-cpu lock, that on !PREEMPT_RT reduces to
no-ops, just like regular spinlocks do on UP.
This gives rise to:
    CPU0                      CPU1
    local_lock(B)             spin_lock_irq(A)
    <IRQ>
      spin_lock(A)            local_lock(B)
Where lockdep then figures things will lock up, which would be true if
B were any other kind of lock. However, this is a false positive: no
such deadlock actually exists.
For !RT the above local_lock(B) is preempt_disable(), and there's
obviously no deadlock; alternatively, CPU0's B != CPU1's B.
For RT the argument is that since local_lock() nests inside
spin_lock(), it cannot be used in hardirq context, and therefore CPU0
cannot in fact happen. Even though B is a real lock, it is a
preemptible lock and any threaded-irq would simply schedule out and
let the preempted task (which holds B) continue such that the task on
CPU1 can make progress, after which the threaded-irq resumes and can
finish.
This means that we can never form an IRQ inversion on a local_lock
dependency, so terminate the graph walk when looking for IRQ
inversions when we encounter one.
One consequence is that (for LOCKDEP_SMALL) when we look for redundant
dependencies, A -> B is not redundant in the presence of A -> L -> B.
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
[peterz: Changelog]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
In preparation for adding a TRACE_IRQFLAGS-dependent skip function to
check_redundant(), move it below the TRACE_IRQFLAGS #ifdef.
While there, provide a stub function to reduce #ifdef usage.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Some __bfs() walks will have additional iteration constraints (beyond
the path being strong). Provide an additional function to allow
terminating graph walks.
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
The local_lock_t's are special, because they cannot form IRQ
inversions, make sure we can tell them apart from the rest of the
locks.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
These tests are added for two purposes:
* Test the implementation of wait context checks and related
annotations.
* Semi-document the rules for wait context nesting when
PROVE_RAW_LOCK_NESTING=y.
The test cases are only available for PROVE_RAW_LOCK_NESTING=y, as wait
context checking makes more sense for that configuration.
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201208103112.2838119-5-boqun.feng@gmail.com
- Fix recently introduced crash in the intel_pstate driver that
occurs if scale-invariance is disabled during resume from
suspend-to-RAM due to inconsistent changes of APERF or MPERF
MSR values made by the platform firmware (Rafael Wysocki).
- Fix a memory leak and add a missing clk_put() in error paths in
the OPP framework (Quanyang Wang, Viresh Kumar).
- Add new C-states table for SnowRidge processors to the intel_idle
driver (Artem Bityutskiy).
- Update the MAINTAINERS entry for cpuidle to make it clear that
the governors are covered by it too (Lukas Bulwahn).
Merge tag 'pm-5.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix a crash in intel_pstate during resume from suspend-to-RAM
that may occur after recent changes and two resource leaks in error
paths in the operating performance points (OPP) framework, add a new
C-states table to intel_idle and update the cpuidle MAINTAINERS entry
to cover the governors too.
Specifics:
- Fix recently introduced crash in the intel_pstate driver that
occurs if scale-invariance is disabled during resume from
suspend-to-RAM due to inconsistent changes of APERF or MPERF MSR
values made by the platform firmware (Rafael Wysocki).
- Fix a memory leak and add a missing clk_put() in error paths in the
OPP framework (Quanyang Wang, Viresh Kumar).
- Add new C-states table for SnowRidge processors to the intel_idle
driver (Artem Bityutskiy).
- Update the MAINTAINERS entry for cpuidle to make it clear that the
governors are covered by it too (Lukas Bulwahn)"
* tag 'pm-5.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
intel_idle: add SnowRidge C-state table
cpufreq: intel_pstate: Fix fast-switch fallback path
opp: Call the missing clk_put() on error
opp: fix memory leak in _allocate_opp_table
MAINTAINERS: include governors into CPU IDLE TIME MANAGEMENT FRAMEWORK
This is a load of driver fixes (12 ufs, 1 mpt3sas, 1 cxgbi). The two big
core fixes are for power management ("block: Do not accept any
requests while suspended" and "block: Fix a race in the runtime power
management code"), which finally sort out the resume problems we've
occasionally been having. To make the resume fix, there are seven
necessary precursors which effectively rename REQ_PREEMPT to REQ_PM,
so every "special" request in block is automatically a power
management exempt one. All of the non-PM preempt cases are removed
except for the one in the SCSI Parallel Interface (spi) domain
validation, which is a genuine case where we have to run requests at
high priority to validate the bus, so this becomes an autopm get/put
protected request.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is a load of driver fixes (12 ufs, 1 mpt3sas, 1 cxgbi).
The two big core fixes are for power management ("block: Do not accept
any requests while suspended" and "block: Fix a race in the runtime
power management code"), which finally sort out the resume problems
we've occasionally been having.
To make the resume fix, there are seven necessary precursors which
effectively rename REQ_PREEMPT to REQ_PM, so every "special" request
in block is automatically a power management exempt one.
All of the non-PM preempt cases are removed except for the one in the
SCSI Parallel Interface (spi) domain validation which is a genuine
case where we have to run requests at high priority to validate the
bus so this becomes an autopm get/put protected request"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (22 commits)
scsi: cxgb4i: Fix TLS dependency
scsi: ufs: Un-inline ufshcd_vops_device_reset function
scsi: ufs: Re-enable WriteBooster after device reset
scsi: ufs-mediatek: Use correct path to fix compile error
scsi: mpt3sas: Signedness bug in _base_get_diag_triggers()
scsi: block: Do not accept any requests while suspended
scsi: block: Remove RQF_PREEMPT and BLK_MQ_REQ_PREEMPT
scsi: core: Only process PM requests if rpm_status != RPM_ACTIVE
scsi: scsi_transport_spi: Set RQF_PM for domain validation commands
scsi: ide: Mark power management requests with RQF_PM instead of RQF_PREEMPT
scsi: ide: Do not set the RQF_PREEMPT flag for sense requests
scsi: block: Introduce BLK_MQ_REQ_PM
scsi: block: Fix a race in the runtime power management code
scsi: ufs-pci: Enable UFSHCD_CAP_RPM_AUTOSUSPEND for Intel controllers
scsi: ufs-pci: Fix recovery from hibernate exit errors for Intel controllers
scsi: ufs-pci: Ensure UFS device is in PowerDown mode for suspend-to-disk ->poweroff()
scsi: ufs-pci: Fix restore from S4 for Intel controllers
scsi: ufs-mediatek: Keep VCC always-on for specific devices
scsi: ufs: Allow regulators being always-on
scsi: ufs: Clear UAC for RPMB after ufshcd resets
...
Merge tag 'block-5.11-2021-01-01' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Two minor block fixes from this last week that should go into 5.11:
- Add missing NOWAIT debugfs definition (Andres)
- Fix kerneldoc warning introduced this merge window (Randy)"
* tag 'block-5.11-2021-01-01' of git://git.kernel.dk/linux-block:
block: add debugfs stanza for QUEUE_FLAG_NOWAIT
fs: block_dev.c: fix kernel-doc warnings from struct block_device changes
Merge tag 'io_uring-5.11-2021-01-01' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"A few fixes that should go into 5.11, all marked for stable as well:
- Fix issue around identity COW'ing and users that share a ring
across processes
- Fix a hang associated with unregistering fixed files (Pavel)
- Move the 'process is exiting' cancelation a bit earlier, so
task_works aren't affected by it (Pavel)"
* tag 'io_uring-5.11-2021-01-01' of git://git.kernel.dk/linux-block:
kernel/io_uring: cancel io_uring before task works
io_uring: fix io_sqe_files_unregister() hangs
io_uring: add a helper for setting a ref node
io_uring: don't assume mm is constant across submits
Commit 436e980e2e ("kbuild: don't hardcode depmod path") stopped
hard-coding the path of depmod, but in the process caused trouble for
distributions that had that /sbin location, but didn't have it in the
PATH (generally because /sbin is limited to the super-user path).
Work around it for now by just adding /sbin to the end of PATH in the
depmod.sh script.
Reported-and-tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To cancel io_uring requests, we must either be able to run the
currently enqueued task_works or have task_work shut down by that
point. Otherwise io_uring_cancel_files() may wait for requests that
will never complete.
Go with the first option and do the cancellations before setting
PF_EXITING, and so before the task_work infrastructure enters a
transition state in which task_work_run() should not be called.
Cc: stable@vger.kernel.org # 5.5+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
io_sqe_files_unregister() uninterruptibly waits for enqueued ref nodes;
however, the requests holding them may never complete, e.g. because of
some userspace dependency. Make the wait interruptible, as otherwise it
could hang forever.
Cc: stable@vger.kernel.org # 5.6+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Setting a new reference node for file data is not trivial, so rather
than repeating the code, add a helper and use it.
Cc: stable@vger.kernel.org # 5.6+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Merge tag 'ceph-for-5.11-rc2' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"A fix for an edge case in MClientRequest encoding and a couple of
trivial fixups for the new msgr2 support"
* tag 'ceph-for-5.11-rc2' of git://github.com/ceph/ceph-client:
libceph: add __maybe_unused to DEFINE_MSGR2_FEATURE
libceph: align session_key and con_secret to 16 bytes
libceph: fix auth_signature buffer allocation in secure mode
ceph: reencode gid_list when reconnecting
Add C-state table for the SnowRidge SoC which is found on Intel Jacobsville
platforms.
The following has been changed.
1. C1E latency changed from 10us to 15us. It was measured using the
open source "wult" tool (the "nic" method, 15us is the 99.99th
percentile).
2. C1E power break even changed from 20us to 25us, which may result
in less C1E residency in some workloads.
3. C6 latency changed from 50us to 130us. Measured the same way as C1E.
The C6 C-state is supported only by some SnowRidge revisions, so add a comment
about this to the C-state table.
On SnowRidge, C6 support is enumerated via the usual mechanism: "mwait" leaf of
the "cpuid" instruction. The 'intel_idle' driver does check this leaf, so even
though C6 is present in the table, the driver will only use it if the CPU does
support it.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When sugov_update_single_perf() falls back to the "frequency"
path due to the missing scale-invariance, it will call
cpufreq_driver_fast_switch() via sugov_fast_switch()
and the driver's ->fast_switch() callback will be invoked,
so it must not be NULL.
However, after commit a365ab6b9d ("cpufreq: intel_pstate: Implement
the ->adjust_perf() callback"), intel_pstate sets ->fast_switch() to
NULL when it is going to use intel_cpufreq_adjust_perf(). That is a
mistake, because on x86 the scale-invariance may be turned off
dynamically, so modify it to retain the original ->fast_switch()
callback pointer.
Fixes: a365ab6b9d ("cpufreq: intel_pstate: Implement the ->adjust_perf() callback")
Reported-by: Kenneth R. Crudup <kenny@panix.com>
Tested-by: Kenneth R. Crudup <kenny@panix.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull operating performance points (OPP) framework fixes for 5.11-rc2
from Viresh Kumar:
"This contains two patches to fix freeing of resources in error paths."
* 'opp/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
opp: Call the missing clk_put() on error
opp: fix memory leak in _allocate_opp_table
This was missed in 021a24460d. As a result, the numeric value of
QUEUE_FLAG_NOWAIT (i.e. 29) shows up in
/sys/kernel/debug/block/*/state.
Fixes: 021a24460d
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Fix new kernel-doc warnings in fs/block_dev.c:
../fs/block_dev.c:1066: warning: Excess function parameter 'whole' description in 'bd_abort_claiming'
../fs/block_dev.c:1837: warning: Function parameter or member 'dev' not described in 'lookup_bdev'
Fixes: 4e7b5671c6 ("block: remove i_bdev")
Fixes: 37c3fc9abb ("block: simplify the block device claiming interface")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: linux-fsdevel@vger.kernel.org
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Merge misc fixes from Andrew Morton:
"16 patches
Subsystems affected by this patch series: mm (selftests, hugetlb,
pagecache, mremap, kasan, and slub), kbuild, checkpatch, misc, and
lib"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: slub: call account_slab_page() after slab page initialization
zlib: move EXPORT_SYMBOL() and MODULE_LICENSE() out of dfltcc_syms.c
lib/zlib: fix inflating zlib streams on s390
lib/genalloc: fix the overflow when size is too big
kdev_t: always inline major/minor helper functions
sizes.h: add SZ_8G/SZ_16G/SZ_32G macros
local64.h: make <asm/local64.h> mandatory
kasan: fix null pointer dereference in kasan_record_aux_stack
mm: generalise COW SMC TLB flushing race comment
mm/mremap.c: fix extent calculation
mm: memmap defer init doesn't work as expected
mm: add prototype for __add_to_page_cache_locked()
checkpatch: prefer strscpy to strlcpy
Revert "kbuild: avoid static_assert for genksyms"
mm/hugetlb: fix deadlock in hugetlb_cow error path
selftests/vm: fix building protection keys test
It's convenient to have page->objects initialized before calling into
account_slab_page(). In particular, this information can be used to
pre-alloc the obj_cgroup vector.
Let's call account_slab_page() a bit later, after the initialization of
page->objects.
This commit doesn't bring any functional change, but is required for
further optimizations.
[akpm@linux-foundation.org: undo changes needed by forthcoming mm-memcg-slab-pre-allocate-obj_cgroups-for-slab-caches-with-slab_account.patch]
Link: https://lkml.kernel.org/r/20201110195753.530157-1-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In commit 11fb479ff5 ("zlib: export S390 symbols for zlib modules"), I
added EXPORT_SYMBOL()s to dfltcc_inflate.c but then Mikhail said that
these should probably be in dfltcc_syms.c with the other
EXPORT_SYMBOL()s.
However, that is contrary to the current kernel style, which places
EXPORT_SYMBOL() immediately after the function that it applies to, so
move all EXPORT_SYMBOL()s to their respective function locations and
drop the dfltcc_syms.c file. Also move MODULE_LICENSE() from the
deleted file to dfltcc.c.
[rdunlap@infradead.org: remove dfltcc_syms.o from Makefile]
Link: https://lkml.kernel.org/r/20201227171837.15492-1-rdunlap@infradead.org
Link: https://lkml.kernel.org/r/20201219052530.28461-1-rdunlap@infradead.org
Fixes: 11fb479ff5 ("zlib: export S390 symbols for zlib modules")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Zaslonko Mikhail <zaslonko@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Decompressing zlib streams on s390 fails with "incorrect data check"
error.
Userspace zlib checks inflate_state.flags in order to byteswap checksums
only for zlib streams, and s390 hardware inflate code, which was ported
from there, tries to match this behavior. At the same time, kernel zlib
does not use inflate_state.flags, so it contains essentially random
values. For many use cases either the zlib stream is zeroed out or the
checksum is not used, so this problem is masked, but at least SquashFS
is still affected.
Fix by always passing a checksum to and from the hardware as is, which
matches zlib_inflate()'s expectations.
Link: https://lkml.kernel.org/r/20201215155551.894884-1-iii@linux.ibm.com
Fixes: 1261961000 ("lib/zlib: add s390 hardware support for kernel zlib_inflate")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Cc: <stable@vger.kernel.org> [5.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some graphics cards have a very large on-chip memory, such as 32 GB.
In the following case, this causes an overflow:
pool = gen_pool_create(PAGE_SHIFT, NUMA_NO_NODE);
ret = gen_pool_add(pool, 0x1000000, SZ_32G, NUMA_NO_NODE);
va = gen_pool_alloc(pool, SZ_4G);
The overflow occurs in gen_pool_alloc_algo_owner():
....
size = nbits << order;
....
@nbits is of type "int", so the shift overflows, and gen_pool_avail()
then returns the wrong value.
This patch converts some "int" variables to "unsigned long" and changes
the comparison code in the while loop.
Link: https://lkml.kernel.org/r/20201229060657.3389-1-sjhuang@iluvatar.ai
Signed-off-by: Huang Shijie <sjhuang@iluvatar.ai>
Reported-by: Shi Jiasheng <jiasheng.shi@iluvatar.ai>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add these macros so that drivers can use them.
Link: https://lkml.kernel.org/r/20201229072819.11183-1-sjhuang@iluvatar.ai
Signed-off-by: Huang Shijie <sjhuang@iluvatar.ai>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make <asm-generic/local64.h> mandatory in include/asm-generic/Kbuild and
remove all arch/*/include/asm/local64.h arch-specific files since they
only #include <asm-generic/local64.h>.
This fixes build errors on arch/c6x/ and arch/nios2/ for
block/blk-iocost.c.
Build-tested on 21 of 25 arches (tooling problems on the others).
Yes, we could even rename <asm-generic/local64.h> to
<linux/local64.h> and change all #includes to use
<linux/local64.h> instead.
Link: https://lkml.kernel.org/r/20201227024446.17018-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <jacquiot.aurelien@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I'm not sure if I'm completely missing something here, but AFAIKS the
reference to the mysterious "COW SMC race" confuses the issue. The
original changelog and mailing list thread didn't help me either.
This SMC race is where the problem was detected, but isn't the general
problem bigger and more obvious: that the new PTE could be picked up at
any time by any TLB while entries for the old PTE exist in other TLBs
before the TLB flush takes effect?
The case where the iTLB and dTLB of a CPU are pointing at different pages
is an interesting one but follows from the general problem.
The other (minor) thing is that I think the comment becomes a bit clearer
if it says what the old code was doing (i.e., it avoids the race as
opposed to what?).
References: 4ce072f1fa ("mm: fix a race condition under SMC + COW")
Link: https://lkml.kernel.org/r/20201215121119.351650-1-npiggin@gmail.com
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
VMware observed a performance regression during memmap init on their
platform, and bisected to commit 73a6e474cb ("mm: memmap_init:
iterate over memblock regions rather that check each PFN") causing it.
Before the commit:
[0.033176] Normal zone: 1445888 pages used for memmap
[0.033176] Normal zone: 89391104 pages, LIFO batch:63
[0.035851] ACPI: PM-Timer IO Port: 0x448
With the commit:
[0.026874] Normal zone: 1445888 pages used for memmap
[0.026875] Normal zone: 89391104 pages, LIFO batch:63
[2.028450] ACPI: PM-Timer IO Port: 0x448
The root cause is that the current memmap defer init doesn't work as expected.
Previously, memmap_init_zone() did the memmap init of one whole zone:
it initialized all low zones of a NUMA node, but deferred the memmap init
of the last zone in that node. However, since commit 73a6e474cb,
memmap_init() has been adapted to iterate over the memblock regions
inside a zone and then call memmap_init_zone() to do the memmap init for
each region.
E.g., on VMware's system the memory layout is as below; there are two
memory regions in node 2. The current code mistakenly initializes the
whole 1st region [mem 0xab00000000-0xfcffffffff], then defers the memmap
init and initializes only one memory section of the 2nd region [mem
0x10000000000-0x1033fffffff]. In fact, we only expect to see one memory
section's memmap initialized. That's why it takes so much more time.
[ 0.008842] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
[ 0.008842] ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff]
[ 0.008843] ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x55ffffffff]
[ 0.008844] ACPI: SRAT: Node 1 PXM 1 [mem 0x5600000000-0xaaffffffff]
[ 0.008844] ACPI: SRAT: Node 2 PXM 2 [mem 0xab00000000-0xfcffffffff]
[ 0.008845] ACPI: SRAT: Node 2 PXM 2 [mem 0x10000000000-0x1033fffffff]
Now, add a parameter 'zone_end_pfn' to memmap_init_zone() to pass
down the real zone end pfn so that defer_init() can use it to decide
whether deferral should be applied zone-wide.
Link: https://lkml.kernel.org/r/20201223080811.16211-1-bhe@redhat.com
Link: https://lkml.kernel.org/r/20201223080811.16211-2-bhe@redhat.com
Fixes: commit 73a6e474cb ("mm: memmap_init: iterate over memblock regions rather that check each PFN")
Signed-off-by: Baoquan He <bhe@redhat.com>
Reported-by: Rahul Gopakumar <gopakumarr@vmware.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>