Commit Graph

949346 Commits

Author SHA1 Message Date
Ahmed S. Darwish
58faf20a08 time/sched_clock: Use raw_read_seqcount_latch() during suspend
sched_clock uses seqcount_t latching to switch between two storage
places protected by the sequence counter. This allows it to have
interruptible, NMI-safe, seqcount_t write side critical sections.

Since 7fc26327b7 ("seqlock: Introduce raw_read_seqcount_latch()"),
raw_read_seqcount_latch() became the standardized way for seqcount_t
latch read paths. Due to the dependent load, it has one read memory
barrier less than the currently used raw_read_seqcount() API.

Use raw_read_seqcount_latch() for the suspend path.

Commit aadd6e5caa ("time/sched_clock: Use raw_read_seqcount_latch()")
missed changing that instance of raw_read_seqcount().

References: 1809bfa44e ("timers, sched/clock: Avoid deadlock during read from NMI")
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200715092345.GA231464@debian-buster-darwi.lab.linutronix.de
2020-09-10 11:19:28 +02:00
Boqun Feng
96a16f45ae lockdep/selftest: Introduce recursion3
Add a test case shows that USED_IN_*_READ and ENABLE_*_READ can cause
deadlock too.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200807074238.1632519-20-boqun.feng@gmail.com
2020-08-26 12:42:08 +02:00
Boqun Feng
ad56450db8 locking/selftest: Add test cases for queued_read_lock()
Add two self test cases for the following case:

	P0:			P1:			P2:

				<in irq handler>
	spin_lock_irq(&slock)	read_lock(&rwlock)
							write_lock_irq(&rwlock)
	read_lock(&rwlock)	spin_lock(&slock)

, which is a deadlock, as the read_lock() on P0 cannot get the lock
because of the fairness.

	P0:			P1:			P2:

	<in irq handler>
	spin_lock(&slock)	read_lock(&rwlock)
							write_lock(&rwlock)
	read_lock(&rwlock)	spin_lock_irq(&slock)

, which is not a deadlock, as the read_lock() on P0 can get the lock
because it could use the unfair fastpass.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200807074238.1632519-19-boqun.feng@gmail.com
2020-08-26 12:42:07 +02:00
Boqun Feng
108dc42ed3 Revert "locking/lockdep/selftests: Fix mixed read-write ABBA tests"
This reverts commit d82fed7529.

Since we now could handle mixed read-write deadlock detection well, the
self tests could be detected as expected, no need to use this
work-around.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200807074238.1632519-18-boqun.feng@gmail.com
2020-08-26 12:42:07 +02:00
Boqun Feng
8ef7ca7512 lockdep/selftest: Add more recursive read related test cases
Add those four test cases:

1.	X --(ER)--> Y --(ER)--> Z --(ER)--> X is deadlock.

2.	X --(EN)--> Y --(SR)--> Z --(ER)--> X is deadlock.

3.	X --(EN)--> Y --(SR)--> Z --(SN)--> X is not deadlock.

4.	X --(ER)--> Y --(SR)--> Z --(EN)--> X is not deadlock.

Those self testcases are valuable for the development of supporting
recursive read related deadlock detection.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200807074238.1632519-17-boqun.feng@gmail.com
2020-08-26 12:42:07 +02:00
Boqun Feng
31e0d74770 lockdep/selftest: Unleash irq_read_recursion2 and add more
Now since we can handle recursive read related irq inversion deadlocks
correctly, uncomment the irq_read_recursion2 and add more testcases.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200807074238.1632519-16-boqun.feng@gmail.com
2020-08-26 12:42:06 +02:00
Boqun Feng
f611e8cf98 lockdep: Take read/write status in consideration when generate chainkey
Currently, the chainkey of a lock chain is a hash sum of the class_idx
of all the held locks, the read/write status are not taken in to
consideration while generating the chainkey. This could result into a
problem, if we have:

	P1()
	{
		read_lock(B);
		lock(A);
	}

	P2()
	{
		lock(A);
		read_lock(B);
	}

	P3()
	{
		lock(A);
		write_lock(B);
	}

, and P1(), P2(), P3() run one by one. And when running P2(), lockdep
detects such a lock chain A -> B is not a deadlock, then it's added in
the chain cache, and then when running P3(), even if it's a deadlock, we
could miss it because of the hit of chain cache. This could be confirmed
by self testcase "chain cached mixed R-L/L-W ".

To resolve this, we use concept "hlock_id" to generate the chainkey, the
hlock_id is a tuple (hlock->class_idx, hlock->read), which fits in a u16
type. With this, the chainkeys are different is the lock sequences have
the same locks but different read/write status.

Besides, since we use "hlock_id" to generate chainkeys, the chain_hlocks
array now store the "hlock_id"s rather than lock_class indexes.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200807074238.1632519-15-boqun.feng@gmail.com
2020-08-26 12:42:06 +02:00
Boqun Feng
d4f200e579 lockdep/selftest: Add a R-L/L-W test case specific to chain cache behavior
As our chain cache doesn't differ read/write locks, so even we can
detect a read-lock/lock-write deadlock in check_noncircular(), we can
still be fooled if a read-lock/lock-read case(which is not a deadlock)
comes first.

So introduce this test case to test specific to the chain cache behavior
on detecting recursive read lock related deadlocks.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200807074238.1632519-14-boqun.feng@gmail.com
2020-08-26 12:42:06 +02:00
Boqun Feng
621c9dac0e lockdep: Add recursive read locks into dependency graph
Since we have all the fundamental to handle recursive read locks, we now
add them into the dependency graph.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200807074238.1632519-13-boqun.feng@gmail.com
2020-08-26 12:42:06 +02:00
Boqun Feng
f08e388857 lockdep: Fix recursive read lock related safe->unsafe detection
Currently, in safe->unsafe detection, lockdep misses the fact that a
LOCK_ENABLED_IRQ_*_READ usage and a LOCK_USED_IN_IRQ_*_READ usage may
cause deadlock too, for example:

	P1                          P2
	<irq disabled>
	write_lock(l1);             <irq enabled>
				    read_lock(l2);
	write_lock(l2);
				    <in irq>
				    read_lock(l1);

Actually, all of the following cases may cause deadlocks:

	LOCK_USED_IN_IRQ_* -> LOCK_ENABLED_IRQ_*
	LOCK_USED_IN_IRQ_*_READ -> LOCK_ENABLED_IRQ_*
	LOCK_USED_IN_IRQ_* -> LOCK_ENABLED_IRQ_*_READ
	LOCK_USED_IN_IRQ_*_READ -> LOCK_ENABLED_IRQ_*_READ

To fix this, we need to 1) change the calculation of exclusive_mask() so
that READ bits are not dropped and 2) always call usage() in
mark_lock_irq() to check usage deadlocks, even when the new usage of the
lock is READ.

Besides, adjust usage_match() and usage_acculumate() to recursive read
lock changes.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200807074238.1632519-12-boqun.feng@gmail.com
2020-08-26 12:42:05 +02:00
Boqun Feng
68e3056785 lockdep: Adjust check_redundant() for recursive read change
check_redundant() will report redundancy if it finds a path could
replace the about-to-add dependency in the BFS search. With recursive
read lock changes, we certainly need to change the match function for
the check_redundant(), because the path needs to match not only the lock
class but also the dependency kinds. For example, if the about-to-add
dependency @prev -> @next is A -(SN)-> B, and we find a path A -(S*)->
.. -(*R)->B in the dependency graph with __bfs() (for simplicity, we can
also say we find an -(SR)-> path from A to B), we can not replace the
dependency with that path in the BFS search. Because the -(SN)->
dependency can make a strong path with a following -(S*)-> dependency,
however an -(SR)-> path cannot.

Further, we can replace an -(SN)-> dependency with a -(EN)-> path, that
means if we find a path which is stronger than or equal to the
about-to-add dependency, we can report the redundancy. By "stronger", it
means both the start and the end of the path are not weaker than the
start and the end of the dependency (E is "stronger" than S and N is
"stronger" than R), so that we can replace the dependency with that
path.

To make sure we find a path whose start point is not weaker than the
about-to-add dependency, we use a trick: the ->only_xr of the root
(start point) of __bfs() is initialized as @prev-> == 0, therefore if
@prev is E, __bfs() will pick only -(E*)-> for the first dependency,
otherwise, __bfs() can pick -(E*)-> or -(S*)-> for the first dependency.

To make sure we find a path whose end point is not weaker than the
about-to-add dependency, we replace the match function for __bfs()
check_redundant(), we check for the case that either @next is R
(anything is not weaker than it) or the end point of the path is N
(which is not weaker than anything).

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200807074238.1632519-11-boqun.feng@gmail.com
2020-08-26 12:42:05 +02:00
Boqun Feng
9de0c9bbce lockdep: Support deadlock detection for recursive read locks in check_noncircular()
Currently, lockdep only has limit support for deadlock detection for
recursive read locks.

This patch support deadlock detection for recursive read locks. The
basic idea is:

We are about to add dependency B -> A in to the dependency graph, we use
check_noncircular() to find whether we have a strong dependency path
A -> .. -> B so that we have a strong dependency circle (a closed strong
dependency path):

	 A -> .. -> B -> A

, which doesn't have two adjacent dependencies as -(*R)-> L -(S*)->.

Since A -> .. -> B is already a strong dependency path, so if either
B -> A is -(E*)-> or A -> .. -> B is -(*N)->, the circle A -> .. -> B ->
A is strong, otherwise not. So we introduce a new match function
hlock_conflict() to replace the class_equal() for the deadlock check in
check_noncircular().

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200807074238.1632519-10-boqun.feng@gmail.com
2020-08-26 12:42:05 +02:00
Boqun Feng
61775ed243 lockdep: Make __bfs(.match) return bool
The "match" parameter of __bfs() is used for checking whether we hit a
match in the search, therefore it should return a boolean value rather
than an integer for better readability.

This patch then changes the return type of the function parameter and the
match functions to bool.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200807074238.1632519-9-boqun.feng@gmail.com
2020-08-26 12:42:05 +02:00
Boqun Feng
6971c0f345 lockdep: Extend __bfs() to work with multiple types of dependencies
Now we have four types of dependencies in the dependency graph, and not
all the pathes carry real dependencies (the dependencies that may cause
a deadlock), for example:

	Given lock A and B, if we have:

	CPU1			CPU2
	=============		==============
	write_lock(A);		read_lock(B);
	read_lock(B);		write_lock(A);

	(assuming read_lock(B) is a recursive reader)

	then we have dependencies A -(ER)-> B, and B -(SN)-> A, and a
	dependency path A -(ER)-> B -(SN)-> A.

	In lockdep w/o recursive locks, a dependency path from A to A
	means a deadlock. However, the above case is obviously not a
	deadlock, because no one holds B exclusively, therefore no one
	waits for the other to release B, so who get A first in CPU1 and
	CPU2 will run non-blockingly.

	As a result, dependency path A -(ER)-> B -(SN)-> A is not a
	real/strong dependency that could cause a deadlock.

From the observation above, we know that for a dependency path to be
real/strong, no two adjacent dependencies can be as -(*R)-> -(S*)->.

Now our mission is to make __bfs() traverse only the strong dependency
paths, which is simple: we record whether we only have -(*R)-> for the
previous lock_list of the path in lock_list::only_xr, and when we pick a
dependency in the traverse, we 1) filter out -(S*)-> dependency if the
previous lock_list only has -(*R)-> dependency (i.e. ->only_xr is true)
and 2) set the next lock_list::only_xr to true if we only have -(*R)->
left after we filter out dependencies based on 1), otherwise, set it to
false.

With this extension for __bfs(), we now need to initialize the root of
__bfs() properly (with a correct ->only_xr), to do so, we introduce some
helper functions, which also cleans up a little bit for the __bfs() root
initialization code.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200807074238.1632519-8-boqun.feng@gmail.com
2020-08-26 12:42:04 +02:00
Boqun Feng
3454a36d6a lockdep: Introduce lock_list::dep
To add recursive read locks into the dependency graph, we need to store
the types of dependencies for the BFS later. There are four types of
dependencies:

*	Exclusive -> Non-recursive dependencies: EN
	e.g. write_lock(prev) held and try to acquire write_lock(next)
	or non-recursive read_lock(next), which can be represented as
	"prev -(EN)-> next"

*	Shared -> Non-recursive dependencies: SN
	e.g. read_lock(prev) held and try to acquire write_lock(next) or
	non-recursive read_lock(next), which can be represented as
	"prev -(SN)-> next"

*	Exclusive -> Recursive dependencies: ER
	e.g. write_lock(prev) held and try to acquire recursive
	read_lock(next), which can be represented as "prev -(ER)-> next"

*	Shared -> Recursive dependencies: SR
	e.g. read_lock(prev) held and try to acquire recursive
	read_lock(next), which can be represented as "prev -(SR)-> next"

So we use 4 bits for the presence of each type in lock_list::dep. Helper
functions and macros are also introduced to convert a pair of locks into
lock_list::dep bit and maintain the addition of different types of
dependencies.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200807074238.1632519-7-boqun.feng@gmail.com
2020-08-26 12:42:04 +02:00
Boqun Feng
bd76eca10d lockdep: Reduce the size of lock_list::distance
lock_list::distance is always not greater than MAX_LOCK_DEPTH (which
is 48 right now), so a u16 will fit. This patch reduces the size of
lock_list::distance to save space, so that we can introduce other fields
to help detect recursive read lock deadlocks without increasing the size
of lock_list structure.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200807074238.1632519-6-boqun.feng@gmail.com
2020-08-26 12:42:04 +02:00
Boqun Feng
d563bc6ead lockdep: Make __bfs() visit every dependency until a match
Currently, __bfs() will do a breadth-first search in the dependency
graph and visit each lock class in the graph exactly once, so for
example, in the following graph:

	A ---------> B
	|            ^
	|            |
	+----------> C

a __bfs() call starts at A, will visit B through dependency A -> B and
visit C through dependency A -> C and that's it, IOW, __bfs() will not
visit dependency C -> B.

This is OK for now, as we only have strong dependencies in the
dependency graph, so whenever there is a traverse path from A to B in
__bfs(), it means A has strong dependencies to B (IOW, B depends on A
strongly). So no need to visit all dependencies in the graph.

However, as we are going to add recursive-read lock into the dependency
graph, as a result, not all the paths mean strong dependencies, in the
same example above, dependency A -> B may be a weak dependency and
traverse A -> C -> B may be a strong dependency path. And with the old
way of __bfs() (i.e. visiting every lock class exactly once), we will
miss the strong dependency path, which will result into failing to find
a deadlock. To cure this for the future, we need to find a way for
__bfs() to visit each dependency, rather than each class, exactly once
in the search until we find a match.

The solution is simple:

We used to mark lock_class::lockdep_dependency_gen_id to indicate a
class has been visited in __bfs(), now we change the semantics a little
bit: we now mark lock_class::lockdep_dependency_gen_id to indicate _all
the dependencies_ in its lock_{after,before} have been visited in the
__bfs() (note we only take one direction in a __bfs() search). In this
way, every dependency is guaranteed to be visited until we find a match.

Note: the checks in mark_lock_accessed() and lock_accessed() are
removed, because after this modification, we may call these two
functions on @source_entry of __bfs(), which may not be the entry in
"list_entries"

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200807074238.1632519-5-boqun.feng@gmail.com
2020-08-26 12:42:03 +02:00
Boqun Feng
b11be024de lockdep: Demagic the return value of BFS
__bfs() could return four magic numbers:

	1: search succeeds, but none match.
	0: search succeeds, find one match.
	-1: search fails because of the cq is full.
	-2: search fails because a invalid node is found.

This patch cleans things up by using a enum type for the return value
of __bfs() and its friends, this improves the code readability of the
code, and further, could help if we want to extend the BFS.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200807074238.1632519-4-boqun.feng@gmail.com
2020-08-26 12:42:03 +02:00
Boqun Feng
224ec489d3 lockdep/Documention: Recursive read lock detection reasoning
This patch add the documentation piece for the reasoning of deadlock
detection related to recursive read lock. The following sections are
added:

*	Explain what is a recursive read lock, and what deadlock cases
	they could introduce.

*	Introduce the notations for different types of dependencies, and
	the definition of strong paths.

*	Proof for a closed strong path is both sufficient and necessary
	for deadlock detections with recursive read locks involved. The
	proof could also explain why we call the path "strong"

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200807074238.1632519-3-boqun.feng@gmail.com
2020-08-26 12:42:03 +02:00
Boqun Feng
e918188611 locking: More accurate annotations for read_lock()
On the archs using QUEUED_RWLOCKS, read_lock() is not always a recursive
read lock, actually it's only recursive if in_interrupt() is true. So
change the annotation accordingly to catch more deadlocks.

Note we used to treat read_lock() as pure recursive read locks in
lib/locking-seftest.c, and this is useful, especially for the lockdep
development selftest, so we keep this via a variable to force switching
lock annotation for read_lock().

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200807074238.1632519-2-boqun.feng@gmail.com
2020-08-26 12:42:02 +02:00
Marta Rybczynska
92b4e9f11a Documentation/locking/locktypes: Fix local_locks documentation
Fix issues with local_locks documentation:

 - fix function names, local_lock.h has local_unlock_irqrestore(), not
   local_lock_irqrestore()

 - fix mapping table, local_unlock_irqrestore() maps to
   local_irq_restore(), not _save()

Signed-off-by: Marta Rybczynska <rybczynska@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/CAApg2=SKxQ3Sqwj6TZnV-0x0cKLXFKDaPvXT4N15MPDMKq724g@mail.gmail.com
2020-08-26 12:42:02 +02:00
Randy Dunlap
a28e884b96 seqlock: Fix multiple kernel-doc warnings
Fix kernel-doc warnings in <linux/seqlock.h>.

../include/linux/seqlock.h:152: warning: Incorrect use of kernel-doc format:  * seqcount_LOCKNAME_init() - runtime initializer for seqcount_LOCKNAME_t
../include/linux/seqlock.h:164: warning: Incorrect use of kernel-doc format:  * SEQCOUNT_LOCKTYPE() - Instantiate seqcount_LOCKNAME_t and helpers
../include/linux/seqlock.h:229: warning: Function parameter or member 'seq_name' not described in 'SEQCOUNT_LOCKTYPE_ZERO'
../include/linux/seqlock.h:229: warning: Function parameter or member 'assoc_lock' not described in 'SEQCOUNT_LOCKTYPE_ZERO'
../include/linux/seqlock.h:229: warning: Excess function parameter 'name' description in 'SEQCOUNT_LOCKTYPE_ZERO'
../include/linux/seqlock.h:229: warning: Excess function parameter 'lock' description in 'SEQCOUNT_LOCKTYPE_ZERO'
../include/linux/seqlock.h:695: warning: duplicate section name 'NOTE'

Demote kernel-doc notation for the macros "seqcount_LOCKNAME_init()" and
"SEQCOUNT_LOCKTYPE()"; scripts/kernel-doc does not handle them correctly.

Rename function parameters in SEQCNT_LOCKNAME_ZERO() documentation
to match the macro's argument names. Change the macro name in the
documentation to SEQCOUNT_LOCKTYPE_ZERO() to match the macro's name.

For raw_write_seqcount_latch(), rename the second NOTE: to NOTE2:
to prevent a kernel-doc warning. However, the generated output is not
quite as nice as it could be for this.

Fix a typo: s/LOCKTYPR/LOCKTYPE/

Fixes: 0efc94c5d1 ("seqcount: Compress SEQCNT_LOCKNAME_ZERO()")
Fixes: e4e9ab3f9f ("seqlock: Fold seqcount_LOCKNAME_init() definition")
Fixes: a8772dccb2 ("seqlock: Fold seqcount_LOCKNAME_t definition")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200817000200.20993-1-rdunlap@infradead.org
2020-08-26 12:42:02 +02:00
Peter Zijlstra
a435b9a143 locking/refcount: Provide __refcount API to obtain the old value
David requested means to obtain the old/previous value from the
refcount API for tracing purposes.

Duplicate (most of) the API as __refcount*() with an additional
'int *' argument into which, if !NULL, the old value will be stored.

Requested-by: David Howells <dhowells@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20200729111120.GA2638@hirez.programming.kicks-ass.net
2020-08-26 12:42:02 +02:00
Peter Zijlstra
6eb6d05958 seqlock,tags: Add support for SEQCOUNT_LOCKTYPE()
Such that we might easily find seqcount_LOCKTYPE_t and
seqcount_LOCKTYPE_init().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200729161938.GB2678@hirez.programming.kicks-ass.net
2020-08-26 12:42:01 +02:00
Peter Zijlstra
eb1f00237a lockdep,trace: Expose tracepoints
The lockdep tracepoints are under the lockdep recursion counter, this
has a bunch of nasty side effects:

 - TRACE_IRQFLAGS doesn't work across the entire tracepoint

 - RCU-lockdep doesn't see the tracepoints either, hiding numerous
   "suspicious RCU usage" warnings.

Pull the trace_lock_*() tracepoints completely out from under the
lockdep recursion handling and completely rely on the trace level
recusion handling -- also, tracing *SHOULD* not be taking locks in any
case.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Marco Elver <elver@google.com>
Link: https://lkml.kernel.org/r/20200821085348.782688941@infradead.org
2020-08-26 12:41:56 +02:00
Nicholas Piggin
044d0d6de9 lockdep: Only trace IRQ edges
Problem:

  raw_local_irq_save(); // software state on
  local_irq_save(); // software state off
  ...
  local_irq_restore(); // software state still off, because we don't enable IRQs
  raw_local_irq_restore(); // software state still off, *whoopsie*

existing instances:

 - lock_acquire()
     raw_local_irq_save()
     __lock_acquire()
       arch_spin_lock(&graph_lock)
         pv_wait() := kvm_wait() (same or worse for Xen/HyperV)
           local_irq_save()

 - trace_clock_global()
     raw_local_irq_save()
     arch_spin_lock()
       pv_wait() := kvm_wait()
	 local_irq_save()

 - apic_retrigger_irq()
     raw_local_irq_save()
     apic->send_IPI() := default_send_IPI_single_phys()
       local_irq_save()

Possible solutions:

 A) make it work by enabling the tracing inside raw_*()
 B) make it work by keeping tracing disabled inside raw_*()
 C) call it broken and clean it up now

Now, given that the only reason to use the raw_* variant is because you don't
want tracing. Therefore A) seems like a weird option (although it can be done).
C) is tempting, but OTOH it ends up converting a _lot_ of code to raw just
because there is one raw user, this strips the validation/tracing off for all
the other users.

So we pick B) and declare any code that ends up doing:

	raw_local_irq_save()
	local_irq_save()
	lockdep_assert_irqs_disabled();

broken. AFAICT this problem has existed forever, the only reason it came
up is because commit: 859d069ee1 ("lockdep: Prepare for NMI IRQ
state tracking") changed IRQ tracing vs lockdep recursion and the
first instance is fairly common, the other cases hardly ever happen.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[rewrote changelog]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Marco Elver <elver@google.com>
Link: https://lkml.kernel.org/r/20200723105615.1268126-1-npiggin@gmail.com
2020-08-26 12:41:56 +02:00
Peter Zijlstra
99dc56feb7 mips: Implement arch_irqs_disabled()
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Paul Burton <paulburton@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200826101653.GE1362448@hirez.programming.kicks-ass.net
2020-08-26 12:41:55 +02:00
Peter Zijlstra
021c109330 arm64: Implement arch_irqs_disabled()
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lkml.kernel.org/r/20200821085348.664425120@infradead.org
2020-08-26 12:41:55 +02:00
Peter Zijlstra
36206b588b nds32: Implement arch_irqs_disabled()
Cc: Nick Hu <nickhu@andestech.com>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Marco Elver <elver@google.com>
Link: https://lkml.kernel.org/r/20200821085348.604899379@infradead.org
2020-08-26 12:41:55 +02:00
Peter Zijlstra
00b0ed2d49 locking/lockdep: Cleanup
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Marco Elver <elver@google.com>
Link: https://lkml.kernel.org/r/20200821085348.546087214@infradead.org
2020-08-26 12:41:54 +02:00
Peter Zijlstra
7da93f3793 x86/entry: Remove unused THUNKs
Unused remnants

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Marco Elver <elver@google.com>
Link: https://lkml.kernel.org/r/20200821085348.487040689@infradead.org
2020-08-26 12:41:54 +02:00
Peter Zijlstra
9864f5b594 cpuidle: Move trace_cpu_idle() into generic code
Remove trace_cpu_idle() from the arch_cpu_idle() implementations and
put it in the generic code, right before disabling RCU. Gets rid of
more trace_*_rcuidle() users.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Marco Elver <elver@google.com>
Link: https://lkml.kernel.org/r/20200821085348.428433395@infradead.org
2020-08-26 12:41:54 +02:00
Peter Zijlstra
bf9282dc26 cpuidle: Make CPUIDLE_FLAG_TLB_FLUSHED generic
This allows moving the leave_mm() call into generic code before
rcu_idle_enter(). Gets rid of more trace_*_rcuidle() users.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Marco Elver <elver@google.com>
Link: https://lkml.kernel.org/r/20200821085348.369441600@infradead.org
2020-08-26 12:41:53 +02:00
Peter Zijlstra
1098582a0f sched,idle,rcu: Push rcu_idle deeper into the idle path
Lots of things take locks, due to a wee bug, rcu_lockdep didn't notice
that the locking tracepoints were using RCU.

Push rcu_idle_{enter,exit}() as deep as possible into the idle paths,
this also resolves a lot of _rcuidle()/RCU_NONIDLE() usage.

Specifically, sched_clock_idle_wakeup_event() will use ktime which
will use seqlocks which will tickle lockdep, and
stop_critical_timings() uses lock.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Marco Elver <elver@google.com>
Link: https://lkml.kernel.org/r/20200821085348.310943801@infradead.org
2020-08-26 12:41:53 +02:00
Peter Zijlstra
49d9c59363 cpuidle: Fixup IRQ state
Match the pattern elsewhere in this file.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Marco Elver <elver@google.com>
Link: https://lkml.kernel.org/r/20200821085348.251340558@infradead.org
2020-08-26 12:41:53 +02:00
Peter Zijlstra
fddf9055a6 lockdep: Use raw_cpu_*() for per-cpu variables
Sven reported that commit a21ee6055c ("lockdep: Change
hardirq{s_enabled,_context} to per-cpu variables") caused trouble on
s390 because their this_cpu_*() primitives disable preemption which
then lands back tracing.

On the one hand, per-cpu ops should use preempt_*able_notrace() and
raw_local_irq_*(), on the other hand, we can trivialy use raw_cpu_*()
ops for this.

Fixes: a21ee6055c ("lockdep: Change hardirq{s_enabled,_context} to per-cpu variables")
Reported-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200821085348.192346882@infradead.org
2020-08-26 12:41:53 +02:00
Linus Torvalds
d012a7190f Linux 5.9-rc2 2020-08-23 14:08:43 -07:00
Linus Torvalds
cb95712138 powerpc fixes for 5.9 #3
Add perf support for emitting extended registers for power10.
 
 A fix for CPU hotplug on pseries, where on large/loaded systems we may not wait
 long enough for the CPU to be offlined, leading to crashes.
 
 Addition of a raw cputable entry for Power10, which is not required to boot, but
 is required to make our PMU setup work correctly in guests.
 
 Three fixes for the recent changes on 32-bit Book3S to move modules into their
 own segment for strict RWX.
 
 A fix for a recent change in our powernv PCI code that could lead to crashes.
 
 A change to our perf interrupt accounting to avoid soft lockups when using some
 events, found by syzkaller.
 
 A change in the way we handle power loss events from the hypervisor on pseries.
 We no longer immediately shut down if we're told we're running on a UPS.
 
 A few other minor fixes.
 
 Thanks to:
   Alexey Kardashevskiy, Andreas Schwab, Aneesh Kumar K.V, Anju T Sudhakar,
   Athira Rajeev, Christophe Leroy, Frederic Barrat, Greg Kurz, Kajol Jain,
   Madhavan Srinivasan, Michael Neuling, Michael Roth, Nageswara R Sastry, Oliver
   O'Halloran, Thiago Jung Bauermann, Vaidyanathan Srinivasan, Vasant Hegde.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl9CYMwTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgC/wEACljEVnfHzUObmIgqn9Ru3JlfEI6Hlk
 ts7kajCgS/I/bV6DoDMZ8rlZX87QFOwiBkNM1I+vGHSLAuzsmFAnbFPyxw/idxpQ
 XUoNy8OCvbbzCPzChYdiU0PxW2h2i+QxkmktlWSN1SAPudJUWvoPS2Y4+sC4zksk
 B4B6tbW2DT8TFO1kKeZsU9r2t+EH5KwlIOi+uxbH8d76lJINKkBNSnjzMytl7drM
 TZx/HWr8+s/WJo1787x6bv8gxs5tV9b4vIKt2YZNTY2kvYsEDE+fBR1XfCAneXMw
 ASYnZV+/xCLIUpRF6DI4RAShLBT/Sfiy1yMTndZgfqAgquokFosszNx2zrk0IzCd
 AgqX93YGbGz/H72W3Y/B0W9+74XyO/u2D9zhNpkCRMpdcsM5MbvOQrQA5Ustu47E
 av5MOaF/nNCd8J+OC4Qjgt5VFb/s0h4FdtrwT80srOa2U6Of9cD/T6xAfOszSJ96
 cWdSb5qhn5wuD9pP32KjwdmWBiUw38/gnRGKpRlOVzyHL/GKZijyaBbWBlkoEmty
 0nbjWW/IVfsOb5Weuiybg541h/QOVuOkb2pOvPClITiH83MY/AciDJ+auo4M//hW
 haKz9IgV/KctmzDE+v9d0BD8sGmW03YUcQAPdRufI0eGXijDLcnHeuk2B3Nu84Pq
 8mtev+VQ+T6cZA==
 =sdJ1
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Add perf support for emitting extended registers for power10.

 - A fix for CPU hotplug on pseries, where on large/loaded systems we
   may not wait long enough for the CPU to be offlined, leading to
   crashes.

 - Addition of a raw cputable entry for Power10, which is not required
   to boot, but is required to make our PMU setup work correctly in
   guests.

 - Three fixes for the recent changes on 32-bit Book3S to move modules
   into their own segment for strict RWX.

 - A fix for a recent change in our powernv PCI code that could lead to
   crashes.

 - A change to our perf interrupt accounting to avoid soft lockups when
   using some events, found by syzkaller.

 - A change in the way we handle power loss events from the hypervisor
   on pseries. We no longer immediately shut down if we're told we're
   running on a UPS.

 - A few other minor fixes.

Thanks to Alexey Kardashevskiy, Andreas Schwab, Aneesh Kumar K.V, Anju T
Sudhakar, Athira Rajeev, Christophe Leroy, Frederic Barrat, Greg Kurz,
Kajol Jain, Madhavan Srinivasan, Michael Neuling, Michael Roth,
Nageswara R Sastry, Oliver O'Halloran, Thiago Jung Bauermann,
Vaidyanathan Srinivasan, Vasant Hegde.

* tag 'powerpc-5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/perf/hv-24x7: Move cpumask file to top folder of hv-24x7 driver
  powerpc/32s: Fix module loading failure when VMALLOC_END is over 0xf0000000
  powerpc/pseries: Do not initiate shutdown when system is running on UPS
  powerpc/perf: Fix soft lockups due to missed interrupt accounting
  powerpc/powernv/pci: Fix possible crash when releasing DMA resources
  powerpc/pseries/hotplug-cpu: wait indefinitely for vCPU death
  powerpc/32s: Fix is_module_segment() when MODULES_VADDR is defined
  powerpc/kasan: Fix KASAN_SHADOW_START on BOOK3S_32
  powerpc/fixmap: Fix the size of the early debug area
  powerpc/pkeys: Fix build error with PPC_MEM_KEYS disabled
  powerpc/kernel: Cleanup machine check function declarations
  powerpc: Add POWER10 raw mode cputable entry
  powerpc/perf: Add extended regs support for power10 platform
  powerpc/perf: Add support for outputting extended regs in perf intr_regs
  powerpc: Fix P10 PVR revision in /proc/cpuinfo for SMT4 cores
2020-08-23 11:37:23 -07:00
Linus Torvalds
550c2129d9 A single fix for x86 which removes the RDPID usage from the paranoid entry
path and unconditionally uses LSL to retrieve the CPU number. RDPID depends
 on MSR_TSX_AUX.  KVM has an optmization to avoid expensive MRS read/writes
 on VMENTER/EXIT. It caches the MSR values and restores them either when
 leaving the run loop, on preemption or when going out to user
 space. MSR_TSX_AUX is part of that lazy MSR set, so after writing the guest
 value and before the lazy restore any exception using the paranoid entry
 will read the guest value and use it as CPU number to retrieve the GSBASE
 value for the current CPU when FSGSBASE is enabled. As RDPID is only used
 in that particular entry path, there is no reason to burden VMENTER/EXIT
 with two extra MSR writes. Remove the RDPID optimization, which is not even
 backed by numbers from the paranoid entry path instead.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl9CJqgTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoaAvD/sHkSx1V0hupUh9budLhSPhUyWvXiar
 AmAvWl8dKEFG2UOhOT57zD6DgFN3uu5rqjxTG7ha9XEspsVaP5jOji4evus37IAe
 z2EB9J5c3ih4VSdaM+8ODTMls5rjQrvJjeDV0ETTQ9Xb+FOT0vNAub6D5PVms6J3
 FObDKiIpdB3s5INAWPknIYves0EJP5BP3+gOktovMStfcy8tm8N9S/yA4cNL4nbi
 IZ0h1H6xcGrQ79dv+2/vC0cdqlbm6y2KWfNKpSTGwPMdRl0PpHrovsOVPKF+6pio
 Ad230t7xnWMlrHghbSvDyyJ67/N6AA6CaqHECWtgsDuzbqcD2MoQ2l97atoZInh7
 83n8ZWFaw10T4ksw9SWqAex+ZJh6/rD4vcQYUncPN66/kOVM186ezICc+QsPV99s
 ukw29xge4uHz91Hy0Bo8SP+w1bvntKJn6XyJuTFgDt8bmFRIeajSxyOGw7hTs+ZD
 TONw9dMeteWZhZRIXYDjlYc83xFYGkX6hmxLrDJ4jg8UGojaca83s7oZtxgZxWzu
 L9wfCRJIEA33ihvqtbTEOHbJvl6eyDt8b/kBGGHbAbqweQ3mWEH3WDQ5cyhlLrfA
 tnDToX1DvsxfVg94saprNt249qHNZlFIj8EaGfjxxEngd8xAgfU0vxJpI8sBtdsw
 SMIwyLLbLz573A==
 =AEel
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fix from Thomas Gleixner:
 "A single fix for x86 which removes the RDPID usage from the paranoid
  entry path and unconditionally uses LSL to retrieve the CPU number.

  RDPID depends on MSR_TSX_AUX. KVM has an optmization to avoid
  expensive MRS read/writes on VMENTER/EXIT. It caches the MSR values
  and restores them either when leaving the run loop, on preemption or
  when going out to user space. MSR_TSX_AUX is part of that lazy MSR
  set, so after writing the guest value and before the lazy restore any
  exception using the paranoid entry will read the guest value and use
  it as CPU number to retrieve the GSBASE value for the current CPU when
  FSGSBASE is enabled. As RDPID is only used in that particular entry
  path, there is no reason to burden VMENTER/EXIT with two extra MSR
  writes. Remove the RDPID optimization, which is not even backed by
  numbers from the paranoid entry path instead"

* tag 'x86-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/entry/64: Do not use RDPID in paranoid entry to accomodate KVM
2020-08-23 11:21:16 -07:00
Linus Torvalds
cea05c192b A single update for perf on x86 which ass support for the
broken down bandwith counters.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl9CJSkTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoUmoD/9xdmi6Pi8cqsN61zoqzDMygyd45rYH
 x9oe1QOYHPBp7fIEv0EeG/HcZpBG23Oyj7XnAGZlUuSzHzRYSK3ceQHqo0VEP7qv
 ZrPzBJCI+wx/zVBFN62IbSZmZF3omBjo+TKT1bTKmDRov+qovAEcD040Vt45SkW2
 NKiD/u2iJemLrF9g7KKgNCv+alqGAeH6YzZpWgFycP550psIF3WgIY2eMjSY9+XD
 i7DrEg/OWY5vvCYbHBxrbhvqkXlhyMzw0jqWJtRHTIPRjMuDCM9yef5KiCExPdFX
 e+rE+O2gp/9xMEcAvi4SK3s1QUe0wjq+e7stNiOKrBhNA0CNhvJxAmIBO2y4Bi5+
 BKJelWt0vDDxCiQgW9mctq0rvu8KM6C3w6AgKWiQNbirVLKHNdx4AvF5SNWGVRhf
 ZGMVaFlpwuuTZfahbdmwi88j6968h8izjkQeQ2gZMXEOgv8P/Df+C9IcHH/HxEgI
 OwW6htpU3LeOSbEu9201AJqjqAtpXxYzodBWuGWLF7+BF6NACciITQPApkYvWGPN
 bKol/kbX52ZRzBEEUHWfrIred7KPH/2UZltPz6Fo5wwuuI2zKBYecIN/vWppzITH
 uj5sq2UVAdRQy8/9wKys1bu0uVhlILVa/KTYuf+9cuCWGELfRTN5iHaKviUEEFWB
 aVfI8o2ThyQyvQ==
 =vF1u
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 perf fix from Thomas Gleixner:
 "A single update for perf on x86 which has support for the broken down
  bandwith counters"

* tag 'perf-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/uncore: Add BW counters for GT, IA and IO breakdown
2020-08-23 11:15:14 -07:00
Linus Torvalds
10c091b62e A set of EFI fixes:
- Enforce NX on RO data in mixed EFI mode
  - Destroy workqueue in an error handling path to prevent UAF
  - Stop argument parser at '--' which is the delimiter for init
  - Treat a NULL command line pointer as empty instead of dereferncing it
    unconditionally.
  - Handle an unterminated command line correctly
  - Cleanup the 32bit code leftovers and remove obsolete documentation
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl9CJMATHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoXDXD/4ypNf6J1B7VX/eSO8IRW5LepYaatUb
 nOYESLCFF9RcWV4ff1ujs1Wow8rmTV0U+uCb6auSAMvmFu53VA5uJ9U8lHLjx/cZ
 2qFpz6PScXr+cv4oPZ/8T4i7zuT887He1jtwsc+ABRHDUu9yd+qt0IZxqhLAqGQW
 dBal0WdZIfMftR1IBZvdbp8hNqHU/sEnVYJOovf2PpKNe8Z+IR2eeQVUoryM7V1b
 VNL+KtfpsYAbG5Gr4QVB18tHLnfEEJZHh8Dt7TlTOMF8hFJ6lAE7MutydWqHDMVv
 DlXtbs07C1aFIlpJuRdiwKHvNh4/9cUG0XgPqXW44NAXKK2yynV33oidlu9xjLNA
 uuFZ7Ni3XyHYgI4PyNX81tC4tG/krIRi5282KRh1OEUc7zgJvHRXo8cCy4HLgpEM
 VU4R7bI3LLBJgAdEHh/4EXNnryXOdk2ATUMcLgWGH54ZvnKRCFo1soNggzmiZb+2
 WXVRHbSS3nsc0udmXmoUSqqWkud29o9r3KnuC/2qomCXMwtVSTEamTiok+86ICfc
 BiNZRu3DeoZqovyssBuSEeNGazrLsg8cGR0lhlpkAf0nu12y+quawdOj13SB8QcH
 XGrsCEmXD4jE9cpHt/+qIipyRzFt8RcamJCMH9m5ZNhx1GS6ddunxZAZFD2Fbg/9
 +CSclYiPCfH0cw==
 =+nbs
 -----END PGP SIGNATURE-----

Merge tag 'efi-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull EFI fixes from Thomas Gleixner:

 - Enforce NX on RO data in mixed EFI mode

 - Destroy workqueue in an error handling path to prevent UAF

 - Stop argument parser at '--' which is the delimiter for init

 - Treat a NULL command line pointer as empty instead of dereferncing it
   unconditionally.

 - Handle an unterminated command line correctly

 - Cleanup the 32bit code leftovers and remove obsolete documentation

* tag 'efi-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Documentation: efi: remove description of efi=old_map
  efi/x86: Move 32-bit code into efi_32.c
  efi/libstub: Handle unterminated cmdline
  efi/libstub: Handle NULL cmdline
  efi/libstub: Stop parsing arguments at "--"
  efi: add missed destroy_workqueue when efisubsys_init fails
  efi/x86: Mark kernel rodata non-executable for mixed mode
2020-08-23 11:08:32 -07:00
Linus Torvalds
e99b2507ba A single bug fix for the common entry code. The transcript of the x86
version messed up the reload of the syscall number from pt_regs after
 ptrace and seccomp which breaks syscall number rewriting.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl9CI6YTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoQCvEACoc8+Nd3sFR1UoNASbu5DV6PkUmgGy
 eQLKUA42toTzqJIcyXPRAjBrRc51IFEaxZlqGC7KWjQM9d9cdJGylg4zfwspZoI+
 tsvYKCPxvswVJ09QZmibn35+dbJEiYtQ96Cq0BQx/kaaouNeceRtDXV2ptP9dPSx
 pyv3pb8nchjADcKrqbMYe8t647X1kM25BglbTkHOJZDSubEsgMbN6P3d70n2sNO6
 8jQC4o9DX2AJnN5K3tLyN1yoLUYKUdFlj6X2BgusK8HbBVQ2m7eTPaIT2aNGs648
 7CrY49ggFnr8BVJuhIvjAwdyJPcTm9rcWphfD+WBAWrVO7r205aKAINDsoZwrhBe
 4ykfhs2PzfvHMrqKfKfbfNDQu9p6ZWwh3ZLbUpbunZQPCFB8EwL1x/5O/pGWGCNF
 F4rvfh02BuRPTljjM0pXFx05etT/OKKHjgdB7vxKJzb52dxcIZqqbut+lcTCYAmS
 n2M2H/Tgt4NgJsu4dgGamL6JNvHf1JUhyWVB2ZfRLvGMiiEDmyttct2E1Ji+AVqZ
 Dufui4KajQda+bS6VjCLtBNjC5WJ3gOzpIa4nrRw8mlTGWCgRGjsqu/Ze0Fkds6X
 r6WT4NzJ4pD3E/bXpbegf0eikLIx+sEfiLpJGbuQ+stD52/AQjef1oaLDmmiPXKY
 Ep+yR6l58erLbg==
 =2OhI
 -----END PGP SIGNATURE-----

Merge tag 'core-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull entry fix from Thomas Gleixner:
 "A single bug fix for the common entry code.

  The transcription of the x86 version messed up the reload of the
  syscall number from pt_regs after ptrace and seccomp which breaks
  syscall number rewriting"

* tag 'core-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  core/entry: Respect syscall number rewrites
2020-08-23 11:05:47 -07:00
Linus Torvalds
d9232cb796 A single fix correcting a reversed error severity determination check
which lead to a recoverable error getting marked as fatal, by Tony
 Luck.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl9CP7YACgkQEsHwGGHe
 VUoJXBAAqU4LtWFva6XCA7TvIV8tGaxIGTqS92+eKuHON+dmoxgt5ipu5SPgg47I
 tSClJtfHAbHZJkHGuE213vXytMAji6r9w2EvY+pXlEfqN3E8d9Yek6udPuJC+zF3
 hX8uRDpYqn3MaHN/vuNPcBeB3I5jO0Wg5LXvo9udDYoxW+CoS4+lBBArHe0gWZIR
 JCJoYoLnwWSbrB3y5wJC8CrJ9vZqQXB2icTAwVbBpQa7mqu5pMVYfb1KgrzqXVCS
 6eNBSE2ZPQDdJZdzoj+22/IcFjcngAFiLFHbwwe3wc1XTX1BxdwDeaw6BKfkgkfY
 oLsD9TT1znhU3oQlraikn8IWcaewZgoCfF9mU4+AJUeSXEBenc9vZWtZ9vTRt3lo
 q2i5POUbwivi3Jn5YD4e6L9L7lBJHHHo4Wby/X5yWB/cZT/ygQ84VoRKB8L6cNZE
 DKmOHgpmqJkkTe7AyH6M+zYuH5Oq+yubmmzfZHCukcrDonDkN1z2pdf/FNj0aHfk
 7M23lyHZTnXhH60d1bgF0GqeK8lm+iVAbS8Mshbq7WvIuLXs4nv2HJkGtG/3Iwj6
 JX3DwSu4XepRL6spBTi8YiKQX9OHuJDfwttHdCyn+FcCc5iHIEJvH4fbF0Vr1qLC
 3oh9IFF7UIQGABfCA9J3frdcQymYGX3I757ZNN3RgWne23nb8Ks=
 =RYEr
 -----END PGP SIGNATURE-----

Merge tag 'edac_urgent_for_v5.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC fix from Borislav Petkov:
 "A single fix correcting a reversed error severity determination check
  which lead to a recoverable error getting marked as fatal, by Tony
  Luck"

* tag 'edac_urgent_for_v5.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/{i7core,sb,pnd2,skx}: Fix error event severity
2020-08-23 10:57:19 -07:00
Linus Torvalds
9d045ed1eb Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:
 "Nothing earth shattering here, lots of small fixes (f.e. missing RCU
  protection, bad ref counting, missing memset(), etc.) all over the
  place:

   1) Use get_file_rcu() in task_file iterator, from Yonghong Song.

   2) There are two ways to set remote source MAC addresses in macvlan
      driver, but only one of which validates things properly. Fix this.
      From Alvin Šipraga.

   3) Missing of_node_put() in gianfar probing, from Sumera
      Priyadarsini.

   4) Preserve device wanted feature bits across multiple netlink
      ethtool requests, from Maxim Mikityanskiy.

   5) Fix rcu_sched stall in task and task_file bpf iterators, from
      Yonghong Song.

   6) Avoid reset after device destroy in ena driver, from Shay
      Agroskin.

   7) Missing memset() in netlink policy export reallocation path, from
      Johannes Berg.

   8) Fix info leak in __smc_diag_dump(), from Peilin Ye.

   9) Decapsulate ECN properly for ipv6 in ipv4 tunnels, from Mark
      Tomlinson.

  10) Fix number of data stream negotiation in SCTP, from David Laight.

  11) Fix double free in connection tracker action module, from Alaa
      Hleihel.

  12) Don't allow empty NHA_GROUP attributes, from Nikolay Aleksandrov"

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (46 commits)
  net: nexthop: don't allow empty NHA_GROUP
  bpf: Fix two typos in uapi/linux/bpf.h
  net: dsa: b53: check for timeout
  tipc: call rcu_read_lock() in tipc_aead_encrypt_done()
  net/sched: act_ct: Fix skb double-free in tcf_ct_handle_fragments() error flow
  net: sctp: Fix negotiation of the number of data streams.
  dt-bindings: net: renesas, ether: Improve schema validation
  gre6: Fix reception with IP6_TNL_F_RCV_DSCP_COPY
  hv_netvsc: Fix the queue_mapping in netvsc_vf_xmit()
  hv_netvsc: Remove "unlikely" from netvsc_select_queue
  bpf: selftests: global_funcs: Check err_str before strstr
  bpf: xdp: Fix XDP mode when no mode flags specified
  selftests/bpf: Remove test_align leftovers
  tools/resolve_btfids: Fix sections with wrong alignment
  net/smc: Prevent kernel-infoleak in __smc_diag_dump()
  sfc: fix build warnings on 32-bit
  net: phy: mscc: Fix a couple of spelling mistakes "spcified" -> "specified"
  libbpf: Fix map index used in error message
  net: gemini: Fix missing free_netdev() in error path of gemini_ethernet_port_probe()
  net: atlantic: Use readx_poll_timeout() for large timeout
  ...
2020-08-23 10:52:33 -07:00
Linus Torvalds
f320ac6e13 Merge branch 'work.epoll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull epoll fixes from Al Viro:
 "Fix reference counting and clean up exit paths"

* 'work.epoll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  do_epoll_ctl(): clean the failure exits up a bit
  epoll: Keep a reference on files added to the check list
2020-08-22 17:11:38 -07:00
Al Viro
52c479697c do_epoll_ctl(): clean the failure exits up a bit
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-08-22 18:25:52 -04:00
Marc Zyngier
a9ed4a6560 epoll: Keep a reference on files added to the check list
When adding a new fd to an epoll, and that this new fd is an
epoll fd itself, we recursively scan the fds attached to it
to detect cycles, and add non-epool files to a "check list"
that gets subsequently parsed.

However, this check list isn't completely safe when deletions
can happen concurrently. To sidestep the issue, make sure that
a struct file placed on the check list sees its f_count increased,
ensuring that a concurrent deletion won't result in the file
disapearing from under our feet.

Cc: stable@vger.kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-08-22 18:23:57 -04:00
Nikolay Aleksandrov
eeaac3634e net: nexthop: don't allow empty NHA_GROUP
Currently the nexthop code will use an empty NHA_GROUP attribute, but it
requires at least 1 entry in order to function properly. Otherwise we
end up derefencing null or random pointers all over the place due to not
having any nh_grp_entry members allocated, nexthop code relies on having at
least the first member present. Empty NHA_GROUP doesn't make any sense so
just disallow it.
Also add a WARN_ON for any future users of nexthop_create_group().

 BUG: kernel NULL pointer dereference, address: 0000000000000080
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] SMP
 CPU: 0 PID: 558 Comm: ip Not tainted 5.9.0-rc1+ #93
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
 RIP: 0010:fib_check_nexthop+0x4a/0xaa
 Code: 0f 84 83 00 00 00 48 c7 02 80 03 f7 81 c3 40 80 fe fe 75 12 b8 ea ff ff ff 48 85 d2 74 6b 48 c7 02 40 03 f7 81 c3 48 8b 40 10 <48> 8b 80 80 00 00 00 eb 36 80 78 1a 00 74 12 b8 ea ff ff ff 48 85
 RSP: 0018:ffff88807983ba00 EFLAGS: 00010213
 RAX: 0000000000000000 RBX: ffff88807983bc00 RCX: 0000000000000000
 RDX: ffff88807983bc00 RSI: 0000000000000000 RDI: ffff88807bdd0a80
 RBP: ffff88807983baf8 R08: 0000000000000dc0 R09: 000000000000040a
 R10: 0000000000000000 R11: ffff88807bdd0ae8 R12: 0000000000000000
 R13: 0000000000000000 R14: ffff88807bea3100 R15: 0000000000000001
 FS:  00007f10db393700(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000080 CR3: 000000007bd0f004 CR4: 00000000003706f0
 Call Trace:
  fib_create_info+0x64d/0xaf7
  fib_table_insert+0xf6/0x581
  ? __vma_adjust+0x3b6/0x4d4
  inet_rtm_newroute+0x56/0x70
  rtnetlink_rcv_msg+0x1e3/0x20d
  ? rtnl_calcit.isra.0+0xb8/0xb8
  netlink_rcv_skb+0x5b/0xac
  netlink_unicast+0xfa/0x17b
  netlink_sendmsg+0x334/0x353
  sock_sendmsg_nosec+0xf/0x3f
  ____sys_sendmsg+0x1a0/0x1fc
  ? copy_msghdr_from_user+0x4c/0x61
  ___sys_sendmsg+0x63/0x84
  ? handle_mm_fault+0xa39/0x11b5
  ? sockfd_lookup_light+0x72/0x9a
  __sys_sendmsg+0x50/0x6e
  do_syscall_64+0x54/0xbe
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 RIP: 0033:0x7f10dacc0bb7
 Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb cd 66 0f 1f 44 00 00 8b 05 9a 4b 2b 00 85 c0 75 2e 48 63 ff 48 63 d2 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 b1 f2 2a 00 f7 d8 64 89 02 48
 RSP: 002b:00007ffcbe628bf8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
 RAX: ffffffffffffffda RBX: 00007ffcbe628f80 RCX: 00007f10dacc0bb7
 RDX: 0000000000000000 RSI: 00007ffcbe628c60 RDI: 0000000000000003
 RBP: 000000005f41099c R08: 0000000000000001 R09: 0000000000000008
 R10: 00000000000005e9 R11: 0000000000000246 R12: 0000000000000000
 R13: 0000000000000000 R14: 00007ffcbe628d70 R15: 0000563a86c6e440
 Modules linked in:
 CR2: 0000000000000080

CC: David Ahern <dsahern@gmail.com>
Fixes: 430a049190 ("nexthop: Add support for nexthop groups")
Reported-by: syzbot+a61aa19b0c14c8770bd9@syzkaller.appspotmail.com
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-22 12:39:55 -07:00
Linus Torvalds
c3d8f220d0 Kbuild fixes for v5.9
- move -Wsign-compare warning from W=2 to W=3
 
  - fix the keyword _restrict to __restrict in genksyms
 
  - fix more bugs in qconf
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAl9BIv8VHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGgUgQAKvWTNNvrY0YEY5YrFjh5swT/bBs
 9YfYdrEe7MOkmUtIf0DemwzdmaiZmLSGLAZZNfMEElLdk4bofi13uznh/7M/Yd7H
 Vws7qeQDypRYxieUhrQMubpxoK/ZFPb89x+zX9LlFO9nB1/810iMVJkrysSbVDDU
 QnQWcKIa7X+narruX3DWz5I9FxarODXTIVLz4mR29z7xo1UGfuGuLlH3/mmGLaCt
 kmVnDs0eEFpRt1y70plZ5YZqhFV3619LvRQW8RwJgVXEPzXb5FaWhoJnfIlLgcJb
 epZ0miZgTMAugqryfD0cyqAImNglQTkfhXtmWJR5g77qrqwqHkSi/xnMHhbEP499
 h/GwUmmfUzcjwJJlCzfLE1tuSPRgOvoy6Yp/6T0f+rCVhr8EyZ15BG0qDF3leHY9
 Wlz6CdnRvmBKicMyK6MhH0MFWGmE4h2XM7eCMkVYG6u7WiDxC/FjCYI/73I2632t
 YXeMevAtXfNM5J0TOk1zzUeIordan4s5J5ddlRCRd7GMMmqq5BnfuNI+B9n5Npev
 6g+XerkCreHEgOX1HSAj1SYmSykHjIyIl5AmWwRJEvRlxMKuNPPvar0oJqSUlJ7j
 HPsKAhdl+dlKCxKmz7e3nyfkc5IfMrMAfc6dD9/DAxggWBTeCU0UgaB0ZrFkHMdm
 yx4fimZilwKm3wUA
 =bySb
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-fixes-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

 - move -Wsign-compare warning from W=2 to W=3

 - fix the keyword _restrict to __restrict in genksyms

 - fix more bugs in qconf

* tag 'kbuild-fixes-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kconfig: qconf: replace deprecated QString::sprintf() with QTextStream
  kconfig: qconf: remove redundant help in the info view
  kconfig: qconf: remove qInfo() to get back Qt4 support
  kconfig: qconf: remove unused colNr
  kconfig: qconf: fix the popup menu in the ConfigInfoView window
  kconfig: qconf: fix signal connection to invalid slots
  genksyms: keywords: Use __restrict not _restrict
  kbuild: remove redundant patterns in filter/filter-out
  extract-cert: add static to local data
  Makefile.extrawarn: Move sign-compare from W=2 to W=3
2020-08-22 10:22:44 -07:00
Linus Torvalds
dd105d64a0 - Allow booting of late secondary CPUs affected by erratum 1418040
(currently they are parked if none of the early CPUs are affected by
   this erratum).
 
 - Add the 32-bit vdso Makefile to the vdso_install rule so that 'make
   vdso_install' installs the 32-bit compat vdso when it is compiled.
 
 - Print a warning that untrusted guests without a CPU erratum workaround
   (Cortex-A57 832075) may deadlock the affected system.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAl9BOb8ACgkQa9axLQDI
 XvFnixAAladNjLswpC8gm0PSv2fXD+OJU7XiJKl1McG0UcnkRp1brgruOi7WMyQg
 I0cbY6Kjfb6mFJaHfEA9uuQ0P44FGmI+gz+3Injl+a0qJgdu0QLU1uJQG/Rae+zG
 kdoimf+/CnLnJTiIA5YXsdrFhSQsn2lVConJx26QrSJO0SB0TROy86aRSrRSMyYy
 mXrK4xCm/cx4LqJQJrFmShpgs/IjuK8T/LZBInjgB43e3y++SHwGTSA2kf3NOldz
 Rx4HqQVJ37IeROZ8B2v8MggZsTk/5C+DaxJu6QBk7REDKPnI9+RAr3BhPUAAVWDn
 BONYtsjBLgf/Q9bmXNsHlGAhjIIeOAgaIIr10oVhSFScCHsjEU15hqyNP2jfgLSC
 Q4cgU8bA0sm36CHNI0vd5phAnMBN6HJZtmSzu2xb/GJKYW4yiuXupYiOWbBAhhiu
 uAFjMRW+dwfOXY/59kwpmedBZ0WvHod2mPhp3n4FRzo3X0NfRgHzgg1kDRwqVAzv
 eh8ynUYMjuy5H3siNR0279M6epIK/hTtqSvnzIqkCTBkftsYX+32v4UzEardRGiv
 +/gKJ1XQGmWD3OCISuMZwaRK+tH5V68eT+3P0gGmssJLYMtRJFzzVtuv3mz1yXRD
 USsww/SY2etlGU+N8r+66NSqU0KPRdQ0tkK6eCX8DTyl3MPLQfQ=
 =VkR3
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:

 - Allow booting of late secondary CPUs affected by erratum 1418040
   (currently they are parked if none of the early CPUs are affected by
   this erratum).

 - Add the 32-bit vdso Makefile to the vdso_install rule so that 'make
   vdso_install' installs the 32-bit compat vdso when it is compiled.

 - Print a warning that untrusted guests without a CPU erratum
   workaround (Cortex-A57 832075) may deadlock the affected system.

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  ARM64: vdso32: Install vdso32 from vdso_install
  KVM: arm64: Print warning when cpu erratum can cause guests to deadlock
  arm64: Allow booting of late CPUs affected by erratum 1418040
  arm64: Move handling of erratum 1418040 into C code
2020-08-22 10:17:36 -07:00