The osq lock maintainers don't want it to be used outside of
kernel/locking/, but we can do better.
Since we have lock handoff signalled via waitlist entries, there's no
reason for optimistic spinning to have to look at the lock at all -
aside from checking lock-owner; we can just spin looking at our waitlist
entry.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
In percpu reader mode, trylock() for read had a lost wakeup: on failure
to get the lock, we may have caused a writer to fail to get the lock,
because we temporarily elevated the reader count.
We need to check for waiters after decrementing the read count - not
before.
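
A minimal user-space sketch of the ordering described above, using C11
atomics. This is not the six lock code: struct toy_pcpu_lock, its
fields, and the wakeups counter are hypothetical stand-ins, and a single
atomic counter stands in for the percpu reader count.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a lock in percpu reader mode. */
struct toy_pcpu_lock {
	atomic_int  readers;		/* stands in for the percpu read count */
	atomic_bool write_locking;	/* a writer is trying to take the lock */
	atomic_bool waiting_write;	/* a writer is on the waitlist */
	int         wakeups;		/* stand-in for issuing a wakeup */
};

static bool toy_trylock_read(struct toy_pcpu_lock *l)
{
	/* Temporarily elevate the reader count (seq_cst: acts as a full barrier). */
	atomic_fetch_add(&l->readers, 1);
	if (!atomic_load(&l->write_locking))
		return true;

	/* Failure: back out. A writer may have seen our transient count and failed. */
	atomic_fetch_sub(&l->readers, 1);

	/* Check for waiters only after the decrement, not before. */
	if (atomic_load(&l->waiting_write))
		l->wakeups++;
	return false;
}

static void trylock_read_demo(void)
{
	struct toy_pcpu_lock l = { .wakeups = 0 };

	atomic_init(&l.readers, 0);
	atomic_init(&l.write_locking, true);
	atomic_init(&l.waiting_write, true);

	/* Writer is active and waiting: trylock fails, backs out, and wakes it. */
	assert(!toy_trylock_read(&l));
	assert(atomic_load(&l.readers) == 0);
	assert(l.wakeups == 1);

	/* No writer: trylock succeeds and the reader count stays elevated. */
	atomic_store(&l.write_locking, false);
	atomic_store(&l.waiting_write, false);
	assert(toy_trylock_read(&l));
	assert(atomic_load(&l.readers) == 1);
}
```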
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Six locks do lock handoff via the wakeup path: the thread doing the
wakeup also takes the lock on behalf of the waiter, which means the
waiter only has to look at its waitlist entry, and doesn't have to touch
the lock cacheline while another thread is using it.
Linus noticed that this needs a real barrier, which this patch fixes.
Also add a comment for the should_sleep_fn() error path.
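
As a rough illustration of why the handoff needs ordering, here is a
user-space sketch with C11 atomics. The struct and function names are
hypothetical, and release/acquire is used here as one way to get the
required ordering; the kernel code uses its own barrier primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified waitlist entry; not the kernel's struct. */
struct toy_wait_entry {
	atomic_bool lock_acquired;	/* set by the waker on handoff */
	int         protected_data;	/* stands in for state guarded by the lock */
};

/* Waker side: the lock has been taken on the waiter's behalf; publish that. */
static void toy_handoff(struct toy_wait_entry *w, int data)
{
	w->protected_data = data;
	/* Release: everything written above is visible once the flag is seen. */
	atomic_store_explicit(&w->lock_acquired, true, memory_order_release);
}

/* Waiter side: look only at our own entry, never at the lock cacheline. */
static bool toy_waiter_poll(struct toy_wait_entry *w, int *data)
{
	if (!atomic_load_explicit(&w->lock_acquired, memory_order_acquire))
		return false;
	*data = w->protected_data;
	return true;
}

static void handoff_demo(void)
{
	struct toy_wait_entry w = { .protected_data = 0 };
	int seen = -1;

	atomic_init(&w.lock_acquired, false);

	assert(!toy_waiter_poll(&w, &seen));	/* not handed off yet */
	toy_handoff(&w, 42);
	assert(toy_waiter_poll(&w, &seen));	/* handoff observed */
	assert(seen == 42);
}
```

Without the release/acquire pairing (or an equivalent full barrier), the
waiter could observe lock_acquired and still read stale protected state.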
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: linux-bcachefs@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
When running in userspace, we currently don't have a real percpu
implementation available - at least in bcachefs-tools, which is where
this code is currently used in userspace.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This switches to a newer cmpxchg variant which updates @old for us on
failure, simplifying the cmpxchg loops a bit and supposedly generating
better code.
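
For illustration, the same loop shape in user-space C11, where
atomic_compare_exchange_weak() likewise writes the current value back
into the expected argument on failure. The bit names and the function
are hypothetical; this is a sketch of the pattern, not the six lock
code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_HELD_INTENT	(1u << 0)	/* hypothetical state bit */

/* Try to set TOY_HELD_INTENT; fail if it is already set. */
static bool toy_trylock_intent(atomic_uint *state)
{
	unsigned old = atomic_load_explicit(state, memory_order_relaxed);

	do {
		if (old & TOY_HELD_INTENT)
			return false;
		/* On failure, old is refreshed for us: no explicit re-read needed. */
	} while (!atomic_compare_exchange_weak_explicit(state, &old,
				old | TOY_HELD_INTENT,
				memory_order_acquire,
				memory_order_relaxed));
	return true;
}

static void try_cmpxchg_demo(void)
{
	atomic_uint state;

	atomic_init(&state, 0);
	assert(toy_trylock_intent(&state));
	assert(atomic_load(&state) == TOY_HELD_INTENT);
	assert(!toy_trylock_intent(&state));	/* already held */
}
```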
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
In the conversion to atomic_t, six_lock_slowpath() ended up calling
six_lock_wakeup() in the failure path with a state variable that was
never initialized - whoops.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Since we're not generating different versions of the lock functions for
each lock type, the constant propagation we were trying to do before is
no longer useful - this is now a small code size decrease.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This deletes the crazy cast-atomic-to-unsigned-long, and replaces them
with atomic_and() and atomic_or().
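
A user-space analogue of the replacement, using C11 atomic_fetch_or()
and atomic_fetch_and() in place of the kernel's atomic_or() and
atomic_and(). The bit names and helpers are hypothetical, chosen only to
show the pattern.

```c
#include <assert.h>
#include <stdatomic.h>

#define TOY_WAITING_READ	(1u << 0)	/* hypothetical state bits */
#define TOY_WAITING_WRITE	(1u << 1)

/* Set bits in the state word atomically, with no casts. */
static void toy_set_bits(atomic_uint *state, unsigned mask)
{
	atomic_fetch_or_explicit(state, mask, memory_order_relaxed);
}

/* Clear bits in the state word atomically. */
static void toy_clear_bits(atomic_uint *state, unsigned mask)
{
	atomic_fetch_and_explicit(state, ~mask, memory_order_relaxed);
}

static void bitops_demo(void)
{
	atomic_uint state;

	atomic_init(&state, 0);
	toy_set_bits(&state, TOY_WAITING_READ | TOY_WAITING_WRITE);
	assert(atomic_load(&state) == (TOY_WAITING_READ | TOY_WAITING_WRITE));

	toy_clear_bits(&state, TOY_WAITING_READ);
	assert(atomic_load(&state) == TOY_WAITING_WRITE);
}
```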
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
lock->state.seq is shortly being moved out of lock->state, to kill the
dependency on atomic64; in preparation for that, we change the write
locking bit to write locked.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The next patch is going to move lock->seq out of lock->state. This
replaces six_relock() with a much simpler implementation based on
trylock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
- Expanded and revamped overview documentation in six.h, giving an
  overview of all features
- docbook-comments for all external interfaces
- Rename some functions for simplicity, e.g.
  six_lock_ip_type() -> six_lock_ip()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
As suggested by Linus, this drops the six_lock_state union in favor of
raw bitmasks.
On the one hand, bitfields give more type-level structure to the code.
However, a significant amount of the code was working with
six_lock_state as a u64/atomic64_t, and the conversions from the
bitfields to the u64 were deemed a bit too out-there.
More significantly, because bitfield order is poorly defined (#ifdef
__LITTLE_ENDIAN_BITFIELD can be used, but is gross), incrementing the
sequence number would overflow into the rest of the bitfield if the
compiler didn't put the sequence number at the high end of the word.
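
The overflow problem can be shown with explicit masks. The layout below
(flag bits in the low 32 bits, sequence number in the high 32) is an
assumption made for the sketch, as are all of the names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: flags in the low 32 bits, sequence number on top. */
#define TOY_SEQ_SHIFT	32
#define TOY_SEQ_ONE	(UINT64_C(1) << TOY_SEQ_SHIFT)
#define TOY_FLAGS_MASK	(TOY_SEQ_ONE - 1)

static uint64_t toy_seq_inc(uint64_t state)
{
	return state + TOY_SEQ_ONE;
}

static uint32_t toy_seq(uint64_t state)
{
	return (uint32_t)(state >> TOY_SEQ_SHIFT);
}

static uint32_t toy_flags(uint64_t state)
{
	return (uint32_t)(state & TOY_FLAGS_MASK);
}

static void seq_layout_demo(void)
{
	/* Sequence number at the high end: wraparound carries out of the word. */
	uint64_t good = (UINT64_C(0xffffffff) << TOY_SEQ_SHIFT) | 0x5;

	good = toy_seq_inc(good);
	assert(toy_seq(good) == 0);
	assert(toy_flags(good) == 0x5);

	/* Sequence number at the low end: the carry corrupts the bits above it. */
	uint64_t bad = (UINT64_C(0x5) << 32) | UINT64_C(0xffffffff);

	bad += 1;
	assert((bad >> 32) == 0x6);	/* was 0x5 before the increment */
}
```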
The new code is a bit saner when we're on an architecture without real
atomic64_t support - all accesses to lock->state now go through
atomic64_*() operations.
On architectures with real atomic64_t support, we additionally use
atomic bit ops for setting/clearing individual bits.
Text size: 7467 bytes -> 4649 bytes - compilers still suck at
bitfields.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Originally, we used inlining/flattening to cause the compiler to
generate different versions of lock/trylock/relock/unlock for each lock
type - read, intent, and write. This made the individual functions
smaller and let the compiler eliminate table lookups: however, as the
code has gotten more complicated these optimizations have gotten less
worthwhile, and all the tricky inlining and dispatching made the code
less readable.
Text size: 11015 bytes -> 7467 bytes, and benchmarks show no loss of
performance.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Originally, the waiting bit was always set by trylock() on failure;
however, it's now set by __six_lock_type_slowpath() with wait_lock held,
which is the more correct place to do it.
That made setting the waiting bit in trylock redundant, so this patch
deletes that.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The lost wakeup bug hasn't been observed in a while, and we're trying to
provoke it and determine if it still exists.
This patch removes some defenses that were added to attempt to track it
down; if it still exists, this should make it easier to see it.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
six_lock_pcpu_alloc() is an unsafe interface: it's not safe to allocate
or free the percpu reader count on an existing lock that's in use. The
only safe time to allocate percpu readers is when the lock is first
being initialized.
This patch adds a flags parameter to six_lock_init(), and instead of
six_lock_pcpu_free() we now expose six_lock_exit(), which does the same
thing but is less likely to be misused.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This moves a helper out of the bcachefs code that shouldn't have been
there, since it touches six lock internals.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This is a workaround for a lost wakeup bug we've been seeing - we still
need to discover the actual bug.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This adds a threshold for the maximum spin time, similar to the rwsem
code, and a flag to the lock itself indicating when we've spun too long
so other threads also refrain from spinning.
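
A rough user-space sketch of the idea: spin until a time budget is
exceeded, then set a flag on the lock so that later spinners bail out
immediately. The threshold value, the names, and the injected clock are
all assumptions made so the sketch is testable; they are not taken from
the six lock code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_SPIN_MAX_NS	10000u	/* hypothetical spin budget */

struct toy_spin_lock {
	atomic_bool nospin;	/* someone spun too long; don't bother spinning */
};

typedef uint64_t (*toy_clock_fn)(void);

/* Returns true if the lock was handed to us while spinning. */
static bool toy_optimistic_spin(struct toy_spin_lock *l,
				atomic_bool *acquired, toy_clock_fn now_ns)
{
	uint64_t start;

	if (atomic_load_explicit(&l->nospin, memory_order_relaxed))
		return false;

	start = now_ns();
	while (!atomic_load_explicit(acquired, memory_order_acquire)) {
		if (now_ns() - start > TOY_SPIN_MAX_NS) {
			/* Spun too long: tell other threads not to spin either. */
			atomic_store_explicit(&l->nospin, true,
					      memory_order_relaxed);
			return false;
		}
	}
	return true;
}

/* Fake clock for the demo: advances 1000ns on every call. */
static uint64_t toy_fake_now;
static uint64_t toy_fake_clock(void)
{
	return toy_fake_now += 1000;
}

static void spin_threshold_demo(void)
{
	struct toy_spin_lock l;
	atomic_bool acquired;

	atomic_init(&l.nospin, false);
	atomic_init(&acquired, true);
	assert(toy_optimistic_spin(&l, &acquired, toy_fake_clock));

	/* Never acquired: the spin times out and flags the lock. */
	atomic_store(&acquired, false);
	assert(!toy_optimistic_spin(&l, &acquired, toy_fake_clock));
	assert(atomic_load(&l.nospin));

	/* Flag set: later callers refuse to spin at all. */
	atomic_store(&acquired, true);
	assert(!toy_optimistic_spin(&l, &acquired, toy_fake_clock));
}
```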
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This adds _ip variations of the various lock functions that allow an IP
to be passed in, which is used by lockstat.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This brings back an important optimization, to avoid touching the wait
lists an extra time, while preserving the property that a thread is on a
lock waitlist iff it is waiting - it is never removed from the waitlist
until it has the lock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
There was a lost wakeup between a read unlock in percpu mode and a write
lock. The unlock path unlocks, then executes a barrier, then checks for
waiters; correspondingly, the lock side should set the wait bit and
execute a barrier, then attempt to take the lock.
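
The pairing can be sketched in user-space C11 as follows. The struct,
its fields, and the wakeups counter are hypothetical stand-ins, a single
atomic counter replaces the percpu reader count, and seq_cst fences
stand in for the kernel's full barriers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_rw {
	atomic_int  readers;		/* stands in for the percpu read count */
	atomic_bool waiting_write;	/* wait bit for a blocked writer */
	int         wakeups;		/* stand-in for issuing a wakeup */
};

/* Unlock side: unlock, barrier, then check for waiters. */
static void toy_read_unlock(struct toy_rw *l)
{
	atomic_fetch_sub_explicit(&l->readers, 1, memory_order_release);
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load_explicit(&l->waiting_write, memory_order_relaxed))
		l->wakeups++;
}

/* Lock side: set the wait bit, barrier, then attempt to take the lock. */
static bool toy_write_lock_attempt(struct toy_rw *l)
{
	atomic_store_explicit(&l->waiting_write, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load_explicit(&l->readers, memory_order_relaxed) == 0) {
		atomic_store_explicit(&l->waiting_write, false,
				      memory_order_relaxed);
		return true;
	}
	/* Would sleep here; the wait bit is already visible to unlockers. */
	return false;
}

static void unlock_wakeup_demo(void)
{
	struct toy_rw l = { .wakeups = 0 };

	atomic_init(&l.readers, 1);
	atomic_init(&l.waiting_write, false);

	assert(!toy_write_lock_attempt(&l));	/* a reader still holds it */
	toy_read_unlock(&l);
	assert(l.wakeups == 1);			/* unlocker saw the wait bit */
	assert(toy_write_lock_attempt(&l));	/* retry after wakeup succeeds */
}
```

With both fences in place, at least one side must observe the other:
either the writer sees the reader count at zero, or the unlocking reader
sees the wait bit and issues the wakeup.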
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This is needed by the cycle detector in bcachefs - we need a way to
iterate over waitlist entries while dropping and retaking the waitlist
lock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This switches to a single list of waiters, instead of separate lists for
read and intent, and switches write locks to also use the wait lists
instead of being handled differently.
Also, removal from the wait list is now done by the process waiting on
the lock, not the process doing the wakeup. This is needed for the new
deadlock cycle detector - we need tasks to stay on the waitlist until
they've successfully acquired the lock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
six_lock_count now counts whether a write lock is held, and this patch
also makes it correctly count six_lock->intent_lock_recurse.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Initially forked from drivers/md/bcache, bcachefs is a new copy-on-write
filesystem with every feature you could possibly want.
Website: https://bcachefs.org
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>