Commit Graph

10 Commits

Author SHA1 Message Date
Ingo Molnar
0881e7bd34 sched/headers: Prepare to move the get_task_struct()/put_task_struct() and related APIs from <linux/sched.h> to <linux/sched/task.h>
But first update usage sites with the new header dependency.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:40 +01:00
Ingo Molnar
b17b01533b sched/headers: Prepare for new header dependencies before moving code to <linux/sched/debug.h>
We are going to split <linux/sched/debug.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/debug.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:34 +01:00
Davidlohr Bueso
642fa448ae sched/core: Remove set_task_state()
This is a nasty interface and setting the state of a foreign task must
not be done. As of the following commit:

  be628be095 ("bcache: Make gc wakeup sane, remove set_task_state()")

... everyone in the kernel calls set_task_state() with current, allowing
the helper to be removed.

However, as the comment indicates, it is still around for those archs
where computing current is more expensive than using a pointer, at least
in theory. An important arch that is affected is arm64, however this has
been addressed now [1] and performance is up to par making no difference
with either calls.

Of all the callers, if any, it's the locking bits that would care most
about this -- ie: we end up passing a tsk pointer to a lot of the lock
slowpath, and setting ->state on that. The following numbers are based
on two tests: a custom ad-hoc microbenchmark that just measures
latencies (for ~65 million calls) between get_task_state() vs
get_current_state().

Secondly for a higher overview, an unlink microbenchmark was used,
which pounds on a single file with open, close,unlink combos with
increasing thread counts (up to 4x ncpus). While the workload is quite
unrealistic, it does contend a lot on the inode mutex or now rwsem.

[1] https://lkml.kernel.org/r/1483468021-8237-1-git-send-email-mark.rutland@arm.com

== 1. x86-64 ==

Avg runtime set_task_state():    601 msecs
Avg runtime set_current_state(): 552 msecs

                                            vanilla                 dirty
Hmean    unlink1-processes-2      36089.26 (  0.00%)    38977.33 (  8.00%)
Hmean    unlink1-processes-5      28555.01 (  0.00%)    29832.55 (  4.28%)
Hmean    unlink1-processes-8      37323.75 (  0.00%)    44974.57 ( 20.50%)
Hmean    unlink1-processes-12     43571.88 (  0.00%)    44283.01 (  1.63%)
Hmean    unlink1-processes-21     34431.52 (  0.00%)    38284.45 ( 11.19%)
Hmean    unlink1-processes-30     34813.26 (  0.00%)    37975.17 (  9.08%)
Hmean    unlink1-processes-48     37048.90 (  0.00%)    39862.78 (  7.59%)
Hmean    unlink1-processes-79     35630.01 (  0.00%)    36855.30 (  3.44%)
Hmean    unlink1-processes-110    36115.85 (  0.00%)    39843.91 ( 10.32%)
Hmean    unlink1-processes-141    32546.96 (  0.00%)    35418.52 (  8.82%)
Hmean    unlink1-processes-172    34674.79 (  0.00%)    36899.21 (  6.42%)
Hmean    unlink1-processes-203    37303.11 (  0.00%)    36393.04 ( -2.44%)
Hmean    unlink1-processes-224    35712.13 (  0.00%)    36685.96 (  2.73%)

== 2. ppc64le ==

Avg runtime set_task_state():  938 msecs
Avg runtime set_current_state: 940 msecs

                                            vanilla                 dirty
Hmean    unlink1-processes-2      19269.19 (  0.00%)    30704.50 ( 59.35%)
Hmean    unlink1-processes-5      20106.15 (  0.00%)    21804.15 (  8.45%)
Hmean    unlink1-processes-8      17496.97 (  0.00%)    17243.28 ( -1.45%)
Hmean    unlink1-processes-12     14224.15 (  0.00%)    17240.21 ( 21.20%)
Hmean    unlink1-processes-21     14155.66 (  0.00%)    15681.23 ( 10.78%)
Hmean    unlink1-processes-30     14450.70 (  0.00%)    15995.83 ( 10.69%)
Hmean    unlink1-processes-48     16945.57 (  0.00%)    16370.42 ( -3.39%)
Hmean    unlink1-processes-79     15788.39 (  0.00%)    14639.27 ( -7.28%)
Hmean    unlink1-processes-110    14268.48 (  0.00%)    14377.40 (  0.76%)
Hmean    unlink1-processes-141    14023.65 (  0.00%)    16271.69 ( 16.03%)
Hmean    unlink1-processes-172    13417.62 (  0.00%)    16067.55 ( 19.75%)
Hmean    unlink1-processes-203    15293.08 (  0.00%)    15440.40 (  0.96%)
Hmean    unlink1-processes-234    13719.32 (  0.00%)    16190.74 ( 18.01%)
Hmean    unlink1-processes-265    16400.97 (  0.00%)    16115.22 ( -1.74%)
Hmean    unlink1-processes-296    14388.60 (  0.00%)    16216.13 ( 12.70%)
Hmean    unlink1-processes-320    15771.85 (  0.00%)    15905.96 (  0.85%)

x86-64 (known to be fast for get_current()/this_cpu_read_stable() caching)
and ppc64 (with paca) show similar improvements in the unlink microbenches.
The small delta for ppc64 (2ms), does not represent the gains on the unlink
runs. In the case of x86, there was a decent amount of variation in the
latency runs, but always within a 20 to 50ms increase), ppc was more constant.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave@stgolabs.net
Cc: mark.rutland@arm.com
Link: http://lkml.kernel.org/r/1483479794-14013-5-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14 11:14:16 +01:00
Davidlohr Bueso
5376f2e722 drivers/tty: Compute 'current' directly
This patch effectively replaces the tsk pointer dereference
(which is obviously == current), to directly use get_current()
macro. This is to make the removal of setting foreign task
states smoother and painfully obvious. Performance win on some
archs such as x86-64 and ppc64 -- arm64 is no longer an issue.

Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: mark.rutland@arm.com
Link: http://lkml.kernel.org/r/1483479794-14013-3-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14 11:14:13 +01:00
Denys Vlasenko
5ef6504e9d tty: Deinline __ldsem_down_write_nested, save 128 bytes
This function compiles to 491 bytes of machine code.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Jiri Slaby <jslaby@suse.com>
CC: linux-serial@vger.kernel.org
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-13 19:59:48 -08:00
Denys Vlasenko
fc0285f210 tty: Deinline __ldsem_down_read_nested, save 128 bytes
This function compiles to 479 bytes of machine code.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Jiri Slaby <jslaby@suse.com>
CC: linux-serial@vger.kernel.org
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-12-13 19:59:48 -08:00
Greg Kroah-Hartman
f9ce5ccfd9 tty: tty_ldsem.c: move assignment out of if () block
We should not be doing assignments within an if () block
so fix up the code to not do this.

change was created using Coccinelle.

CC: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-10 19:04:18 +02:00
Oleg Nesterov
fb9edbe984 lockdep: Make held_lock->check and "int check" argument bool
The "int check" argument of lock_acquire() and held_lock->check are
misleading. This is actually a boolean: 2 means "true", everything
else is "false".

And there is no need to pass 1 or 0 to lock_acquire() depending on
CONFIG_PROVE_LOCKING, __lock_acquire() checks prove_locking at the
start and clears "check" if !CONFIG_PROVE_LOCKING.

Note: probably we can simply kill this member/arg. The only explicit
user of check => 0 is rcu_lock_acquire(), perhaps we can change it to
use lock_acquire(trylock =>, read => 2). __lockdep_no_validate means
check => 0 implicitly, but we can change validate_chain() to check
hlock->instance->key instead. Not to mention it would be nice to get
rid of lockdep_set_novalidate_class().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140120182006.GA26495@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 21:18:54 +01:00
Peter Hurley
cf872776fc tty: Fix hang at ldsem_down_read()
When a controlling tty is being hung up and the hang up is
waiting for a just-signalled tty reader or writer to exit, and a new tty
reader/writer tries to acquire an ldisc reference concurrently with the
ldisc reference release from the signalled reader/writer, the hangup
can hang. The new reader/writer is sleeping in ldsem_down_read() and the
hangup is sleeping in ldsem_down_write() [1].

The new reader/writer fails to wakeup the waiting hangup because the
wrong lock count value is checked (the old lock count rather than the new
lock count) to see if the lock is unowned.

Change helper function to return the new lock count if the cmpxchg was
successful; document this behavior.

[1] edited dmesg log from reporter

SysRq : Show Blocked State
  task                        PC stack   pid father
systemd         D ffff88040c4f0000     0     1      0 0x00000000
 ffff88040c49fbe0 0000000000000046 ffff88040c4a0000 ffff88040c49ffd8
 00000000001d3980 00000000001d3980 ffff88040c4a0000 ffff88040593d840
 ffff88040c49fb40 ffffffff810a4cc0 0000000000000006 0000000000000023
Call Trace:
 [<ffffffff810a4cc0>] ? sched_clock_cpu+0x9f/0xe4
 [<ffffffff810a4cc0>] ? sched_clock_cpu+0x9f/0xe4
 [<ffffffff810a4cc0>] ? sched_clock_cpu+0x9f/0xe4
 [<ffffffff810a4cc0>] ? sched_clock_cpu+0x9f/0xe4
 [<ffffffff817a6649>] schedule+0x24/0x5e
 [<ffffffff817a588b>] schedule_timeout+0x15b/0x1ec
 [<ffffffff810a4cc0>] ? sched_clock_cpu+0x9f/0xe4
 [<ffffffff817aa691>] ? _raw_spin_unlock_irq+0x24/0x26
 [<ffffffff817aa10c>] down_read_failed+0xe3/0x1b9
 [<ffffffff817aa26d>] ldsem_down_read+0x8b/0xa5
 [<ffffffff8142b5ca>] ? tty_ldisc_ref_wait+0x1b/0x44
 [<ffffffff8142b5ca>] tty_ldisc_ref_wait+0x1b/0x44
 [<ffffffff81423f5b>] tty_write+0x7d/0x28a
 [<ffffffff814241f5>] redirected_tty_write+0x8d/0x98
 [<ffffffff81424168>] ? tty_write+0x28a/0x28a
 [<ffffffff8115d03f>] do_loop_readv_writev+0x56/0x79
 [<ffffffff8115e604>] do_readv_writev+0x1b0/0x1ff
 [<ffffffff8116ea0b>] ? do_vfs_ioctl+0x32a/0x489
 [<ffffffff81167d9d>] ? final_putname+0x1d/0x3a
 [<ffffffff8115e6c7>] vfs_writev+0x2e/0x49
 [<ffffffff8115e7d3>] SyS_writev+0x47/0xaa
 [<ffffffff817ab822>] system_call_fastpath+0x16/0x1b
bash            D ffffffff81c104c0     0  5469   5302 0x00000082
 ffff8800cf817ac0 0000000000000046 ffff8804086b22a0 ffff8800cf817fd8
 00000000001d3980 00000000001d3980 ffff8804086b22a0 ffff8800cf817a48
 000000000000b9a0 ffff8800cf817a78 ffffffff81004675 ffff8800cf817a44
Call Trace:
 [<ffffffff81004675>] ? dump_trace+0x165/0x29c
 [<ffffffff810a4cc0>] ? sched_clock_cpu+0x9f/0xe4
 [<ffffffff8100edda>] ? save_stack_trace+0x26/0x41
 [<ffffffff817a6649>] schedule+0x24/0x5e
 [<ffffffff817a588b>] schedule_timeout+0x15b/0x1ec
 [<ffffffff810a4cc0>] ? sched_clock_cpu+0x9f/0xe4
 [<ffffffff817a9f03>] ? down_write_failed+0xa3/0x1c9
 [<ffffffff817aa691>] ? _raw_spin_unlock_irq+0x24/0x26
 [<ffffffff817a9f0b>] down_write_failed+0xab/0x1c9
 [<ffffffff817aa300>] ldsem_down_write+0x79/0xb1
 [<ffffffff817aada3>] ? tty_ldisc_lock_pair_timeout+0xa5/0xd9
 [<ffffffff817aada3>] tty_ldisc_lock_pair_timeout+0xa5/0xd9
 [<ffffffff8142bf33>] tty_ldisc_hangup+0xc4/0x218
 [<ffffffff81423ab3>] __tty_hangup+0x2e2/0x3ed
 [<ffffffff81424a76>] disassociate_ctty+0x63/0x226
 [<ffffffff81078aa7>] do_exit+0x79f/0xa11
 [<ffffffff81086bdb>] ? get_signal_to_deliver+0x206/0x62f
 [<ffffffff810b4bfb>] ? lock_release_holdtime.part.8+0xf/0x16e
 [<ffffffff81079b05>] do_group_exit+0x47/0xb5
 [<ffffffff81086c16>] get_signal_to_deliver+0x241/0x62f
 [<ffffffff810020a7>] do_signal+0x43/0x59d
 [<ffffffff810f2af7>] ? __audit_syscall_exit+0x21a/0x2a8
 [<ffffffff810b4bfb>] ? lock_release_holdtime.part.8+0xf/0x16e
 [<ffffffff81002655>] do_notify_resume+0x54/0x6c
 [<ffffffff817abaf8>] int_signal+0x12/0x17

Reported-by: Sami Farin <sami.farin@gmail.com>
Cc: <stable@vger.kernel.org> # 3.12.x
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-12-16 16:55:43 -08:00
Peter Hurley
4898e640ca tty: Add timed, writer-prioritized rw semaphore
The semantics of a rw semaphore are almost ideally suited
for tty line discipline lifetime management;  multiple active
threads obtain "references" (read locks) while performing i/o
to prevent the loss or change of the current line discipline
(write lock).

Unfortunately, the existing rw_semaphore is ill-suited in other
ways;
1) TIOCSETD ioctl (change line discipline) expects to return an
   error if the line discipline cannot be exclusively locked within
   5 secs. Lock wait timeouts are not supported by rwsem.
2) A tty hangup is expected to halt and scrap pending i/o, so
   exclusive locking must be prioritized.
   Writer priority is not supported by rwsem.

Add ld_semaphore which implements these requirements in a
semantically similar way to rw_semaphore.

Writer priority is handled by separate wait lists for readers and
writers. Pending write waits are priortized before existing read
waits and prevent further read locks.

Wait timeouts are trivially added, but obviously change the lock
semantics as lock attempts can fail (but only due to timeout).

This implementation incorporates the write-lock stealing work of
Michel Lespinasse <walken@google.com>.

Cc: Michel Lespinasse <walken@google.com>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-20 12:30:32 -07:00