Pull timekeeping updates from John Stultz:
- Make the timekeeping update more precise when NTP frequency is set
directly by updating the multiplier.
- Adjust selftests
- Join split message for easier grepping,
- Use pr_*() instead of printk*(),
- Use %u to format unsigned cpu numbers.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20180712144118.8819-1-geert+renesas@glider.be
Commit b5793b0d92 added support for building the nanosleep compat system
call on 32-bit architectures, but missed one change in nanosleep_copyout(),
which would trigger a BUG() as soon as any architecture is switched over to
use it.
Use the proper config symbol to enable the code path.
Fixes: Commit b5793b0d92 ("posix-timers: Make compat syscalls depend on CONFIG_COMPAT_32BIT_TIME")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: y2038@lists.linaro.org
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Link: https://lkml.kernel.org/r/20180618140811.2998503-1-arnd@arndb.de
Revert commits
92af4dcb4e ("tracing: Unify the "boot" and "mono" tracing clocks")
127bfa5f43 ("hrtimer: Unify MONOTONIC and BOOTTIME clock behavior")
7250a4047a ("posix-timers: Unify MONOTONIC and BOOTTIME clock behavior")
d6c7270e91 ("timekeeping: Remove boot time specific code")
f2d6fdbfd2 ("Input: Evdev - unify MONOTONIC and BOOTTIME clock behavior")
d6ed449afd ("timekeeping: Make the MONOTONIC clock behave like the BOOTTIME clock")
72199320d4 ("timekeeping: Add the new CLOCK_MONOTONIC_ACTIVE clock")
As stated in the pull request for the unification of CLOCK_MONOTONIC and
CLOCK_BOOTTIME, it was clear that we might have to revert the change.
As reported by several folks systemd and other applications rely on the
documented behaviour of CLOCK_MONOTONIC on Linux and break with the above
changes. After resume daemons time out and other timeout related issues are
observed. Rafael compiled this list:
* systemd kills daemons on resume, after >WatchdogSec seconds
of suspending (Genki Sky). [Verified that that's because systemd uses
CLOCK_MONOTONIC and expects it to not include the suspend time.]
* systemd-journald misbehaves after resume:
systemd-journald[7266]: File /var/log/journal/016627c3c4784cd4812d4b7e96a34226/system.journal
corrupted or uncleanly shut down, renaming and replacing.
(Mike Galbraith).
* NetworkManager reports "networking disabled" and networking is broken
after resume 50% of the time (Pavel). [May be because of systemd.]
* MATE desktop dims the display and starts the screensaver right after
system resume (Pavel).
* Full system hang during resume (me). [May be due to systemd or NM or both.]
That happens on debian and open suse systems.
It's sad, that these problems were neither catched in -next nor by those
folks who expressed interest in this change.
Reported-by: Rafael J. Wysocki <rjw@rjwysocki.net>
Reported-by: Genki Sky <sky@genki.is>,
Reported-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kevin Easton <kevin@guarana.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Salyzyn <salyzyn@android.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Change over clock_nanosleep syscalls to use y2038 safe
__kernel_timespec times. This will enable changing over
of these syscalls to use new y2038 safe syscalls when
the architectures define the CONFIG_64BIT_TIME.
Note that nanosleep syscall is deprecated and does not have a
plan for making it y2038 safe. But, the syscall should work as
before on 64 bit machines and on 32 bit machines, the syscall
works correctly until y2038 as before using the existing compat
syscall version. There is no new syscall for supporting 64 bit
time_t on 32 bit architectures.
Cc: linux-api@vger.kernel.org
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
clock_gettime, clock_settime, clock_getres and clock_nanosleep
compat syscalls are also repurposed to provide backward compatibility
to support 32 bit time_t on 32 bit systems.
Note that nanosleep compat syscall will also be treated the same way
as the above syscalls as it shares common handler functions with
clock_nanosleep. But, there is no plan to provide y2038 safe solution
for nanosleep.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
* pm-cpuidle:
tick-sched: avoid a maybe-uninitialized warning
cpuidle: Add definition of residency to sysfs documentation
time: hrtimer: Use timerqueue_iterate_next() to get to the next timer
nohz: Avoid duplication of code related to got_idle_tick
nohz: Gather tick_sched booleans under a common flag field
cpuidle: menu: Avoid selecting shallow states with stopped tick
cpuidle: menu: Refine idle state selection for running tick
sched: idle: Select idle state before stopping the tick
time: hrtimer: Introduce hrtimer_next_event_without()
time: tick-sched: Split tick_nohz_stop_sched_tick()
cpuidle: Return nohz hint from cpuidle_select()
jiffies: Introduce USER_TICK_USEC and redefine TICK_USEC
sched: idle: Do not stop the tick before cpuidle_idle_call()
sched: idle: Do not stop the tick upfront in the idle loop
time: tick-sched: Reorganize idle tick management code
* pm-qos:
PM / QoS: mark expected switch fall-throughs
Use timerqueue_iterate_next() to get to the next timer in
__hrtimer_next_event_base() without browsing the timerqueue
details diredctly.
No intentional changes in functionality.
Suggested-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The next set of changes will need to compute the time to the next
hrtimer event over all hrtimers except for the scheduler tick one.
To that end introduce a new helper function,
hrtimer_next_event_without(), for computing the time until the next
hrtimer event over all timers except for one and modify the underlying
code in __hrtimer_next_event_base() to prepare it for being called by
that new function.
No intentional changes in functionality.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Now that th MONOTONIC and BOOTTIME clocks are indentical remove all the special
casing.
The user space visible interfaces still support both clocks, but their behavior
is identical.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kevin Easton <kevin@guarana.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Salyzyn <salyzyn@android.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20180301165150.410218515@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
hrtimer does not seem to use any of kallsyms functions/defines.
Link: http://lkml.kernel.org/r/20171208025616.16267-9-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The hrtimer interrupt code contains a hang detection and mitigation
mechanism, which prevents that a long delayed hrtimer interrupt causes a
continous retriggering of interrupts which prevent the system from making
progress. If a hang is detected then the timer hardware is programmed with
a certain delay into the future and a flag is set in the hrtimer cpu base
which prevents newly enqueued timers from reprogramming the timer hardware
prior to the chosen delay. The subsequent hrtimer interrupt after the delay
clears the flag and resumes normal operation.
If such a hang happens in the last hrtimer interrupt before a CPU is
unplugged then the hang_detected flag is set and stays that way when the
CPU is plugged in again. At that point the timer hardware is not armed and
it cannot be armed because the hang_detected flag is still active, so
nothing clears that flag. As a consequence the CPU does not receive hrtimer
interrupts and no timers expire on that CPU which results in RCU stalls and
other malfunctions.
Clear the flag along with some other less critical members of the hrtimer
cpu base to ensure starting from a clean state when a CPU is plugged in.
Thanks to Paul, Sebastian and Anna-Maria for their help to get down to the
root cause of that hard to reproduce heisenbug. Once understood it's
trivial and certainly justifies a brown paperbag.
Fixes: 41d2e49493 ("hrtimer: Tune hrtimer_interrupt hang logic")
Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Sewior <bigeasy@linutronix.de>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801261447590.2067@nanos
All prerequisites to handle hrtimers for expiry in either hard or soft
interrupt context are in place.
Add the missing bit in hrtimer_init() which associates the timer to the
hard or the softirq clock base.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-30-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
hrtimer callbacks are always invoked in hard interrupt context. Several
users in tree require soft interrupt context for their callbacks and
achieve this by combining a hrtimer with a tasklet. The hrtimer schedules
the tasklet in hard interrupt context and the tasklet callback gets invoked
in softirq context later.
That's suboptimal and aside of that the real-time patch moves most of the
hrtimers into softirq context. So adding native support for hrtimers
expiring in softirq context is a valuable extension for both mainline and
the RT patch set.
Each valid hrtimer clock id has two associated hrtimer clock bases: one for
timers expiring in hardirq context and one for timers expiring in softirq
context.
Implement the functionality to associate a hrtimer with the hard or softirq
related clock bases and update the relevant functions to take them into
account when the next expiry time needs to be evaluated.
Add a check into the hard interrupt context handler functions to check
whether the first expiring softirq based timer has expired. If it's expired
the softirq is raised and the accounting of softirq based timers to
evaluate the next expiry time for programming the timer hardware is skipped
until the softirq processing has finished. At the end of the softirq
processing the regular processing is resumed.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-29-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The softirq based hrtimer can utilize most of the existing hrtimers
functions, but need to operate on a different data set.
Add an 'active_mask' parameter to various functions so the hard and soft bases
can be selected. Fixup the existing callers and hand in the ACTIVE_HARD
mask.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-28-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently hrtimer callback functions are always executed in hard interrupt
context. Users of hrtimers, which need their timer function to be executed
in soft interrupt context, make use of tasklets to get the proper context.
Add additional hrtimer clock bases for timers which must expire in softirq
context, so the detour via the tasklet can be avoided. This is also
required for RT, where the majority of hrtimer is moved into softirq
hrtimer context.
The selection of the expiry mode happens via a mode bit. Introduce
HRTIMER_MODE_SOFT and the matching combinations with the ABS/REL/PINNED
bits and update the decoding of hrtimer_mode in tracepoints.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-27-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
__run_hrtimer() is called with the hrtimer_cpu_base.lock held and
interrupts disabled. Before invoking the timer callback the base lock is
dropped, but interrupts stay disabled.
The upcoming support for softirq based hrtimers requires that interrupts
are enabled before the timer callback is invoked.
To avoid code duplication, take hrtimer_cpu_base.lock with
raw_spin_lock_irqsave(flags) at the call site and hand in the flags as
a parameter. So raw_spin_unlock_irqrestore() before the callback invocation
will either keep interrupts disabled in interrupt context or restore to
interrupt enabled state when called from softirq context.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-26-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Preparatory patch for softirq based hrtimers to avoid code duplication,
factor out the __hrtimer_start_range_ns() function from hrtimer_start_range_ns().
No functional change.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-24-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
hrtimer_reprogram() must have access to the hrtimer_clock_base of the new
first expiring timer to access hrtimer_clock_base.offset for adjusting the
expiry time to CLOCK_MONOTONIC. This is required to evaluate whether the
new left most timer in the hrtimer_clock_base is the first expiring timer
of all clock bases in a hrtimer_cpu_base.
The only user of hrtimer_reprogram() is hrtimer_start_range_ns(), which has
a pointer to hrtimer_clock_base() already and hands it in as a parameter. But
hrtimer_start_range_ns() will be split for the upcoming support for softirq
based hrtimers to avoid code duplication and will lose the direct access to
the clock base pointer.
Instead of handing in timer and timer->base as a parameter remove the base
parameter from hrtimer_reprogram() instead and retrieve the clock base internally.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-23-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The current decision whether a timer can be queued on a remote CPU checks
for timer->expiry <= remote_cpu_base.expires_next.
This is too restrictive because a timer with the same expiry time as an
existing timer will be enqueued on right-hand size of the existing timer
inside the rbtree, i.e. behind the first expiring timer.
So its safe to allow enqueuing timers with the same expiry time as the
first expiring timer on a remote CPU base.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-22-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
hrtimer_reprogram() is conditionally invoked from hrtimer_start_range_ns()
when hrtimer_cpu_base.hres_active is true.
In the !hres_active case there is a special condition for the nohz_active
case:
If the newly enqueued timer expires before the first expiring timer on a
remote CPU then the remote CPU needs to be notified and woken up from a
NOHZ idle sleep to take the new first expiring timer into account.
Previous changes have already established the prerequisites to make the
remote enqueue behaviour the same whether high resolution mode is active or
not:
If the to be enqueued timer expires before the first expiring timer on a
remote CPU, then it cannot be enqueued there.
This was done for the high resolution mode because there is no way to
access the remote CPU timer hardware. The same is true for NOHZ, but was
handled differently by unconditionally enqueuing the timer and waking up
the remote CPU so it can reprogram its timer. Again there is no compelling
reason for this difference.
hrtimer_check_target(), which makes the 'can remote enqueue' decision is
already unconditional, but not yet functional because nothing updates
hrtimer_cpu_base.expires_next in the !hres_active case.
To unify this the following changes are required:
1) Make the store of the new first expiry time unconditonal in
hrtimer_reprogram() and check __hrtimer_hres_active() before proceeding
to the actual hardware access. This check also lets the compiler
eliminate the rest of the function in case of CONFIG_HIGH_RES_TIMERS=n.
2) Invoke hrtimer_reprogram() unconditionally from
hrtimer_start_range_ns()
3) Remove the remote wakeup special case for the !high_res && nohz_active
case.
Confine the timers_nohz_active static key to timer.c which is the only user
now.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-21-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When the first hrtimer on the current CPU is removed,
hrtimer_force_reprogram() is invoked but only when
CONFIG_HIGH_RES_TIMERS=y and hrtimer_cpu_base.hres_active is set.
hrtimer_force_reprogram() updates hrtimer_cpu_base.expires_next and
reprograms the clock event device. When CONFIG_HIGH_RES_TIMERS=y and
hrtimer_cpu_base.hres_active is set, a pointless hrtimer interrupt can be
prevented.
hrtimer_check_target() makes the 'can remote enqueue' decision. As soon as
hrtimer_check_target() is unconditionally available and
hrtimer_cpu_base.expires_next is updated by hrtimer_reprogram(),
hrtimer_force_reprogram() needs to be available unconditionally as well to
prevent the following scenario with CONFIG_HIGH_RES_TIMERS=n:
- the first hrtimer on this CPU is removed and hrtimer_force_reprogram() is
not executed
- CPU goes idle (next timer is calculated and hrtimers are taken into
account)
- a hrtimer is enqueued remote on the idle CPU: hrtimer_check_target()
compares expiry value and hrtimer_cpu_base.expires_next. The expiry value
is after expires_next, so the hrtimer is enqueued. This timer will fire
late, if it expires before the effective first hrtimer on this CPU and
the comparison was with an outdated expires_next value.
To prevent this scenario, make hrtimer_force_reprogram() unconditional
except the effective reprogramming part, which gets eliminated by the
compiler in the CONFIG_HIGH_RES_TIMERS=n case.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-20-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
hrtimer_force_reprogram() needs to be available unconditionally for softirq
based hrtimers. Move the function and all required struct members out of
the CONFIG_HIGH_RES_TIMERS #ifdef.
There is no functional change because hrtimer_force_reprogram() is only
invoked when hrtimer_cpu_base.hres_active is true and
CONFIG_HIGH_RES_TIMERS=y.
Making it unconditional increases the text size for the
CONFIG_HIGH_RES_TIMERS=n case slightly, but avoids replication of that code
for the upcoming softirq based hrtimers support. Most of the code gets
eliminated in the CONFIG_HIGH_RES_TIMERS=n case by the compiler.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-19-anna-maria@linutronix.de
[ Made it build on !CONFIG_HIGH_RES_TIMERS ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
hrtimer_reprogram() needs to be available unconditionally for softirq based
hrtimers. Move the function and all required struct members out of the
CONFIG_HIGH_RES_TIMERS #ifdef.
There is no functional change because hrtimer_reprogram() is only invoked
when hrtimer_cpu_base.hres_active is true. Making it unconditional
increases the text size for the CONFIG_HIGH_RES_TIMERS=n case, but avoids
replication of that code for the upcoming softirq based hrtimers support.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-18-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
hrtimer_cpu_base.next_timer stores the pointer to the next expiring timer
in a CPU base.
This pointer cannot be dereferenced and is solely used to check whether a
hrtimer which is removed is the hrtimer which is the first to expire in the
CPU base. If this is the case, then the timer hardware needs to be
reprogrammed to avoid an extra interrupt for nothing.
Again, this is conditional functionality, but there is no compelling reason
to make this conditional. As a preparation, hrtimer_cpu_base.next_timer
needs to be available unconditonally.
Aside of that the upcoming support for softirq based hrtimers requires access
to this pointer unconditionally as well, so our motivation is not entirely
simplicity based.
Make the update of hrtimer_cpu_base.next_timer unconditional and remove the
#ifdef cruft. The impact on CONFIG_HIGH_RES_TIMERS=n && CONFIG_NOHZ=n is
marginal as it's just a store on an already dirtied cacheline.
No functional change.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-17-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
hrtimer_cpu_base.expires_next is used to cache the next event armed in the
timer hardware. The value is used to check whether an hrtimer can be
enqueued remotely. If the new hrtimer is expiring before expires_next, then
remote enqueue is not possible as the remote hrtimer hardware cannot be
accessed for reprogramming to an earlier expiry time.
The remote enqueue check is currently conditional on
CONFIG_HIGH_RES_TIMERS=y and hrtimer_cpu_base.hres_active. There is no
compelling reason to make this conditional.
Move hrtimer_cpu_base.expires_next out of the CONFIG_HIGH_RES_TIMERS=y
guarded area and remove the conditionals in hrtimer_check_target().
The check is currently a NOOP for the CONFIG_HIGH_RES_TIMERS=n and the
!hrtimer_cpu_base.hres_active case because in these cases nothing updates
hrtimer_cpu_base.expires_next yet. This will be changed with later patches
which further reduce the #ifdef zoo in this code.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-16-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
__hrtimer_hres_active() is now available unconditionally, so replace open
coded direct accesses to hrtimer_cpu_base.hres_active.
No functional change.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-15-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The hrtimer_cpu_base::hres_active_member field depends on CONFIG_HIGH_RES_TIMERS=y
currently, and all related functions to this member are conditional as well.
To simplify the code make it unconditional and set it to zero during initialization.
(This will also help with the upcoming softirq based hrtimers code.)
The conditional code sections can be avoided by adding IS_ENABLED(HIGHRES)
conditionals into common functions, which ensures dead code elimination.
There is no functional change.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-14-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The pointer to the currently running timer is stored in hrtimer_cpu_base
before the base lock is dropped and the callback is invoked.
This results in two levels of indirections and the upcoming support for
softirq based hrtimer requires splitting the "running" storage into soft
and hard IRQ context expiry.
Storing both in the cpu base would require conditionals in all code paths
accessing that information.
It's possible to have a per clock base sequence count and running pointer
without changing the semantics of the related mechanisms because the timer
base pointer cannot be changed while a timer is running the callback.
Unfortunately this makes cpu_clock base larger than 32 bytes on 32-bit
kernels. Instead of having huge gaps due to alignment, remove the alignment
and let the compiler pack CPU base for 32-bit kernels. The resulting cache access
patterns are fortunately not really different from the current
behaviour. On 64-bit kernels the 64-byte alignment stays and the behaviour is
unchanged. This was determined by analyzing the resulting layout and
looking at the number of cache lines involved for the frequently used
clocks.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-12-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Looping over all clock bases to find active bits is suboptimal if not all
bases are active.
Avoid this by converting it to a __ffs() evaluation. The functionallity is
outsourced into its own function and is called via a macro as suggested by
Peter Zijlstra.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-11-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The 'hrtimer_start' tracepoint lacks the mode information. The mode is
important because consecutive starts can switch from ABS to REL or from
PINNED to non PINNED.
Append the mode field.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-10-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The POSIX specification defines that relative CLOCK_REALTIME timers are not
affected by clock modifications. Those timers have to use CLOCK_MONOTONIC
to ensure POSIX compliance.
The introduction of the additional HRTIMER_MODE_PINNED mode broke this
requirement for pinned timers.
There is no user space visible impact because user space timers are not
using pinned mode, but for consistency reasons this needs to be fixed.
Check whether the mode has the HRTIMER_MODE_REL bit set instead of
comparing with HRTIMER_MODE_ABS.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Fixes: 597d027573 ("timers: Framework for identifying pinned timers")
Link: http://lkml.kernel.org/r/20171221104205.7269-7-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The hrtimer_start[_range_ns]() functions start a timer reliably on this CPU only
when HRTIMER_MODE_PINNED is set.
Furthermore the HRTIMER_MODE_PINNED mode is not considered when a hrtimer is initialized.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-6-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
schedule_hrtimeout_range_clock() uses an 'int clock' parameter for the
clock ID, instead of the customary predefined "clockid_t" type.
In hrtimer coding style the canonical variable name for the clock ID is
'clock_id', therefore change the name of the parameter here as well
to make it all consistent.
While at it, clean up the description for the 'clock_id' and 'mode'
function parameters. The clock modes and the clock IDs are not
restricted as the comment suggests.
Fix the mode description as well for the callers of schedule_hrtimeout_range_clock().
No functional changes intended.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-5-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The protection of a hrtimer which runs its callback against migration to a
different CPU has nothing to do with hard interrupt context.
The protection against migration of a hrtimer running the expiry callback
is the pointer in the cpu_base which holds a pointer to the currently
running timer. This pointer is evaluated in the code which potentially
switches the timer base and makes sure it's kept on the CPU on which the
callback is running.
Reported-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-3-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The hrtimer_cpu_base::migration_enable and ::nohz_active fields
were originally introduced to avoid accessing global variables
for these decisions.
Still that results in a (cache hot) load and conditional branch,
which can be avoided by using static keys.
Implement it with static keys and optimize for the most critical
case of high performance networking which tends to disable the
timer migration functionality.
No change in functionality.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1801142327490.2371@nanos
Link: https://lkml.kernel.org/r/20171221104205.7269-2-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Use lockdep to check that IRQs are enabled or disabled as expected. This
way the sanity check only shows overhead when concurrency correctness
debug code is enabled.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: David S . Miller <davem@davemloft.net>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1509980490-4285-6-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Usage of these apis and their compat versions makes
the syscalls: clock_nanosleep and nanosleep and
their compat implementations simpler.
This is a preparatory patch to isolate data conversions to
struct timespec64 at userspace boundaries. This helps contain
the changes needed to transition to new y2038 safe types.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
No nanosleep implementation modifies the rqtp argument. Mark is const.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Turn restart_block.nanosleep.{rmtp,compat_rmtp} into a tagged union (kind =
1 -> native, kind = 2 -> compat, kind = 0 -> nothing) and make the places
doing actual copyout handle compat as well as native (that will become a
helper in the next commit). Result: compat wrappers, messing with
reassignments, etc. are gone.
[ tglx: Folded in a variant of Peter Zijlstras enum patch ]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170607084241.28657-6-viro@ZenIV.linux.org.uk
The hrtimer nanosleep() implementation can be simplified by moving the copy
out of the remaining time to do_nanosleep() which is shared between the
real nanosleep function and the restart function.
The pointer to the timespec64 which is updated is already stored in the
restart block at the call site, so the seperate handling of nanosleep and
restart function can be avoided.
[ tglx: Added changelog ]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170607084241.28657-4-viro@ZenIV.linux.org.uk
Store the pointer to the timespec which gets updated with the remaining
time in the restart block and remove the function argument.
[ tglx: Added changelog ]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170607084241.28657-3-viro@ZenIV.linux.org.uk
struct timespec is not y2038 safe on 32 bit machines. Replace uses of
struct timespec with struct timespec64 in the kernel.
The syscall interfaces themselves will be changed in a separate series.
Note that the restart_block parameter for nanosleep has also been left
unchanged and will be part of syscall series noted above.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: y2038@lists.linaro.org
Cc: john.stultz@linaro.org
Cc: arnd@arndb.de
Link: http://lkml.kernel.org/r/1490555058-4603-8-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch fix spelling typos found in
Documentation/output/xml/driver-api/basics.xml.
It is because the xml file was generated from comments in source,
so I had to fix the comments.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
This function was removed in commit c6eb3f70d4 (hrtimer: Get rid of
hrtimer softirq, 2015-04-14) but the prototype wasn't ever deleted.
Delete it now.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Link: http://lkml.kernel.org/r/20170317010814.2591-1-sboyd@codeaurora.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We are going to split <linux/sched/debug.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.
Create a trivial placeholder <linux/sched/debug.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We are going to split <linux/sched/nohz.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.
Create a trivial placeholder <linux/sched/nohz.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fix up affected files that include this signal functionality via sched.h.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
commit 82e88ff1ea ("hrtimer: Revert CLOCK_MONOTONIC_RAW support") removed
unfortunately a sanity check in the hrtimer code which was part of that
MONOTONIC_RAW patch series.
It would have caught the bogus usage of CLOCK_MONOTONIC_RAW in the wireless
code. So bring it back.
It is way too easy to take any random clockid and feed it to the hrtimer
subsystem. At best, it gets mapped to a monotonic base, but it would be
better to just catch illegal values as early as possible.
Detect invalid clockids, map them to CLOCK_MONOTONIC and emit a warning.
[ tglx: Replaced the BUG by a WARN and gracefully map to CLOCK_MONOTONIC ]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Tomasz Nowicki <tn@semihalf.com>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Link: http://lkml.kernel.org/r/1452879670-16133-3-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Currently CONFIG_TIMER_STATS exposes process information across namespaces:
kernel/time/timer_list.c print_timer():
SEQ_printf(m, ", %s/%d", tmp, timer->start_pid);
/proc/timer_list:
#11: <0000000000000000>, hrtimer_wakeup, S:01, do_nanosleep, cron/2570
Given that the tracer can give the same information, this patch entirely
removes CONFIG_TIMER_STATS.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: John Stultz <john.stultz@linaro.org>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: linux-doc@vger.kernel.org
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Xing Gao <xgao01@email.wm.edu>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jessica Frazelle <me@jessfraz.com>
Cc: kernel-hardening@lists.openwall.com
Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Michal Marek <mmarek@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-api@vger.kernel.org
Cc: Arjan van de Ven <arjan@linux.intel.com>
Link: http://lkml.kernel.org/r/20170208192659.GA32582@beast
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
ktime_set(S,N) was required for the timespec storage type and is still
useful for situations where a Seconds and Nanoseconds part of a time value
needs to be converted. For anything where the Seconds argument is 0, this
is pointless and can be replaced with a simple assignment.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
ktime is a union because the initial implementation stored the time in
scalar nanoseconds on 64 bit machine and in a endianess optimized timespec
variant for 32bit machines. The Y2038 cleanup removed the timespec variant
and switched everything to scalar nanoseconds. The union remained, but
become completely pointless.
Get rid of the union and just keep ktime_t as simple typedef of type s64.
The conversion was done with coccinelle and some manual mopping up.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
This was entirely automated, using the script by Al:
PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)
to do the replacement at the end of the merge window.
Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The documentation for schedule_timeout(), schedule_hrtimeout(), and
schedule_hrtimeout_range() all claim that the routines couldn't possibly
return early if the task state was TASK_UNINTERRUPTIBLE. This is simply
not true since wake_up_process() will cause those routines to exit early.
We cannot make schedule_[hr]timeout() loop until the timeout expires if the
task state is uninterruptible because we have users which rely on the
existing and designed behaviour.
Make the documentation match the (correct) implementation.
schedule_hrtimeout() returns -EINTR even when a uninterruptible task was
woken up. This might look strange, but making the return code depend on the
state is too much of an effort as it would affect all the call sites. There
is no value in doing so, but we spell it out clearly in the documentation.
Suggested-by: Daniel Kurtz <djkurtz@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Cc: huangtao@rock-chips.com
Cc: heiko@sntech.de
Cc: broonie@kernel.org
Cc: briannorris@chromium.org
Cc: Andreas Mohr <andi@lisas.de>
Cc: linux-rockchip@lists.infradead.org
Cc: tony.xie@rock-chips.com
Cc: John Stultz <john.stultz@linaro.org>
Cc: linux@roeck-us.net
Cc: tskd08@gmail.com
Link: http://lkml.kernel.org/r/1477065531-30342-2-git-send-email-dianders@chromium.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Split out the clockevents callbacks instead of piggybacking them on
hrtimers.
This gets rid of a POST_DEAD user. See commit:
54e88fad22 ("sched: Make sure timers have migrated before killing the migration_thread")
We just move the callback state to the proper place in the state machine.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160713153337.485419196@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Only need CONFIG_NO_HZ_COMMON as this block is already in a
CONFIG_SMP block.
Signed-off-by: Pratyush Patel <pratyushpatel.1995@gmail.com>
Link: http://lkml.kernel.org/r/20160301172849.GA18152@cyborg
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
hrtimer_init_on_stack() needs a matching call to
destroy_hrtimer_on_stack(), so both need to be exported.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When activating a static object we need make sure that the object is
tracked in the object tracker. If it is a non-static object then the
activation is illegal.
In previous implementation, each subsystem need take care of this in
their fixup callbacks. Actually we can put it into debugobjects core.
Thus we can save duplicated code, and have *pure* fixup callbacks.
To achieve this, a new callback "is_static_object" is introduced to let
the type specific code decide whether a object is static or not. If
yes, we take it into object tracker, otherwise give warning and invoke
fixup callback.
This change has paassed debugobjects selftest, and I also do some test
with all debugobjects supports enabled.
At last, I have a concern about the fixups that can it change the object
which is in incorrect state on fixup? Because the 'addr' may not point
to any valid object if a non-static object is not tracked. Then Change
such object can overwrite someone's memory and cause unexpected
behaviour. For example, the timer_fixup_activate bind timer to function
stub_timer.
Link: http://lkml.kernel.org/r/1462576157-14539-1-git-send-email-changbin.du@intel.com
[changbin.du@intel.com: improve code comments where invoke the new is_static_object callback]
Link: http://lkml.kernel.org/r/1462777431-8171-1-git-send-email-changbin.du@intel.com
Signed-off-by: Du, Changbin <changbin.du@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Triplett <josh@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Update the return type to use bool instead of int, corresponding to
cheange (debugobjects: make fixup functions return bool instead of int).
Signed-off-by: Du, Changbin <changbin.du@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Triplett <josh@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This changes several users of manual "on"/"off" parsing to use
strtobool.
Some side-effects:
- these uses will now parse y/n/1/0 meaningfully too
- the early_param uses will now bubble up parse errors
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Amitkumar Karwar <akarwar@marvell.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Joe Perches <joe@perches.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Nishant Sarmukadam <nishants@marvell.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Steve French <sfrench@samba.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patchset introduces a /proc/<pid>/timerslack_ns interface which
would allow controlling processes to be able to set the timerslack value
on other processes in order to save power by avoiding wakeups (Something
Android currently does via out-of-tree patches).
The first patch tries to fix the internal timer_slack_ns usage which was
defined as a long, which limits the slack range to ~4 seconds on 32bit
systems. It converts it to a u64, which provides the same basically
unlimited slack (500 years) on both 32bit and 64bit machines.
The second patch introduces the /proc/<pid>/timerslack_ns interface
which allows the full 64bit slack range for a task to be read or set on
both 32bit and 64bit machines.
With these two patches, on a 32bit machine, after setting the slack on
bash to 10 seconds:
$ time sleep 1
real 0m10.747s
user 0m0.001s
sys 0m0.005s
The first patch is a little ugly, since I had to chase the slack delta
arguments through a number of functions converting them to u64s. Let me
know if it makes sense to break that up more or not.
Other than that things are fairly straightforward.
This patch (of 2):
The timer_slack_ns value in the task struct is currently a unsigned
long. This means that on 32bit applications, the maximum slack is just
over 4 seconds. However, on 64bit machines, its much much larger (~500
years).
This disparity could make application development a little (as well as
the default_slack) to a u64. This means both 32bit and 64bit systems
have the same effective internal slack range.
Now the existing ABI via PR_GET_TIMERSLACK and PR_SET_TIMERSLACK specify
the interface as a unsigned long, so we preserve that limitation on
32bit systems, where SET_TIMERSLACK can only set the slack to a unsigned
long value, and GET_TIMERSLACK will return ULONG_MAX if the slack is
actually larger then what can be stored by an unsigned long.
This patch also modifies hrtimer functions which specified the slack
delta as a unsigned long.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Oren Laadan <orenl@cellrox.com>
Cc: Ruchi Kandoi <kandoiruchi@google.com>
Cc: Rom Lemarchand <romlem@android.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Android Kernel Team <kernel-team@android.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If CONFIG_TIME_LOW_RES is enabled we add a jiffie to the relative timeout to
prevent short sleeps, but we do not account for that in interfaces which
retrieve the remaining time.
Helge observed that timerfd can return a remaining time larger than the
relative timeout. That's not expected and breaks userland test programs.
Store the information that the timer was armed relative and provide functions
to adjust the remaining time. To avoid bloating the hrtimer struct make state
a u8, which as a bonus results in better code on x86 at least.
Reported-and-tested-by: Helge Deller <deller@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: linux-m68k@lists.linux-m68k.org
Cc: dhowells@redhat.com
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20160114164159.273328486@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tianhong Ding <dingtianhong@huawei.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Xinwei Hu <huxinwei@huawei.com>
Cc: Xunlei Pang <pang.xunlei@linaro.org>
Cc: Zefan Li <lizefan@huawei.com>
Link: http://lkml.kernel.org/r/1440484973-13892-1-git-send-email-thunder.leizhen@huawei.com
[ Fixed yet another typo in one of the sentences fixed. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit 75e3b37d05 ("hrtimer: Drop return code of hrtimer_switch_to_hres()")
drops the return code of hrtimer_switch_to_hres(). While doing so, it also
drops the return statement itself on failure. This may cause a system hang.
Seen when running arm:multi_v7_defconfig in qemu with devicetree file
vexpress-v2p-ca9.
Fixes: 75e3b37d05 ("hrtimer: Drop return code of hrtimer_switch_to_hres()")
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Link: http://lkml.kernel.org/r/1440231047-16256-1-git-send-email-linux@roeck-us.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The variable called "this_base" is confusing because its name suggests
it's of "struct hrtimer_clock_base" type, along with "base" and "new_base"
which doesn't help understanding this complicated function.
Make its name clearer and fix the misleading comment while at it.
[ tglx: Fixed the comment for real ]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1439907509-9553-3-git-send-email-fweisbec@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull timer updates from Thomas Gleixner:
"A rather largish update for everything time and timer related:
- Cache footprint optimizations for both hrtimers and timer wheel
- Lower the NOHZ impact on systems which have NOHZ or timer migration
disabled at runtime.
- Optimize run time overhead of hrtimer interrupt by making the clock
offset updates smarter
- hrtimer cleanups and removal of restrictions to tackle some
problems in sched/perf
- Some more leap second tweaks
- Another round of changes addressing the 2038 problem
- First step to change the internals of clock event devices by
introducing the necessary infrastructure
- Allow constant folding for usecs/msecs_to_jiffies()
- The usual pile of clockevent/clocksource driver updates
The hrtimer changes contain updates to sched, perf and x86 as they
depend on them plus changes all over the tree to cleanup API changes
and redundant code, which got copied all over the place. The y2038
changes touch s390 to remove the last non 2038 safe code related to
boot/persistant clock"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits)
clocksource: Increase dependencies of timer-stm32 to limit build wreckage
timer: Minimize nohz off overhead
timer: Reduce timer migration overhead if disabled
timer: Stats: Simplify the flags handling
timer: Replace timer base by a cpu index
timer: Use hlist for the timer wheel hash buckets
timer: Remove FIFO "guarantee"
timers: Sanitize catchup_timer_jiffies() usage
hrtimer: Allow hrtimer::function() to free the timer
seqcount: Introduce raw_write_seqcount_barrier()
seqcount: Rename write_seqcount_barrier()
hrtimer: Fix hrtimer_is_queued() hole
hrtimer: Remove HRTIMER_STATE_MIGRATE
selftest: Timers: Avoid signal deadlock in leap-a-day
timekeeping: Copy the shadow-timekeeper over the real timekeeper last
clockevents: Check state instead of mode in suspend/resume path
selftests: timers: Add leap-second timer edge testing to leap-a-day.c
ntp: Do leapsecond adjustment in adjtimex read path
time: Prevent early expiry of hrtimers[CLOCK_REALTIME] at the leap second edge
ntp: Introduce and use SECS_PER_DAY macro instead of 86400
...
If nohz is disabled on the kernel command line the [hr]timer code
still calls wake_up_nohz_cpu() and tick_nohz_full_cpu(), a pretty
pointless exercise. Cache nohz_active in [hr]timer per cpu bases and
avoid the overhead.
Before:
48.10% hog [.] main
15.25% [kernel] [k] _raw_spin_lock_irqsave
9.76% [kernel] [k] _raw_spin_unlock_irqrestore
6.50% [kernel] [k] mod_timer
6.44% [kernel] [k] lock_timer_base.isra.38
3.87% [kernel] [k] detach_if_pending
3.80% [kernel] [k] del_timer
2.67% [kernel] [k] internal_add_timer
1.33% [kernel] [k] __internal_add_timer
0.73% [kernel] [k] timerfn
0.54% [kernel] [k] wake_up_nohz_cpu
After:
48.73% hog [.] main
15.36% [kernel] [k] _raw_spin_lock_irqsave
9.77% [kernel] [k] _raw_spin_unlock_irqrestore
6.61% [kernel] [k] lock_timer_base.isra.38
6.42% [kernel] [k] mod_timer
3.90% [kernel] [k] detach_if_pending
3.76% [kernel] [k] del_timer
2.41% [kernel] [k] internal_add_timer
1.39% [kernel] [k] __internal_add_timer
0.76% [kernel] [k] timerfn
We probably should have a cached value for nohz full in the per cpu
bases as well to avoid the cpumask check. The base cache line is hot
already, the cpumask not necessarily.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Joonwoo Park <joonwoop@codeaurora.org>
Cc: Wenbo Wang <wenbo.wang@memblaze.com>
Link: http://lkml.kernel.org/r/20150526224512.207378134@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Eric reported that the timer_migration sysctl is not really nice
performance wise as it needs to check at every timer insertion whether
the feature is enabled or not. Further the check does not live in the
timer code, so we have an extra function call which checks an extra
cache line to figure out that it is disabled.
We can do better and store that information in the per cpu (hr)timer
bases. I pondered to use a static key, but that's a nightmare to
update from the nohz code and the timer base cache line is hot anyway
when we select a timer base.
The old logic enabled the timer migration unconditionally if
CONFIG_NO_HZ was set even if nohz was disabled on the kernel command
line.
With this modification, we start off with migration disabled. The user
visible sysctl is still set to enabled. If the kernel switches to NOHZ
migration is enabled, if the user did not disable it via the sysctl
prior to the switch. If nohz=off is on the kernel command line,
migration stays disabled no matter what.
Before:
47.76% hog [.] main
14.84% [kernel] [k] _raw_spin_lock_irqsave
9.55% [kernel] [k] _raw_spin_unlock_irqrestore
6.71% [kernel] [k] mod_timer
6.24% [kernel] [k] lock_timer_base.isra.38
3.76% [kernel] [k] detach_if_pending
3.71% [kernel] [k] del_timer
2.50% [kernel] [k] internal_add_timer
1.51% [kernel] [k] get_nohz_timer_target
1.28% [kernel] [k] __internal_add_timer
0.78% [kernel] [k] timerfn
0.48% [kernel] [k] wake_up_nohz_cpu
After:
48.10% hog [.] main
15.25% [kernel] [k] _raw_spin_lock_irqsave
9.76% [kernel] [k] _raw_spin_unlock_irqrestore
6.50% [kernel] [k] mod_timer
6.44% [kernel] [k] lock_timer_base.isra.38
3.87% [kernel] [k] detach_if_pending
3.80% [kernel] [k] del_timer
2.67% [kernel] [k] internal_add_timer
1.33% [kernel] [k] __internal_add_timer
0.73% [kernel] [k] timerfn
0.54% [kernel] [k] wake_up_nohz_cpu
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Joonwoo Park <joonwoop@codeaurora.org>
Cc: Wenbo Wang <wenbo.wang@memblaze.com>
Link: http://lkml.kernel.org/r/20150526224512.127050787@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Currently an hrtimer callback function cannot free its own timer
because __run_hrtimer() still needs to clear HRTIMER_STATE_CALLBACK
after it. Freeing the timer would result in a clear use-after-free.
Solve this by using a scheme similar to regular timers; track the
current running timer in hrtimer_clock_base::running.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: ktkhai@parallels.com
Cc: rostedt@goodmis.org
Cc: juri.lelli@gmail.com
Cc: pang.xunlei@linaro.org
Cc: wanpeng.li@linux.intel.com
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: umgwanakikbuti@gmail.com
Link: http://lkml.kernel.org/r/20150611124743.471563047@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
To avoid getting spurious interrupts on a tickless CPU, clockevent
device can now be stopped by switching to ONESHOT_STOPPED state.
The natural place for handling this transition is tick_program_event().
On 'expires == KTIME_MAX', we skip programming the event and so we need
to fix such call sites as well, to always call tick_program_event()
irrespective of the expires value.
Once the clockevent device is required again, check if it was earlier
put into ONESHOT_STOPPED state. If yes, switch its state to ONESHOT
before programming its event.
To make sure we haven't missed any corner case, add a WARN() for the
case where we try to reprogram clockevent device while we aren't
configured in ONESHOT_STOPPED state.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linaro-kernel@lists.linaro.org
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/5146b07be7f0bc497e0ebae036590ec2fa73e540.1428031396.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
It was noted that the 32bit implementation of ktime_divns()
was doing unsigned division and didn't properly handle
negative values.
And when a ktime helper was changed to utilize
ktime_divns, it caused a regression on some IR blasters.
See the following bugzilla for details:
https://bugzilla.redhat.com/show_bug.cgi?id=1200353
This patch fixes the problem in ktime_divns by checking
and preserving the sign bit, and then reapplying it if
appropriate after the division, it also changes the return
type to a s64 to make it more obvious this is expected.
Nicolas also pointed out that negative dividers would
cause infinite loops on 32bit systems, negative dividers
is unlikely for users of this function, but out of caution
this patch adds checks for negative dividers for both
32-bit (BUG_ON) and 64-bit(WARN_ON) versions to make sure
no such use cases creep in.
[ tglx: Hand an u64 to do_div() to avoid the compiler warning ]
Fixes: 166afb6451 'ktime: Sanitize ktime_to_us/ms conversion'
Reported-and-tested-by: Trevor Cordes <trevor@tecnopolis.ca>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1431118043-23452-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Because we drop cpu_base->lock around calling hrtimer::function, it is
possible for hrtimer_start() to come in between and enqueue the timer.
If hrtimer::function then returns HRTIMER_RESTART we'll hit the BUG_ON
because HRTIMER_STATE_ENQUEUED will be set.
Since the above is a perfectly valid scenario, remove the BUG_ON and
make the enqueue_hrtimer() call conditional on the timer not being
enqueued already.
NOTE: in that concurrent scenario its entirely common for both sites
to want to modify the hrtimer, since hrtimers don't provide
serialization themselves be sure to provide some such that the
hrtimer::function and the hrtimer_start() caller don't both try and
fudge the expiration state at the same time.
To that effect, add a WARN when someone tries to forward an already
enqueued timer, the most common way to change the expiry of self
restarting timers. Ideally we'd put the WARN in everything modifying
the expiry but most of that is inlines and we don't need the bloat.
Fixes: 2d44ae4d71 ("hrtimer: clean up cpu->base locking tricks")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Roman Gushchin <klamm@yandex-team.ru>
Cc: Paul Turner <pjt@google.com>
Link: http://lkml.kernel.org/r/20150415113105.GT5029@twins.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We can do a lockless check for hrtimer_active before actually taking
the lock in hrtimer[_try_to]_cancel. This is useful for hotpath users
like nanosleep as they avoid the lock dance when the timer has
expired.
This is safe because active is true when the timer is enqueued or the
callback is running. Taking the hrtimer base lock does not protect
against concurrent hrtimer_start calls, the callsite has to do the
proper serialization itself.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203503.580273114@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
No user was ever interested whether the timer was active or not when
it was started. All abusers of the return value are gone, so get rid
of it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203503.483556394@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The check for hrtimer_active() after starting the timer is
pointless. If the timer is inactive it has expired already and
therefor the task pointer is already NULL.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203502.907149271@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
No point for an extra export just to set the extra argument of
hrtimer_start_range_ns() to 0.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203502.808544539@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The evaluation of the next timer in the nohz code is based on jiffies
while all the tick internals are nano seconds based. We have also to
convert hrtimer nanoseconds to jiffies in the !highres case. That's
just wrong and introduces interesting corner cases.
Turn it around and convert the next timer wheel timer expiry and the
rcu event to clock monotonic and base all calculations on
nanoseconds. That identifies the case where no timer is pending
clearly with an absolute expiry value of KTIME_MAX.
Makes the code more readable and gets rid of the jiffies magic in the
nohz code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Link: http://lkml.kernel.org/r/20150414203502.184198593@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
hrtimer softirq is a leftover from the initial implementation and
serves only the purpose to handle the enqueueing of already expired
timers in the high resolution timer mode. We discussed whether we
change the return value and force all start sites to handle that the
timer is already expired, but that would be a Herculean task and I'm
not sure whether its a good idea to enforce that handling on
everyone.
A simpler solution is to enforce a timer interrupt instead of raising
and scheduling a softirq. Just use the existing infrastructure to do
so and remove all the softirq leftovers.
The HRTIMER softirq enum is now unused, but kept around because trace
parsers rely on the existing numbering.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203501.840834708@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
__remove_hrtimer() needs to evaluate the expiry time to figure out
whether the timer which is removed is eventually the first expiring
timer on the cpu. Keep a pointer to it, which is lazily updated, so we
can avoid the evaluation dance and retrieve the information from there.
Generates slightly better code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203501.752838019@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Use the return value instead of reevaluating the information.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203501.658152945@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The active_bases field is guaranteed to be in sync with the timerqueue
of the corresponding clock base. So we can use it for iterating over
the clock bases. This allows to break out early if no more active
clock bases are available and avoids touching the cache lines of
inactive clock bases.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203501.322887675@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
On every tick/hrtimer interrupt we update the offset variables of the
clock bases. That's silly because these offsets change very seldom.
Add a sequence counter to the time keeping code which keeps track of
the offset updates (clock_was_set()). Have a sequence cache in the
hrtimer cpu bases to evaluate whether the offsets must be updated or
not. This allows us later to avoid pointless cacheline pollution.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/20150414203501.132820245@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
The softirq time field in the clock bases is an optimization from the
early days of hrtimers. It provides a coarse "jiffies" like time
mostly for self rearming timers.
But that comes with a price:
- Larger code size
- Extra storage space
- Duplicated functions with really small differences
The benefit of this is optimization is marginal for contemporary
systems.
Consolidate everything on the high resolution timer
implementation. This makes further optimizations possible.
Text size reduction:
x8664 -95, i386 -356, ARM -148, ARM64 -40, power64 -16
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203501.039977424@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
No point in having usigned long for /proc/timer_list statistics. Make
them unsigned int.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203500.959773467@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The resolution is directly accessible now. So its simpler just to fill
in the values of the timespec and be done with it.
Text size reduction (combined with "hrtimer: Get rid of the resolution
field in hrtimer_clock_base"):
x8664 -61, i386 -221, ARM -60, power64 -48
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203500.879888080@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The field has no value because all clock bases have the same
resolution. The resolution only changes when we switch to high
resolution timer mode. We can evaluate that from a single static
variable as well. In the !HIGHRES case its simply a constant.
Export the variable, so we can simplify the usage sites.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203500.645454122@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>