linux/kernel
Mel Gorman e832bf48c8 audit: Reduce overhead using a coarse clock
Commit 2115bb250f ("audit: Use timespec64 to represent audit timestamps")
noted that audit timestamps were not y2038 safe and used a 64-bit
timestamp. In itself, this makes sense but the conversion was from
CURRENT_TIME to ktime_get_real_ts64() which is a heavier call to record
an accurate timestamp which is required in some, but not all, cases. The
impact is that when auditd is running without any rules that all syscalls
have higher overhead. This is visible in the sysbench-thread benchmark as
a 11.5% performance hit. That benchmark is dumb as rocks but it's also
visible in redis as an 8-10% hit on all operations which is of greater
concern. It is somewhat stupid of audit to track syscalls without any
rules related to syscalls but that is how it behaves.

The overhead can be directly measured with perf comparing 4.9 with 4.12

4.9
     7.76%  sysbench         [kernel.vmlinux]    [k] __schedule
     7.62%  sysbench         [kernel.vmlinux]    [k] _raw_spin_lock
     7.37%  sysbench         libpthread-2.22.so  [.] __lll_lock_elision
     7.29%  sysbench         [kernel.vmlinux]    [.] syscall_return_via_sysret
     6.59%  sysbench         [kernel.vmlinux]    [k] native_sched_clock
     5.21%  sysbench         libc-2.22.so        [.] __sched_yield
     4.38%  sysbench         [kernel.vmlinux]    [k] entry_SYSCALL_64
     4.28%  sysbench         [kernel.vmlinux]    [k] do_syscall_64
     3.49%  sysbench         libpthread-2.22.so  [.] __lll_unlock_elision
     3.13%  sysbench         [kernel.vmlinux]    [k] __audit_syscall_exit
     2.87%  sysbench         [kernel.vmlinux]    [k] update_curr
     2.73%  sysbench         [kernel.vmlinux]    [k] pick_next_task_fair
     2.31%  sysbench         [kernel.vmlinux]    [k] syscall_trace_enter
     2.20%  sysbench         [kernel.vmlinux]    [k] __audit_syscall_entry
.....
     0.00%  swapper          [kernel.vmlinux]    [k] read_tsc

4.12
     7.84%  sysbench         [kernel.vmlinux]    [k] __schedule
     7.05%  sysbench         [kernel.vmlinux]    [k] _raw_spin_lock
     6.57%  sysbench         libpthread-2.22.so  [.] __lll_lock_elision
     6.50%  sysbench         [kernel.vmlinux]    [.] syscall_return_via_sysret
     5.95%  sysbench         [kernel.vmlinux]    [k] read_tsc
     5.71%  sysbench         [kernel.vmlinux]    [k] native_sched_clock
     4.78%  sysbench         libc-2.22.so        [.] __sched_yield
     4.30%  sysbench         [kernel.vmlinux]    [k] entry_SYSCALL_64
     3.94%  sysbench         [kernel.vmlinux]    [k] do_syscall_64
     3.37%  sysbench         libpthread-2.22.so  [.] __lll_unlock_elision
     3.32%  sysbench         [kernel.vmlinux]    [k] __audit_syscall_exit
     2.91%  sysbench         [kernel.vmlinux]    [k] __getnstimeofday64

Note the additional overhead from read_tsc which goes from 0% to 5.95%.
This is on a single-socket E3-1230 but similar overheads have been measured
on an older machine which the patch also eliminates.

The patch in question has no explanation as to why a fully-accurate timestamp
is required and is likely an oversight.  Using a coarser, but monotically
increasing, timestamp the overhead can be eliminated.  While it can be
worked around by configuring or disabling audit, it's tricky enough to
detect that a kernel fix is justified. With this patch, we see the following;

sysbenchthread
                              4.9.0                 4.12.0                 4.12.0
                            vanilla                vanilla            coarse-v1r1
Amean     1         1.49 (   0.00%)        1.66 ( -11.42%)        1.51 (  -1.34%)
Amean     3         1.48 (   0.00%)        1.65 ( -11.45%)        1.50 (  -0.96%)
Amean     5         1.49 (   0.00%)        1.67 ( -12.31%)        1.51 (  -1.83%)
Amean     7         1.49 (   0.00%)        1.66 ( -11.72%)        1.50 (  -0.67%)
Amean     12        1.48 (   0.00%)        1.65 ( -11.57%)        1.52 (  -2.89%)
Amean     16        1.49 (   0.00%)        1.65 ( -11.13%)        1.51 (  -1.73%)

The benchmark is reporting the time required for different thread counts to
lock/unlock a private mutex which, while dense, demonstrates the syscall
overhead. This is showing that 4.12 took a 11-12% hit but the overhead is
almost eliminated by the patch. While the variance is not reported here,
it's well within the noise with the patch applied.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-09-05 09:46:54 -04:00
..
bpf bpf: fix map value attribute for hash of maps 2017-08-22 16:32:02 -07:00
cgroup Merge branch 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2017-08-29 11:16:21 -07:00
configs config: android-base: disable CONFIG_NFSD and CONFIG_NFS_FS 2017-06-09 11:47:38 +02:00
debug sched/headers: Prepare for new header dependencies before moving code to <linux/sched/debug.h> 2017-03-02 08:42:34 +01:00
events Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-09-03 09:23:23 -07:00
gcov gcov: support GCC 7.1 2017-05-12 15:57:15 -07:00
irq genirq/ipi: Fixup checks against nr_cpu_ids 2017-08-20 10:49:05 +02:00
livepatch livepatch: Fix stacking of patches with respect to RCU 2017-06-20 10:42:19 +02:00
locking Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-07-21 11:11:23 -07:00
power mm: fix global NR_SLAB_.*CLAIMABLE counter reads 2017-08-10 15:54:06 -07:00
printk Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk 2017-07-05 11:11:26 -07:00
rcu rcu: Remove RCU CPU stall warnings from Tiny RCU 2017-06-08 18:52:45 -07:00
sched Minor page waitqueue cleanups 2017-08-27 13:55:12 -07:00
time time: Fix ktime_get_raw() incorrect base accumulation 2017-08-26 16:06:12 +02:00
trace perf/ftrace: Fix double traces of perf on ftrace:function 2017-08-29 13:29:29 +02:00
.gitignore
acct.c sched/headers: Prepare to move cputime functionality from <linux/sched.h> into <linux/sched/cputime.h> 2017-03-02 08:42:39 +01:00
async.c async: Adjust system_state checks 2017-05-23 10:01:37 +02:00
audit_fsnotify.c Merge branch 'fsnotify' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2017-05-03 11:05:15 -07:00
audit_tree.c Merge branch 'fsnotify' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2017-05-03 11:05:15 -07:00
audit_watch.c audit/stable-4.13 PR 20170816 2017-08-16 16:48:34 -07:00
audit.c audit: Reduce overhead using a coarse clock 2017-09-05 09:46:54 -04:00
audit.h audit: style fix 2017-06-12 18:07:43 -04:00
auditfilter.c audit: kernel generated netlink traffic should have a portid of 0 2017-05-02 10:16:05 -04:00
auditsc.c audit: Reduce overhead using a coarse clock 2017-09-05 09:46:54 -04:00
backtracetest.c
bounds.c
capability.c
compat.c Merge branch 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-06 20:57:13 -07:00
configs.c
context_tracking.c
cpu_pm.c
cpu.c smp/hotplug: Replace BUG_ON and react useful 2017-07-11 22:25:44 +02:00
crash_core.c kdump: protect vmcoreinfo data under the crash memory 2017-07-12 16:26:00 -07:00
crash_dump.c
cred.c doc: ReSTify credentials.txt 2017-05-18 10:30:19 -06:00
delayacct.c sched/headers: Prepare to move cputime functionality from <linux/sched.h> into <linux/sched/cputime.h> 2017-03-02 08:42:39 +01:00
dma.c
elfcore.c
exec_domain.c
exit.c kernel/exit.c: avoid undefined behaviour when calling wait4() 2017-07-10 16:32:36 -07:00
extable.c lib/extable.c: use bsearch() library function in search_extable() 2017-07-10 16:32:35 -07:00
fork.c mm, uprobes: fix multiple free of ->uprobes_state.xol_area 2017-08-31 16:33:15 -07:00
freezer.c
futex_compat.c
futex.c futex: Remove unnecessary warning from get_futex_key 2017-08-09 14:00:54 -07:00
groups.c kernel/groups.c: use sort library function 2017-07-10 16:32:34 -07:00
hung_task.c kernel/hung_task.c: defer showing held locks 2017-05-08 17:15:10 -07:00
irq_work.c
jump_label.c jump_label: Reorder hotplug lock and jump_label_lock 2017-05-26 10:10:45 +02:00
kallsyms.c kernel/kallsyms.c: replace all_var with IS_ENABLED(CONFIG_KALLSYMS_ALL) 2017-07-10 16:32:34 -07:00
kcmp.c kcmp: add KCMP_EPOLL_TFD mode to compare epoll target files 2017-07-12 16:26:01 -07:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c kcov: simplify interrupt check 2017-05-08 17:15:12 -07:00
kexec_core.c kdump: protect vmcoreinfo data under the crash memory 2017-07-12 16:26:00 -07:00
kexec_file.c kexec_file: adjust declaration of kexec_purgatory 2017-07-12 16:26:02 -07:00
kexec_internal.h kexec_file: adjust declaration of kexec_purgatory 2017-07-12 16:26:02 -07:00
kexec.c kdump: protect vmcoreinfo data under the crash memory 2017-07-12 16:26:00 -07:00
kmod.c kmod: fix wait on recursive loop 2017-08-18 15:32:01 -07:00
kprobes.c kprobes: Ensure that jprobe probepoints are at function entry 2017-07-08 11:05:35 +02:00
ksysfs.c kexec: move vmcoreinfo out of the kernel's .bss section 2017-07-12 16:25:59 -07:00
kthread.c kernel/kthread.c: kthread_worker: don't hog the cpu 2017-08-31 16:33:15 -07:00
latencytop.c sched/headers: Prepare to move sched_info_on() and force_schedstat_enabled() from <linux/sched.h> to <linux/sched/stat.h> 2017-03-02 08:42:39 +01:00
Makefile kernel/watchdog: split up config options 2017-07-12 16:26:02 -07:00
membarrier.c Fix: Disable sys_membarrier when nohz_full is enabled 2017-01-23 11:32:16 -08:00
memremap.c mm, memory_hotplug: replace for_device by want_memblock in arch_add_memory 2017-07-06 16:24:32 -07:00
module_signing.c
module-internal.h
module.c Modules updates for v4.13 2017-07-12 17:22:01 -07:00
notifier.c kernel/notifier.c: simplify expression 2017-02-24 17:46:56 -08:00
nsproxy.c perf: Add PERF_RECORD_NAMESPACES to include namespaces related info 2017-03-13 15:57:41 -03:00
padata.c padata: Avoid nested calls to cpus_read_lock() in pcrypt_init_padata() 2017-05-26 10:10:37 +02:00
panic.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/debug.h> 2017-03-02 08:42:34 +01:00
params.c boot/param: Move next_arg() function to lib/cmdline.c for later reuse 2017-04-18 10:37:13 +02:00
pid_namespace.c pid_ns: Sleep in TASK_INTERRUPTIBLE in zap_pid_ns_processes 2017-05-13 17:26:01 -05:00
pid.c pids: make task_tgid_nr_ns() safe 2017-08-21 12:47:31 -07:00
profile.c sched/headers: Prepare to move sched_info_on() and force_schedstat_enabled() from <linux/sched.h> to <linux/sched/stat.h> 2017-03-02 08:42:39 +01:00
ptrace.c ptrace: Properly initialize ptracer_cred on fork 2017-05-23 07:40:44 -05:00
range.c
reboot.c
relay.c Merge branch 'work.splice' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-05-02 11:38:06 -07:00
resource.c
seccomp.c seccomp: Switch from atomic_t to recount_t 2017-06-26 09:24:00 -07:00
signal.c signal: don't remove SIGNAL_UNKILLABLE for traced tasks. 2017-08-18 15:32:02 -07:00
smp.c smp, cpumask: Use non-atomic cpumask_{set,clear}_cpu() 2017-05-23 10:01:32 +02:00
smpboot.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task.h> 2017-03-02 08:42:35 +01:00
smpboot.h
softirq.c sched/core: Remove 'task' parameter and rename tsk_restore_flags() to current_restore_flags() 2017-04-11 09:06:32 +02:00
stacktrace.c stacktrace/x86: add function for detecting reliable stack traces 2017-03-08 09:18:02 +01:00
stop_machine.c stop_machine: Provide stop_machine_cpuslocked() 2017-05-26 10:10:36 +02:00
sys_ni.c
sys.c fix a braino in compat_sys_getrlimit() 2017-07-12 09:15:00 -07:00
sysctl_binary.c kernel/sysctl_binary.c: check name array length in deprecated_sysctl_warning() 2017-07-12 16:26:00 -07:00
sysctl.c kernel/watchdog: split up config options 2017-07-12 16:26:02 -07:00
task_work.c
taskstats.c taskstats: add e/u/stime for TGID command 2017-05-08 17:15:12 -07:00
test_kprobes.c
torture.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/clock.h> 2017-03-02 08:42:27 +01:00
tracepoint.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task.h> 2017-03-02 08:42:35 +01:00
tsacct.c sched/headers: Prepare to move cputime functionality from <linux/sched.h> into <linux/sched/cputime.h> 2017-03-02 08:42:39 +01:00
ucount.c ucount: Remove the atomicity from ucount->count 2017-03-06 15:26:37 -06:00
uid16.c sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h> 2017-03-02 08:42:31 +01:00
up.c
user_namespace.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h> 2017-03-02 08:42:29 +01:00
user-return-notifier.c
user.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/user.h> 2017-03-02 08:42:29 +01:00
utsname_sysctl.c sched/headers: Remove <linux/rwsem.h> from <linux/sched.h> 2017-03-03 01:45:36 +01:00
utsname.c sched/headers: Prepare to move the task_lock()/unlock() APIs to <linux/sched/task.h> 2017-03-02 08:42:38 +01:00
watchdog_hld.c kernel/watchdog: Prevent false positives with turbo modes 2017-08-18 12:35:02 +02:00
watchdog.c kernel/watchdog: Prevent false positives with turbo modes 2017-08-18 12:35:02 +02:00
workqueue_internal.h
workqueue.c workqueue: Work around edge cases for calc of pool's cpumask 2017-07-28 11:05:52 -04:00