121dd9ea01
This program benchmarks concurrent epoll_wait(2) for file descriptors that are monitored with EPOLLIN along various semantics, by a single epoll instance. Such conditions can be found when using single/combined or multiple queuing when load balancing.

Each thread has a number of private, nonblocking file descriptors, referred to as fdmap. A writer thread will constantly be writing to the fdmaps of all threads, minimizing each thread's chances of epoll_wait not finding any ready read events and blocking, as this is not what we want to stress.

Full details in the start of the C file.

Committer testing:

  # perf bench
  Usage:
          perf bench [<common options>] <collection> <benchmark> [<options>]

          # List of all available benchmark collections:

           sched: Scheduler and IPC benchmarks
             mem: Memory access benchmarks
            numa: NUMA scheduling and MM benchmarks
           futex: Futex stressing benchmarks
           epoll: Epoll stressing benchmarks
             all: All benchmarks

  # perf bench epoll

          # List of available benchmarks for collection 'epoll':

            wait: Benchmark epoll concurrent epoll_waits
             all: Run all futex benchmarks

  # perf bench epoll wait
  # Running 'epoll/wait' benchmark:
  Run summary [PID 19295]: 3 threads monitoring on 64 file-descriptors for 8 secs.

  [thread 0] fdmap: 0xdaa650 ... 0xdaa74c [ 328241 ops/sec ]
  [thread 1] fdmap: 0xdaa900 ... 0xdaa9fc [ 351695 ops/sec ]
  [thread 2] fdmap: 0xdaabb0 ... 0xdaacac [ 381423 ops/sec ]

  Averaged 353786 operations/sec (+- 4.35%), total secs = 8
  #

Committer notes:

Fix the build on debian:experimental-x-mips, debian:experimental-x-mipsel and others:

    CC       /tmp/build/perf/bench/epoll-wait.o
  bench/epoll-wait.c: In function 'writerfn':
  bench/epoll-wait.c:399:12: error: format '%ld' expects argument of type 'long int', but argument 2 has type 'size_t' {aka 'unsigned int'} [-Werror=format=]
    printinfo("exiting writer-thread (total full-loops: %ld)\n", iter);
              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  ~~~~
  bench/epoll-wait.c:86:31: note: in definition of macro 'printinfo'
    do { if (__verbose) { printf(fmt, ## arg); fflush(stdout); } } while (0)
                                 ^~~
  cc1: all warnings being treated as errors

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Link: http://lkml.kernel.org/r/20181106152226.20883-2-dave@stgolabs.net
Link: http://lkml.kernel.org/r/20181106182349.thdkpvshkna5vd7o@linux-r8p5
[ Applied above fixup as per Davidlohr's request ]
[ Use inttypes.h to print rlim_t fields, fixing the build on Alpine Linux / musl libc ]
[ Check if eventfd() is available, i.e. if HAVE_EVENTFD is defined ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
217 lines
4.3 KiB
Plaintext
perf-bench(1)
=============

NAME
----
perf-bench - General framework for benchmark suites

SYNOPSIS
--------
[verse]
'perf bench' [<common options>] <subsystem> <suite> [<options>]

DESCRIPTION
-----------
This 'perf bench' command is a general framework for benchmark suites.

COMMON OPTIONS
--------------
-r::
--repeat=::
Specify the number of times to repeat the run (default 10).

-f::
--format=::
Specify format style.
Currently available format styles are:

'default'::
Default style. This is mainly for human reading.
---------------------
% perf bench sched pipe                      # with no style specified
(executing 1000000 pipe operations between two tasks)
        Total time:5.855 sec
                5.855061 usecs/op
                170792 ops/sec
---------------------

'simple'::
This simple style is friendly for automated
processing by scripts.
---------------------
% perf bench --format=simple sched pipe      # specified simple
5.988
---------------------

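Because the 'simple' style prints a single number, a script can capture it directly. The following is a minimal Python sketch, assuming the output is exactly one decimal value as in the example above (presumably the total time in seconds). The helper names `parse_simple_output` and `run_sched_pipe_simple` are hypothetical and not part of perf.

```python
import subprocess


def parse_simple_output(text: str) -> float:
    """Parse the single number printed by --format=simple (assumed layout)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) != 1:
        raise ValueError(f"expected exactly one value, got {len(lines)} lines")
    return float(lines[0])


def run_sched_pipe_simple() -> float:
    """Run the command from the example above; requires perf to be installed."""
    result = subprocess.run(
        ["perf", "bench", "--format=simple", "sched", "pipe"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_simple_output(result.stdout)


if __name__ == "__main__":
    # Offline check against the sample output shown above; no perf needed.
    print(parse_simple_output("5.988\n"))
```

Raising on anything other than one line is a deliberate choice here: if a perf version prints extra text in this mode, the script fails loudly instead of recording a wrong number.
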
SUBSYSTEM
---------

'sched'::
Scheduler and IPC mechanisms.

'mem'::
Memory access performance.

'numa'::
NUMA scheduling and MM benchmarks.

'futex'::
Futex stressing benchmarks.

'epoll'::
Eventpoll (epoll) stressing benchmarks.

'all'::
All benchmark subsystems.

SUITES FOR 'sched'
~~~~~~~~~~~~~~~~~~
*messaging*::
Suite for evaluating performance of scheduler and IPC mechanisms.
Based on hackbench by Rusty Russell.

Options of *messaging*
^^^^^^^^^^^^^^^^^^^^^^
-p::
--pipe::
Use pipe() instead of socketpair().

-t::
--thread::
Use threads instead of processes.

-g::
--group=::
Specify the number of groups.

-l::
--nr_loops=::
Specify the number of loops.

Example of *messaging*
^^^^^^^^^^^^^^^^^^^^^^

---------------------
% perf bench sched messaging                 # run with default
options (20 sender and receiver processes per group)
(10 groups == 400 processes run)

      Total time:0.308 sec

% perf bench sched messaging -t -g 20        # be multi-thread, with 20 groups
(20 sender and receiver threads per group)
(20 groups == 800 threads run)

      Total time:0.582 sec
---------------------

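The task counts in the example output follow from each group containing 20 senders and 20 receivers. A small Python sketch of that arithmetic; the function name and the constant are illustrative, taken from the example output rather than from perf's source.

```python
TASKS_PER_ROLE = 20  # senders (and receivers) per group, per the example output


def messaging_task_count(groups: int = 10) -> int:
    """Total sender plus receiver tasks started for a given -g value."""
    if groups < 1:
        raise ValueError("groups must be positive")
    return groups * 2 * TASKS_PER_ROLE


if __name__ == "__main__":
    print(messaging_task_count())    # default run with 10 groups
    print(messaging_task_count(20))  # as with -g 20
```

With these assumptions the two calls give 400 and 800, matching the "processes run" and "threads run" lines in the example.
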
*pipe*::
Suite for pipe() system call.
Based on pipe-test-1m.c by Ingo Molnar.

Options of *pipe*
^^^^^^^^^^^^^^^^^
-l::
--loop=::
Specify number of loops.

Example of *pipe*
^^^^^^^^^^^^^^^^^

---------------------
% perf bench sched pipe
(executing 1000000 pipe operations between two tasks)

        Total time:8.091 sec
                8.091833 usecs/op
                123581 ops/sec

% perf bench sched pipe -l 1000              # loop 1000
(executing 1000 pipe operations between two tasks)

        Total time:0.016 sec
                16.948000 usecs/op
                59004 ops/sec
---------------------

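The three figures in each run are related: usecs/op is the elapsed time divided by the number of operations, and ops/sec is the inverse rate. The sketch below reproduces the example numbers under the assumption that ops/sec is truncated to an integer. It is not perf's implementation, and `pipe_metrics` is a hypothetical helper. The "Total time" line appears to be cut to milliseconds, so the full elapsed time is recovered here from usecs/op.

```python
from typing import Tuple


def pipe_metrics(loops: int, total_seconds: float) -> Tuple[float, int]:
    """Return (usecs per operation, whole operations per second)."""
    if loops <= 0 or total_seconds <= 0:
        raise ValueError("loops and total_seconds must be positive")
    usecs_per_op = total_seconds * 1_000_000 / loops
    ops_per_sec = int(loops / total_seconds)  # assumed truncation, not rounding
    return usecs_per_op, ops_per_sec


if __name__ == "__main__":
    # Elapsed times taken from the usecs/op lines of the two example runs.
    print(pipe_metrics(1_000_000, 8.091833))
    print(pipe_metrics(1_000, 0.016948))
```

The second run shows why short loops are noisy: with only 1000 operations, fixed startup cost dominates and the per-operation time roughly doubles.
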
SUITES FOR 'mem'
~~~~~~~~~~~~~~~~
*memcpy*::
Suite for evaluating performance of simple memory copy in various ways.

Options of *memcpy*
^^^^^^^^^^^^^^^^^^^
-s::
--size::
Specify size of memory to copy (default: 1MB).
Available units are B, KB, MB, GB and TB (case insensitive).

-f::
--function::
Specify function to copy (default: default).
Available functions depend on the architecture.
On x86-64, x86-64-unrolled, x86-64-movsq and x86-64-movsb are supported.

-l::
--nr_loops::
Repeat memcpy invocation this number of times.

-c::
--cycles::
Use perf's cpu-cycles event instead of gettimeofday syscall.

*memset*::
Suite for evaluating performance of simple memory set in various ways.

Options of *memset*
^^^^^^^^^^^^^^^^^^^
-s::
--size::
Specify size of memory to set (default: 1MB).
Available units are B, KB, MB, GB and TB (case insensitive).

-f::
--function::
Specify function to set (default: default).
Available functions depend on the architecture.
On x86-64, x86-64-unrolled, x86-64-stosq and x86-64-stosb are supported.

-l::
--nr_loops::
Repeat memset invocation this number of times.

-c::
--cycles::
Use perf's cpu-cycles event instead of gettimeofday syscall.

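The --size values above combine a number with a case-insensitive unit. The sketch below shows one way a wrapper script might convert such a string to bytes before passing it on or validating it. It assumes binary multiples (1 KB = 1024 bytes) and treats a bare number as bytes; both are assumptions of this example rather than statements taken from this page, and `parse_size` is a hypothetical helper.

```python
import re

# Assumed binary multiples; adjust if decimal units are wanted.
_UNITS = {"B": 1, "KB": 1 << 10, "MB": 1 << 20, "GB": 1 << 30, "TB": 1 << 40}


def parse_size(text: str) -> int:
    """Convert a string such as '1MB' or '4kb' to a byte count."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]*)\s*", text)
    if match is None:
        raise ValueError(f"malformed size: {text!r}")
    number, unit = match.groups()
    unit = unit.upper() or "B"  # assumption: no unit means bytes
    if unit not in _UNITS:
        raise ValueError(f"unknown unit in size: {text!r}")
    return int(number) * _UNITS[unit]


if __name__ == "__main__":
    print(parse_size("1MB"))  # the documented default size
```
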
SUITES FOR 'numa'
~~~~~~~~~~~~~~~~~
*mem*::
Suite for evaluating NUMA workloads.

SUITES FOR 'futex'
~~~~~~~~~~~~~~~~~~
*hash*::
Suite for evaluating hash tables.

*wake*::
Suite for evaluating wake calls.

*wake-parallel*::
Suite for evaluating parallel wake calls.

*requeue*::
Suite for evaluating requeue calls.

*lock-pi*::
Suite for evaluating futex lock_pi calls.

SUITES FOR 'epoll'
~~~~~~~~~~~~~~~~~~
*wait*::
Suite for evaluating concurrent epoll_wait calls.

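To illustrate what "concurrent epoll_wait calls" means, here is a rough, Linux-only Python sketch of the general pattern: several threads wait on one epoll instance while a writer thread keeps the monitored descriptors readable. It is a simplified analogy, not the benchmark's code. It uses pipes where the real benchmark may use other descriptor types, shares all descriptors among the waiters, and the name `epoll_wait_sketch` and its parameters are made up for this example.

```python
import os
import select
import threading
import time


def epoll_wait_sketch(nthreads=3, nfds=4, duration=0.2):
    """Return, per waiter thread, how many waits returned ready events."""
    ep = select.epoll()
    pipes = [os.pipe() for _ in range(nthreads * nfds)]
    for rfd, wfd in pipes:
        os.set_blocking(rfd, False)
        os.set_blocking(wfd, False)
        ep.register(rfd, select.EPOLLIN)  # monitor read ends for EPOLLIN

    stop = threading.Event()
    counts = [0] * nthreads

    def writer():
        # Keep every descriptor readable so the waiters rarely block.
        while not stop.is_set():
            for _, wfd in pipes:
                try:
                    os.write(wfd, b"x")
                except BlockingIOError:
                    pass  # pipe is full, so data is already pending

    def waiter(index):
        while not stop.is_set():
            events = ep.poll(0.05)  # epoll_wait with a 50 ms timeout
            for fd, _mask in events:
                try:
                    os.read(fd, 4096)
                except BlockingIOError:
                    pass  # another waiter drained this descriptor first
            if events:
                counts[index] += 1

    threads = [threading.Thread(target=writer)]
    threads += [threading.Thread(target=waiter, args=(i,)) for i in range(nthreads)]
    for t in threads:
        t.start()
    time.sleep(duration)
    stop.set()
    for t in threads:
        t.join()

    ep.close()
    for rfd, wfd in pipes:
        os.close(rfd)
        os.close(wfd)
    return counts


if __name__ == "__main__":
    print(epoll_wait_sketch())
```

The counts are only meaningful relative to each other. Python's interpreter lock serializes most of the work, so this shows the structure of the workload rather than kernel-level contention.
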
SEE ALSO
--------
linkperf:perf[1]