linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-22 20:22:09 +00:00

A mirror of the official Linux kernel repository just in case


George Spelvin b5c56e0cdd lib/list_sort: optimize number of calls to comparison function

CONFIG_RETPOLINE has severely degraded indirect function call performance, so it's worth putting some effort into reducing the number of times cmp() is called.

This patch avoids badly unbalanced merges on unlucky input sizes. It slightly increases the code size, but saves an average of 0.2*n calls to cmp().

x86-64 code size 739 -> 803 bytes (+64)

Unfortunately, there's not a lot of low-hanging fruit in a merge sort; it already performs only n*log2(n) - K*n + O(1) compares. The leading coefficient is already at the theoretical limit (log2(n!) corresponds to K=1.4427), so we're fighting over the linear term, and the best mergesort can do is K=1.2645, achieved when n is a power of 2.

The differences between mergesort variants appear when n is *not* a power of 2; K is a function of the fractional part of log2(n). Top-down mergesort does best of all, achieving a minimum K=1.2408, and an average (over all sizes) K=1.248. However, that requires knowing the number of entries to be sorted ahead of time, and making a full pass over the input to count it conflicts with a second performance goal, which is cache blocking.

Obviously, we have to read the entire list into L1 cache at some point, and performance is best if it fits. But if it doesn't fit, each full pass over the input causes a cache miss per element, which is undesirable.

While textbooks explain bottom-up mergesort as a succession of merging passes, practical implementations do merging in depth-first order: as soon as two lists of the same size are available, they are merged. This allows as many merge passes as possible to fit into L1; only the final few merges force cache misses.

This cache-friendly depth-first merge order depends on us merging the beginning of the input as much as possible before we've even seen the end of the input (and thus know its size).

The simple eager merge pattern causes bad performance when n is just over a power of 2. If n=1028, the final merge is between 1024- and 4-element lists, which is wasteful of comparisons. (This is actually worse on average than n=1025, because a 1024:1 merge will, on average, end after 512 compares, while 1024:4 will walk 4/5 of the list.)

Because of this, bottom-up mergesort achieves K < 0.5 for such sizes, and has an average (over all sizes) K of around 1. (My experiments show K=1.01, while theory predicts K=0.965.)

There are "worst-case optimal" variants of bottom-up mergesort which avoid this bad performance, but the algorithms given in the literature, such as queue-mergesort and boustrophedonic mergesort, depend on the breadth-first multi-pass structure that we are trying to avoid.

This implementation is as eager as possible while ensuring that all merge passes are at worst 1:2 unbalanced. This achieves the same average K=1.207 as queue-mergesort, which is 0.2*n better than bottom-up, and only 0.04*n behind top-down mergesort.

Specifically, it defers merging two lists of size 2^k until it is known that there are 2^k additional inputs following. This ensures that the final uneven merges triggered by reaching the end of the input will be at worst 2:1. This will avoid cache misses as long as 3*2^k elements fit into the cache.

(I confess to being more than a little bit proud of how clean this code turned out. It took a lot of thinking, but the resultant inner loop is very simple and efficient.)

Refs:
  Bottom-up Mergesort: A Detailed Analysis
  Wolfgang Panny, Helmut Prodinger
  Algorithmica 14(4):340--354, October 1995
  https://doi.org/10.1007/BF01294131
  https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.6.5260

  The cost distribution of queue-mergesort, optimal mergesorts, and power-of-two rules
  Wei-Mei Chen, Hsien-Kuei Hwang, Gen-Huey Chen
  Journal of Algorithms 30(2); Pages 423--448, February 1999
  https://doi.org/10.1006/jagm.1998.0986
  https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.4.5380

  Queue-Mergesort
  Mordecai J. Golin, Robert Sedgewick
  Information Processing Letters, 48(5):253--259, 10 December 1993
  https://doi.org/10.1016/0020-0190(93)90088-q
  https://sci-hub.tw/10.1016/0020-0190(93)90088-Q

Feedback from Rasmus Villemoes <linux@rasmusvillemoes.dk>.

Link: http://lkml.kernel.org/r/fd560853cc4dca0d0f02184ffa888b4c1be89abc.1552704200.git.lkml@sdf.org
Signed-off-by: George Spelvin <lkml@sdf.org>
Acked-by: Andrey Abramov <st5pub@yandex.ru>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Daniel Wagner <daniel.wagner@siemens.com>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Don Mullis <don.mullis@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2019-05-14 19:52:49 -07:00
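The merge policy described in the commit message above can be modelled in a few lines. What follows is a rough Python sketch, not the kernel's C code: it works on Python lists rather than linked lists, and every name in it (`make_counting_cmp`, `merge`, `final_merge`, `eager_sort`, `balanced_sort`, `linear_coefficient`) is made up for illustration. `eager_sort` merges two runs as soon as they have the same size, while `balanced_sort` holds back a merge of two 2^k-sized runs until 2^k further inputs have arrived, which is the deferral the message describes. The comparator counts its own calls so the linear term K in n*log2(n) - K*n can be estimated for a given input.

```python
import math


def make_counting_cmp():
    """Return a three-way comparator and a one-element call counter."""
    calls = [0]

    def cmp(x, y):
        calls[0] += 1
        return (x > y) - (x < y)

    return cmp, calls


def merge(older, newer, cmp):
    """Stable merge of two sorted runs; ties are taken from the older run."""
    out, i, j = [], 0, 0
    while i < len(older) and j < len(newer):
        if cmp(older[i], newer[j]) <= 0:
            out.append(older[i])
            i += 1
        else:
            out.append(newer[j])
            j += 1
    return out + older[i:] + newer[j:]


def final_merge(runs, cmp):
    """End of input: fold the pending runs together, newest run first."""
    if not runs:
        return []
    result = runs.pop()
    while runs:
        result = merge(runs.pop(), result, cmp)
    return result


def eager_sort(items, cmp):
    """Plain bottom-up order: merge whenever two equal-sized runs exist."""
    runs = []
    for count, x in enumerate(items, start=1):
        runs.append([x])
        bits = count
        while (bits & 1) == 0:          # one merge per trailing zero bit
            newer = runs.pop()
            older = runs.pop()
            runs.append(merge(older, newer, cmp))
            bits >>= 1
    return final_merge(runs, cmp)


def balanced_sort(items, cmp):
    """Deferred order: at most one merge per input element, so the
    end-of-input merges stay at worst 2:1 unbalanced."""
    pending = []                        # sorted runs, oldest first
    count = 0                           # elements consumed so far
    for x in items:
        bits, depth = count, 0
        while bits & 1:                 # skip the trailing one bits
            bits >>= 1
            depth += 1
        if bits:                        # count is not of the form 2^k - 1
            i = len(pending) - 2 - depth
            pending[i:i + 2] = [merge(pending[i], pending[i + 1], cmp)]
        pending.append([x])
        count += 1
    return final_merge(pending, cmp)


def linear_coefficient(n, compares):
    """Estimate K from compares = n*log2(n) - K*n (O(1) term ignored)."""
    return (n * math.log2(n) - compares) / n
```

To try it, shuffle `list(range(1028))`, sort it once with each function using a fresh comparator from `make_counting_cmp()`, and pass the two call counts to `linear_coefficient(1028, ...)`. The exact figures depend on the input order, so none are quoted here; the point of the sketch is only to make the difference between the two merge orders observable.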
arch	compiler: allow all arches to enable CONFIG_OPTIMIZE_INLINING	2019-05-14 19:52:48 -07:00
block	for-5.2/block-20190507	2019-05-07 18:14:36 -07:00
certs	kexec, KEYS: Make use of platform keyring for signature verify	2019-02-04 17:34:07 -05:00
crypto	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next	2019-05-07 22:03:58 -07:00
Documentation	mm: shuffle initial free memory to improve memory-side-cache utilization	2019-05-14 19:52:48 -07:00
drivers	mtd: rawnand: vf610_nfc: add initializer to avoid -Wmaybe-uninitialized	2019-05-14 19:52:48 -07:00
fs	kernel/latencytop.c: rename clear_all_latency_tracing to clear_tsk_latency_tracing	2019-05-14 19:52:49 -07:00
include	lib/list_sort: simplify and remove MAX_LIST_LENGTH_BITS	2019-05-14 19:52:49 -07:00
init	mm: shuffle initial free memory to improve memory-side-cache utilization	2019-05-14 19:52:48 -07:00
ipc	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next	2019-05-07 22:03:58 -07:00
kernel	kernel/user.c: clean up some leftover code	2019-05-14 19:52:49 -07:00
lib	lib/list_sort: optimize number of calls to comparison function	2019-05-14 19:52:49 -07:00
LICENSES	LICENSES: Rename other to deprecated	2019-05-03 06:34:32 -06:00
mm	mm/mincore.c: make mincore() more conservative	2019-05-14 19:52:48 -07:00
net	Merge branch 'akpm' (patches from Andrew)	2019-05-14 10:10:55 -07:00
samples	samples: add .gitignore for pidfd-metadata	2019-05-10 11:50:52 +02:00
scripts	gcc-plugin fix:	2019-05-13 16:01:52 -07:00
security	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2019-05-13 15:15:00 -07:00
sound	sound updates for 5.2-rc1	2019-05-09 08:26:55 -07:00
tools	pci-v5.2-changes	2019-05-14 10:30:10 -07:00
usr	user/Makefile: Fix typo and capitalization in comment section	2018-12-11 00:18:03 +09:00
virt	mm/mmu_notifier: convert user range->blockable to helper function	2019-05-14 09:47:49 -07:00
.clang-format	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2019-04-17 11:26:25 -07:00
.cocciconfig
.get_maintainer.ignore
.gitattributes	.gitattributes: set git diff driver for C source code files	2016-10-07 18:46:30 -07:00
.gitignore	.gitignore: add more all*.config patterns	2019-05-08 09:47:46 +09:00
.mailmap	A reasonably busy cycle for docs, including:	2019-05-08 12:42:50 -07:00
COPYING	COPYING: use the new text with points to the license files	2018-03-23 12:41:45 -06:00
CREDITS	Char/Misc driver patches for 5.1-rc1	2019-03-06 14:18:59 -08:00
Kbuild	Kbuild updates for v5.1	2019-03-10 17:48:21 -07:00
Kconfig	kconfig: move the "Executable file formats" menu to fs/Kconfig.binfmt	2018-08-02 08:06:55 +09:00
MAINTAINERS	- Core Frameworks	2019-05-14 10:39:08 -07:00
Makefile	Kbuild updates for v5.2	2019-05-08 12:25:12 -07:00
README	Drop all 00-INDEX files from Documentation/	2018-09-09 15:08:58 -06:00

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result from upgrading your kernel.