Workqueue stalls can happen from a variety of usage bugs such as
missing WQ_MEM_RECLAIM flag or concurrency managed work item
indefinitely staying RUNNING. These stalls can be extremely difficult
to hunt down because the usual warning mechanisms can't detect
workqueue stalls and the internal state is pretty opaque.
To alleviate the situation, this patch implements workqueue lockup
detector. It periodically monitors all worker_pools periodically and,
if any pool failed to make forward progress longer than the threshold
duration, triggers warning and dumps workqueue state as follows.
BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 31s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=17/256
pending: monkey_wrench_fn, e1000_watchdog, cache_reap, vmstat_shepherd, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, cgroup_release_agent
workqueue events_power_efficient: flags=0x80
pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
pending: check_lifetime, neigh_periodic_work
workqueue cgroup_pidlist_destroy: flags=0x0
pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/1
pending: cgroup_pidlist_destroy_work_fn
...
The detection mechanism is controller through kernel parameter
workqueue.watchdog_thresh and can be updated at runtime through the
sysfs module parameter file.
v2: Decoupled from softlockup control knobs.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Don Zickus <dzickus@redhat.com>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Chris Mason <clm@fb.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
asm/atomic.h doesn't really need asm/processor.h anymore. Everything
it uses has moved to other header files. So remove that include.
processor.h is a nasty header that includes lots of
other headers and makes it prone to include loops. Removing the
include here makes asm/atomic.h a "leaf" header that can
be safely included in most other headers.
The only fallout is in the lib/atomic tester which relied on
this implicit include. Give it an explicit include.
(the include is in ifdef because the user is also in ifdef)
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: rostedt@goodmis.org
Link: http://lkml.kernel.org/r/1449018060-1742-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This reverts commit d3716f18a7.
vmalloc cannot be used in BH disabled contexts, even
with GFP_ATOMIC. And we certainly want to support
rhashtable users inserting entries with software
interrupts disabled.
Signed-off-by: David S. Miller <davem@davemloft.net>
When an rhashtable user pounds rhashtable hard with back-to-back
insertions we may end up growing the table in GFP_ATOMIC context.
Unfortunately when the table reaches a certain size this often
fails because we don't have enough physically contiguous pages
to hold the new table.
Eric Dumazet suggested (and in fact wrote this patch) using
__vmalloc instead which can be used in GFP_ATOMIC context.
Reported-by: Phil Sutter <phil@nwl.cc>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas and Phil observed that under stress rhashtable insertion
sometimes failed with EBUSY, even though this error should only
ever been seen when we're under attack and our hash chain length
has grown to an unacceptable level, even after a rehash.
It turns out that the logic for detecting whether there is an
existing rehash is faulty. In particular, when two threads both
try to grow the same table at the same time, one of them may see
the newly grown table and thus erroneously conclude that it had
been rehashed. This is what leads to the EBUSY error.
This patch fixes this by remembering the current last table we
used during insertion so that rhashtable_insert_rehash can detect
when another thread has also done a resize/rehash. When this is
detected we will give up our resize/rehash and simply retry the
insertion with the new table.
Reported-by: Thomas Graf <tgraf@suug.ch>
Reported-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since CHANGEUPPER can now fail, add support for it in the newly
introduced netdev notifier error injection infrastructure.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This module allows to insert errors in some of netdevice's notifier
events. All network drivers use these notifiers to signal various events
and to check if they are allowed, e.g. PRECHANGEMTU and CHANGEMTU
afterwards. Until recently I had to run failure tests by injecting
a custom module, but now this infrastructure makes it trivial to test
these failure paths. Some of the recent bugs I fixed were found using
this module.
Here's an example:
$ cd /sys/kernel/debug/notifier-error-inject/netdev
$ echo -22 > actions/NETDEV_CHANGEMTU/error
$ ip link set eth0 mtu 1024
RTNETLINK answers: Invalid argument
CC: Akinobu Mita <akinobu.mita@gmail.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: netdev <netdev@vger.kernel.org>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The enable_kernel_*() functions leave the relevant MSR bits enabled
until we exit the kernel sometime later. Create disable versions
that wrap the kernel use of FP, Altivec VSX or SPE.
While we don't want to disable it normally for performance reasons
(MSR writes are slow), it will be used for a debug boot option that
does this and catches bad uses in other areas of the kernel.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Code that does lockless emptiness testing of non-RCU lists is relying
on the list-addition code to write the list head's ->next pointer
atomically. This commit therefore adds WRITE_ONCE() to list-addition
pointer stores that could affect the head's ->next pointer.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This is rather a hack to expose the current issue with rhashtable to
under high pressure sometimes return -ENOMEM even though system memory
is not exhausted and a consecutive insert may succeed.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
A maximum table size of 64k entries is insufficient for the multiple
threads test even in default configuration (10 threads * 50000 objects =
500000 objects in total). Since we know how many objects will be
inserted, calculate the max size unless overridden by parameter.
Note that specifying the exact number of objects upon table init won't
suffice as that value is being rounded down to the next power of two -
anticipate this by rounding up to the next power of two in beforehand.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
After adding cond_resched() calls to threadfunc(), a surprisingly high
rate of insert failures occurred probably due to table resizes getting a
better chance to run in background. To not soften up the remaining
tests, retry inserts until they either succeed or fail permanently.
Also change the non-threaded test to retry insert operations, too.
Suggested-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
This should fix for soft lockup bugs triggered on slow systems.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some atomic operations now have _relaxed/acquire/release variants, this
patch adds some trivial tests for two purposes:
1. test the behavior of these new operations in single-CPU
environment.
2. make their code generated before we actually use them somewhere,
so that we can examine their assembly code.
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <waiman.long@hp.com>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/1446634365-25176-1-git-send-email-boqun.feng@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There were still a number of references to my old Red Hat email
address in the kernel source. Remove these while keeping the
Red Hat copyright notices intact.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since mpi_write_to_sgl and mpi_read_buffer explicitly left-align the
integers being written it makes no sense to require a buffer big enough for
the number + the leading zero bytes which are not written. The error
returned also doesn't convey any information. So instead require only the
size needed and return -EOVERFLOW to signal when buffer too short.
Signed-off-by: Andrew Zaborowski <andrew.zaborowski@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Merge final patch-bomb from Andrew Morton:
"Various leftovers, mainly Christoph's pci_dma_supported() removals"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
pci: remove pci_dma_supported
usbnet: remove ifdefed out call to dma_supported
kaweth: remove ifdefed out call to dma_supported
sfc: don't call dma_supported
nouveau: don't call pci_dma_supported
netup_unidvb: use pci_set_dma_mask insted of pci_dma_supported
cx23885: use pci_set_dma_mask insted of pci_dma_supported
cx25821: use pci_set_dma_mask insted of pci_dma_supported
cx88: use pci_set_dma_mask insted of pci_dma_supported
saa7134: use pci_set_dma_mask insted of pci_dma_supported
saa7164: use pci_set_dma_mask insted of pci_dma_supported
tw68-core: use pci_set_dma_mask insted of pci_dma_supported
pcnet32: use pci_set_dma_mask insted of pci_dma_supported
lib/string.c: add ULL suffix to the constant definition
hugetlb: trivial comment fix
selftests/mlock2: add ULL suffix to 64-bit constants
selftests/mlock2: add missing #define _GNU_SOURCE
Pull networking fixes from David Miller:
1) Fix null deref in xt_TEE netfilter module, from Eric Dumazet.
2) Several spots need to get to the original listner for SYN-ACK
packets, most spots got this ok but some were not. Whilst covering
the remaining cases, create a helper to do this. From Eric Dumazet.
3) Missiing check of return value from alloc_netdev() in CAIF SPI code,
from Rasmus Villemoes.
4) Don't sleep while != TASK_RUNNING in macvtap, from Vlad Yasevich.
5) Use after free in mvneta driver, from Justin Maggard.
6) Fix race on dst->flags access in dst_release(), from Eric Dumazet.
7) Add missing ZLIB_INFLATE dependency for new qed driver. From Arnd
Bergmann.
8) Fix multicast getsockopt deadlock, from WANG Cong.
9) Fix deadlock in btusb, from Kuba Pawlak.
10) Some ipv6_add_dev() failure paths were not cleaning up the SNMP6
counter state. From Sabrina Dubroca.
11) Fix packet_bind() race, which can cause lost notifications, from
Francesco Ruggeri.
12) Fix MAC restoration in qlcnic driver during bonding mode changes,
from Jarod Wilson.
13) Revert bridging forward delay change which broke libvirt and other
userspace things, from Vlad Yasevich.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits)
Revert "bridge: Allow forward delay to be cfgd when STP enabled"
bpf_trace: Make dependent on PERF_EVENTS
qed: select ZLIB_INFLATE
net: fix a race in dst_release()
net: mvneta: Fix memory use after free.
net: Documentation: Fix default value tcp_limit_output_bytes
macvtap: Resolve possible __might_sleep warning in macvtap_do_read()
mvneta: add FIXED_PHY dependency
net: caif: check return value of alloc_netdev
net: hisilicon: NET_VENDOR_HISILICON should depend on HAS_DMA
drivers: net: xgene: fix RGMII 10/100Mb mode
netfilter: nft_meta: use skb_to_full_sk() helper
net_sched: em_meta: use skb_to_full_sk() helper
sched: cls_flow: use skb_to_full_sk() helper
netfilter: xt_owner: use skb_to_full_sk() helper
smack: use skb_to_full_sk() helper
net: add skb_to_full_sk() helper and use it in selinux_netlbl_skbuff_setsid()
bpf: doc: correct arch list for supported eBPF JIT
dwc_eth_qos: Delete an unnecessary check before the function call "of_node_put"
bonding: fix panic on non-ARPHRD_ETHER enslave failure
...
8-byte constant is too big for long and compiler complains about this.
lib/string.c:907:20: warning: constant 0x0101010101010101 is so big it is long
Append ULL suffix to explicitly show its type.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cheers,
Rusty.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWP91+AAoJENkgDmzRrbjxBAAQAJDFEKrMmdhyX56R058RFW1q
pYK3XrHrVMCrJRa80UH6MStvCzkR5yYU7q81XAOfl+/9TR3IIi6EPMIC6wYSyiXC
JpfnUISEJAuDMYOT19xeFDt2c7oknJnkOM7QWQt6ypY5sGWVHQ3KQUmkqlzaxQ5C
Oen9CfFttugmmpO6KDCfIxtMvxkQ1LM6SoTAKTu7LamcVsBCp5It2Me9UwGUxADj
1Phq14U8heJ9ScNYkroutEkWgyZLFJOZExUuNEIMwyooXmWQmZzBiwVwQ72WjstG
2jj3ZiLucVYvBM4k8qnGnlMR4IkymcYlXD1YJ0X7tvBFnp7UGXFKLt2NSqfOskLC
2fRPETf4PLHebZeNN/J/WKJ7qKzsBsS49KjFjJ2vm4+P6sScmcDGXw4eMyLTYfnJ
dRbuRtZpnJV4S1vss/STjehOA8A8/fURXQwb80AUzzEEfmjujZWCMYVhfqO91+kx
XsbtSciek+Abxyh9Ow9xHgVnMcsXgmZMkpODv4Gjc/4R6Uu6XRSVK04jvkuoLVi5
t4VC00NK0WY2PFVK3qGYE5ZejPTOu59UGRLwxDqZ0QmXF36Yun9f//hSDWpM10BO
Ah92OybEnny4tij7/0xz7Krg7u8BQ+at0TAmxrw4Xu9VqbnRqcpJy9Q04e52mwTu
G6ztYV2tOGMEh5lK2k0S
=WtDm
-----END PGP SIGNATURE-----
Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module updates from Rusty Russell:
"Nothing exciting, minor tweaks and cleanups"
* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
scripts: [modpost] add new sections to white list
modpost: Add flag -E for making section mismatches fatal
params: don't ignore the rest of cmdline if parse_one() fails
modpost: abort if a module symbol is too long
Switch everything to the new and more capable implementation of abs().
Mainly to give the new abs() a bit of a workout.
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge second patch-bomb from Andrew Morton:
- most of the rest of MM
- procfs
- lib/ updates
- printk updates
- bitops infrastructure tweaks
- checkpatch updates
- nilfs2 update
- signals
- various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc,
dma-debug, dma-mapping, ...
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits)
ipc,msg: drop dst nil validation in copy_msg
include/linux/zutil.h: fix usage example of zlib_adler32()
panic: release stale console lock to always get the logbuf printed out
dma-debug: check nents in dma_sync_sg*
dma-mapping: tidy up dma_parms default handling
pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode
kexec: use file name as the output message prefix
fs, seqfile: always allow oom killer
seq_file: reuse string_escape_str()
fs/seq_file: use seq_* helpers in seq_hex_dump()
coredump: change zap_threads() and zap_process() to use for_each_thread()
coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP
signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT)
signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread()
signal: turn dequeue_signal_lock() into kernel_dequeue_signal()
signals: kill block_all_signals() and unblock_all_signals()
nilfs2: fix gcc uninitialized-variable warnings in powerpc build
nilfs2: fix gcc unused-but-set-variable warnings
MAINTAINERS: nilfs2: add header file for tracing
nilfs2: add tracepoints for analyzing reading and writing metadata files
...
Like dma_unmap_sg, dma_sync_sg* should be called with the original number
of entries passed to dma_map_sg, so do the same check in the sync path as
we do in the unmap path.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Sakari Ailus <sakari.ailus@iki.fi>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is a classical off-by-one error in case when we try to place, for
example, 1+1 bytes as hex in the buffer of size 6. The expected result is
to get an output truncated, but in the reality we get 6 bytes filed
followed by terminating NUL.
Change the logic how we fill the output in case of byte dumping into
limited space. This will follow the snprintf() behaviour by truncating
output even on half bytes.
Fixes: 114fc1afb2 (hexdump: make it return number of bytes placed in buffer)
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reported-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Tested-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change current_is_single_threaded() to use for_each_thread() rather than
deprecated while_each_thread().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sometimes kobject_set_name_vargs is called with a format string conaining
no %, or a format string of precisely "%s", where the single vararg
happens to point to .rodata. kvasprintf_const detects these cases for us
and returns a copy of that pointer instead of duplicating the string, thus
saving some run-time memory. Otherwise, it falls back to kvasprintf. We
just need to always deallocate ->name using kfree_const.
Unfortunately, the dance we need to do to perform the '/' -> '!'
sanitization makes the resulting code rather ugly.
I instrumented kstrdup_const to provide some statistics on the memory
saved, and for me this gave an additional ~14KB after boot (306KB was
already saved; this patch bumped that to 320KB). I have
KMALLOC_SHIFT_LOW==3, and since 80% of the kvasprintf_const hits were
satisfied by an 8-byte allocation, the 14K would roughly be quadrupled
when KMALLOC_SHIFT_LOW==5. Whether these numbers are sufficient to
justify the ugliness I'll leave to others to decide.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds kvasprintf_const which tries to use kstrdup_const if possible:
If the format string contains no % characters, or if the format string is
exactly "%s", we delegate to kstrdup_const. Otherwise, we fall back to
kvasprintf.
Just as for kstrdup_const, the main motivation is to save memory by
reusing .rodata when possible.
The return value should be freed by kfree_const, just like for
kstrdup_const.
There is deliberately no kasprintf_const: In the vast majority of cases,
the format string argument is a literal, so one can determine statically
whether one could instead use kstrdup_const directly (which would also
require one to change all corresponding kfree calls to kfree_const).
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
llist_del_first reads entry->next, but it did not acquire visibility over
the entry node. As the result it can get a stale value of entry->next
(e.g. NULL or whatever garbage was there before the appending thread
wrote correct value). And then commit that value as llist head with
cmpxchg. That will corrupt llist.
Note there is a control-dependency between read of head->first and read of
entry->next, but it does not make the code correct. Kernel memory model
unambiguously says: "A load-load control dependency requires a full read
memory barrier".
Use smp_load_acquire to acquire visibility over the entry node.
The data race was found with KernelThreadSanitizer (KTSAN).
Here is an example of KTSAN report:
ThreadSanitizer: data-race in llist_del_first
Read of size 1 by thread T389 (K2630, CPU0):
[<ffffffff8156b8a9>] llist_del_first+0x39/0x70 lib/llist.c:74
[< inlined >] tty_buffer_alloc drivers/tty/tty_buffer.c:181
[<ffffffff81664af4>] __tty_buffer_request_room+0xb4/0x250 drivers/tty/tty_buffer.c:292
[<ffffffff81664e6c>] tty_insert_flip_string_fixed_flag+0x6c/0x150 drivers/tty/tty_buffer.c:337
[< inlined >] tty_insert_flip_string include/linux/tty_flip.h:35
[<ffffffff81667422>] pty_write+0x72/0xc0 drivers/tty/pty.c:110
[< inlined >] process_output_block drivers/tty/n_tty.c:611
[<ffffffff8165c016>] n_tty_write+0x346/0x7f0 drivers/tty/n_tty.c:2401
[< inlined >] do_tty_write drivers/tty/tty_io.c:1159
[<ffffffff816568df>] tty_write+0x21f/0x3f0 drivers/tty/tty_io.c:1245
[<ffffffff8125f00f>] __vfs_write+0x5f/0x1f0 fs/read_write.c:489
[<ffffffff8125ff8f>] vfs_write+0xef/0x280 fs/read_write.c:538
[< inlined >] SYSC_write fs/read_write.c:585
[<ffffffff81261390>] SyS_write+0x70/0xe0 fs/read_write.c:577
[<ffffffff81ee862e>] entry_SYSCALL_64_fastpath+0x12/0x71 arch/x86/entry/entry_64.S:186
Previous write of size 8 by thread T226 (K761, CPU0):
[<ffffffff8156b832>] llist_add_batch+0x32/0x70 lib/llist.c:44 (discriminator 16)
[< inlined >] llist_add include/linux/llist.h:180
[<ffffffff816649fc>] tty_buffer_free+0x6c/0xb0 drivers/tty/tty_buffer.c:221
[<ffffffff816651e7>] flush_to_ldisc+0x107/0x300 drivers/tty/tty_buffer.c:514
[<ffffffff810b20ee>] process_one_work+0x47e/0x930 kernel/workqueue.c:2036
[<ffffffff810b2650>] worker_thread+0xb0/0x900 kernel/workqueue.c:2170
[<ffffffff810bbe20>] kthread+0x150/0x170 kernel/kthread.c:209
[<ffffffff81ee8a1f>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:526
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Alexander Potapenko <glider@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a couple of simple tests for string_get_size(). The last one will
hang the kernel without the 'lib/string_helpers.c: fix infinite loop in
string_get_size()' fix.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: James Bottomley <JBottomley@Odin.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
<linux/bitops.h> provides rol32() inline function, let's use already
predefined function instead of direct expression.
Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
%n is no longer just ignored; it results in early return from vsnprintf.
Also add a request to add test cases for future %p extensions.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds a simple module for testing the kernel's printf facilities.
Previously, some %p extensions have caused a wrong return value in case
the entire output didn't fit and/or been unusable in kasprintf(). This
should help catch such issues. Also, it should help ensure that changes
to the formatting algorithms don't break anything.
I'm not sure if we have a struct dentry or struct file lying around at
boot time or if we can fake one, but most %p extensions should be
testable, as should the ordinary number and string formatting.
The nature of vararg functions means we can't use a more conventional
table-driven approach.
For now, this is mostly a skeleton; contributions are very
welcome. Some tests are/will be slightly annoying to write, since the
expected output depends on stuff like CONFIG_*, sizeof(long), runtime
values etc.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Martin Kletzander <mkletzan@redhat.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As a quick
git grep -E '%[ +0#-]*#[ +0#-]*(\*|[0-9]+)?(\.(\*|[0-9]+)?)?p'
shows, nobody uses the # flag with %p. Should one try to do so, one
will be met with
warning: `#' flag used with `%p' gnu_printf format [-Wformat]
(POSIX and C99 both say "... For other conversion specifiers, the
behavior is undefined.". Obviously, the kernel can choose to define
the behaviour however it wants, but as long as gcc issues that
warning, users are unlikely to show up.)
Since default_width is effectively always 2*sizeof(void*), we can
simplify the prologue of pointer() and save a few instructions.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Martin Kletzander <mkletzan@redhat.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Quoting from 2aa2f9e21e ("lib/vsprintf.c: improve sanity check in
vsnprintf()"):
On 64 bit, size may very well be huge even if bit 31 happens to be 0.
Somehow it doesn't feel right that one can pass a 5 GiB buffer but not a
3 GiB one. So cap at INT_MAX as was probably the intention all along.
This is also the made-up value passed by sprintf and vsprintf.
I should have seen this copy-pasted instance back then, but let's just
do it now.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Martin Kletzander <mkletzan@redhat.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If we meet any invalid or unsupported format specifier, 'handling' it by
just printing it as a literal string is not safe: Presumably the format
string and the arguments passed gcc's type checking, but that means
something like sprintf(buf, "%n %pd", &intvar, dentry) would end up
interpreting &intvar as a struct dentry*.
When the offending specifier was %n it used to be at the end of the format
string, but we can't rely on that always being the case. Also, gcc
doesn't complain about some more or less exotic qualifiers (or 'length
modifiers' in posix-speak) such as 'j' or 'q', but being unrecognized by
the kernel's printf implementation, they'd be interpreted as unknown
specifiers, and the rest of arguments would be interpreted wrongly.
So let's complain about anything we don't understand, not just %n, and
stop pretending that we'd be able to make sense of the rest of the
format/arguments. If the offending specifier is in a printk() call we
unfortunately only get a "BUG: recent printk recursion!", but at least
direct users of the sprintf family will be caught.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Martin Kletzander <mkletzan@redhat.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move all pointer-formatting documentation to one place in the code and one
place in the documentation instead of keeping it in three places with
different level of completeness. Documentation/printk-formats.txt has
detailed information about each modifier, docstring above pointer() has
short descriptions of them (as that is the function dealing with %p) and
docstring above vsprintf() is removed as redundant. Both docstrings in
the code that were modified are updated with a reminder of updating the
documentation upon any further change.
[akpm@linux-foundation.org: fix comment]
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Using kstrdup_const, thus reusing .rodata when possible, saves around 2 kB
of runtime memory on my laptop/.config combination.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Jason Baron <jbaron@akamai.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__GFP_WAIT was used to signal that the caller was in atomic context and
could not sleep. Now it is possible to distinguish between true atomic
context and callers that are not willing to sleep. The latter should
clear __GFP_DIRECT_RECLAIM so kswapd will still wake. As clearing
__GFP_WAIT behaves differently, there is a risk that people will clear the
wrong flags. This patch renames __GFP_WAIT to __GFP_RECLAIM to clearly
indicate what it does -- setting it allows all reclaim activity, clearing
them prevents it.
[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts. They are expected to be high priority and
have access one of two watermarks lower than "min" which can be referred
to as the "atomic reserve". __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".
Over time, callers had a requirement to not block when fallback options
were available. Some have abused __GFP_WAIT leading to a situation where
an optimisitic allocation with a fallback option can access atomic
reserves.
This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
cannot sleep and have no alternative. High priority users continue to use
__GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify
callers that want to wake kswapd for background reclaim. __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.
This patch then converts a number of sites
o __GFP_ATOMIC is used by callers that are high priority and have memory
pools for those requests. GFP_ATOMIC uses this flag.
o Callers that have a limited mempool to guarantee forward progress clear
__GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
into this category where kswapd will still be woken but atomic reserves
are not used as there is a one-entry mempool to guarantee progress.
o Callers that are checking if they are non-blocking should use the
helper gfpflags_allow_blocking() where possible. This is because
checking for __GFP_WAIT as was done historically now can trigger false
positives. Some exceptions like dm-crypt.c exist where the code intent
is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
flag manipulations.
o Callers that built their own GFP flags instead of starting with GFP_KERNEL
and friends now also need to specify __GFP_KSWAPD_RECLAIM.
The first key hazard to watch out for is callers that removed __GFP_WAIT
and was depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.
The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL. They may
now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The asm-generic changes for 4.4 are mostly a series from Christoph Hellwig
to clean up various abuses of headers in there. The patch to rename the
io-64-nonatomic-*.h headers caused some conflicts with new users, so I
added a workaround that we can remove in the next merge window.
The only other patch is a warning fix from Marek Vasut
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIVAwUAVjzaf2CrR//JCVInAQImmhAA20fZ91sUlnA5skKNPT1phhF6Z7UF2Sx5
nPKcHQD3HA3lT1OKfPBYvCo+loYflvXFLaQThVylVcnE/8ecAEMtft4nnGW2nXvh
sZqHIZ8fszTB53cynAZKTjdobD1wu33Rq7XRzg0ugn1mdxFkOzCHW/xDRvWRR5TL
rdQjzzgvn2PNlqFfHlh6cZ5ykShM36AIKs3WGA0H0Y/aYsE9GmDOAUp41q1mLXnA
4lKQaIxoeOa+kmlsUB0wEHUecWWWJH4GAP+CtdKzTX9v12bGNhmiKUMCETG78BT3
uL8irSqaViNwSAS9tBxSpqvmVUsa5aCA5M3MYiO+fH9ifd7wbR65g/wq39D3Pc01
KnZ3BNVRW5XSA3c86pr8vbg/HOynUXK8TN0lzt6rEk8bjoPBNEDy5YWzy0t6reVe
wX65F+ver8upjOKe9yl2Jsg+5Kcmy79GyYjLUY3TU2mZ+dIdScy/jIWatXe/OTKZ
iB4Ctc4MDe9GDECmlPOWf98AXqsBUuKQiWKCN/OPxLtFOeWBvi4IzvFuO8QvnL9p
jZcRDmIlIWAcDX/2wMnLjV+Hqi3EeReIrYznxTGnO7HHVInF555GP51vFaG5k+SN
smJQAB0/sostmC1OCCqBKq5b6/li95/No7+0v0SUhJJ5o76AR5CcNsnolXesw1fu
vTUkB/I66Hk=
=dQKG
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic cleanups from Arnd Bergmann:
"The asm-generic changes for 4.4 are mostly a series from Christoph
Hellwig to clean up various abuses of headers in there. The patch to
rename the io-64-nonatomic-*.h headers caused some conflicts with new
users, so I added a workaround that we can remove in the next merge
window.
The only other patch is a warning fix from Marek Vasut"
* tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
asm-generic: temporarily add back asm-generic/io-64-nonatomic*.h
asm-generic: cmpxchg: avoid warnings from macro-ized cmpxchg() implementations
gpio-mxc: stop including <asm-generic/bug>
n_tracesink: stop including <asm-generic/bug>
n_tracerouter: stop including <asm-generic/bug>
mlx5: stop including <asm-generic/kmap_types.h>
hifn_795x: stop including <asm-generic/kmap_types.h>
drbd: stop including <asm-generic/kmap_types.h>
move count_zeroes.h out of asm-generic
move io-64-nonatomic*.h out of asm-generic
Merge patch-bomb from Andrew Morton:
- inotify tweaks
- some ocfs2 updates (many more are awaiting review)
- various misc bits
- kernel/watchdog.c updates
- Some of mm. I have a huge number of MM patches this time and quite a
lot of it is quite difficult and much will be held over to next time.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (162 commits)
selftests: vm: add tests for lock on fault
mm: mlock: add mlock flags to enable VM_LOCKONFAULT usage
mm: introduce VM_LOCKONFAULT
mm: mlock: add new mlock system call
mm: mlock: refactor mlock, munlock, and munlockall code
kasan: always taint kernel on report
mm, slub, kasan: enable user tracking by default with KASAN=y
kasan: use IS_ALIGNED in memory_is_poisoned_8()
kasan: Fix a type conversion error
lib: test_kasan: add some testcases
kasan: update reference to kasan prototype repo
kasan: move KASAN_SANITIZE in arch/x86/boot/Makefile
kasan: various fixes in documentation
kasan: update log messages
kasan: accurately determine the type of the bad access
kasan: update reported bug types for kernel memory accesses
kasan: update reported bug types for not user nor kernel memory accesses
mm/kasan: prevent deadlock in kasan reporting
mm/kasan: don't use kasan shadow pointer in generic functions
mm/kasan: MODULE_VADDR is not available on all archs
...
It's recommended to have slub's user tracking enabled with CONFIG_KASAN,
because:
a) User tracking disables slab merging which improves
detecting out-of-bounds accesses.
b) User tracking metadata acts as redzone which also improves
detecting out-of-bounds accesses.
c) User tracking provides additional information about object.
This information helps to understand bugs.
Currently it is not enabled by default. Besides recompiling the kernel
with KASAN and reinstalling it, user also have to change the boot cmdline,
which is not very handy.
Enable slub user tracking by default with KASAN=y, since there is no good
reason to not do this.
[akpm@linux-foundation.org: little fixes, per David]
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add some out of bounds testcases to test_kasan module.
Signed-off-by: Wang Long <long.wanglong@huawei.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull sparc updates from David Miller:
"Just a couple of fixes/cleanups:
- Correct NUMA latency calculations on sparc64, from Nitin Gupta.
- ASI_ST_BLKINIT_MRU_S value was wrong, from Rob Gardner.
- Fix non-faulting load handling of non-quad values, also from Rob
Gardner.
- Cleanup VISsave assembler, from Sam Ravnborg.
- Fix iommu-common code so it doesn't emit rediculous warnings on
some architectures, particularly ARM"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc64: Fix numa distance values
sparc64: Don't restrict fp regs for no-fault loads
iommu-common: Fix error code used in iommu_tbl_range_{alloc,free}().
sparc64: use ENTRY/ENDPROC in VISsave
sparc64: Fix incorrect ASI_ST_BLKINIT_MRU_S value
Pull security subsystem update from James Morris:
"This is mostly maintenance updates across the subsystem, with a
notable update for TPM 2.0, and addition of Jarkko Sakkinen as a
maintainer of that"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (40 commits)
apparmor: clarify CRYPTO dependency
selinux: Use a kmem_cache for allocation struct file_security_struct
selinux: ioctl_has_perm should be static
selinux: use sprintf return value
selinux: use kstrdup() in security_get_bools()
selinux: use kmemdup in security_sid_to_context_core()
selinux: remove pointless cast in selinux_inode_setsecurity()
selinux: introduce security_context_str_to_sid
selinux: do not check open perm on ftruncate call
selinux: change CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE default
KEYS: Merge the type-specific data with the payload data
KEYS: Provide a script to extract a module signature
KEYS: Provide a script to extract the sys cert list from a vmlinux file
keys: Be more consistent in selection of union members used
certs: add .gitignore to stop git nagging about x509_certificate_list
KEYS: use kvfree() in add_key
Smack: limited capability for changing process label
TPM: remove unnecessary little endian conversion
vTPM: support little endian guests
char: Drop owner assignment from i2c_driver
...
Here's the "big" driver core updates for 4.4-rc1. Primarily a bunch of
debugfs updates, with a smattering of minor driver core fixes and
updates as well.
All have been in linux-next for a long time.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlY6ePQACgkQMUfUDdst+ymNTgCgpP0CZw57GpwF/Hp2L/lMkVeo
Kx8AoKhEi4iqD5fdCQS9qTfomB+2/M6g
=g7ZO
-----END PGP SIGNATURE-----
Merge tag 'driver-core-4.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here's the "big" driver core updates for 4.4-rc1. Primarily a bunch
of debugfs updates, with a smattering of minor driver core fixes and
updates as well.
All have been in linux-next for a long time"
* tag 'driver-core-4.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
debugfs: Add debugfs_create_ulong()
of: to support binding numa node to specified device in devicetree
debugfs: Add read-only/write-only bool file ops
debugfs: Add read-only/write-only size_t file ops
debugfs: Add read-only/write-only x64 file ops
debugfs: Consolidate file mode checks in debugfs_create_*()
Revert "mm: Check if section present during memory block (un)registering"
driver-core: platform: Provide helpers for multi-driver modules
mm: Check if section present during memory block (un)registering
devres: fix a for loop bounds check
CMA: fix CONFIG_CMA_SIZE_MBYTES overflow in 64bit
base/platform: assert that dev_pm_domain callbacks are called unconditionally
sysfs: correctly handle short reads on PREALLOC attrs.
base: soc: siplify ida usage
kobject: move EXPORT_SYMBOL() macros next to corresponding definitions
kobject: explain what kobject's sd field is
debugfs: document that debugfs_remove*() accepts NULL and error values
debugfs: Pass bool pointer to debugfs_create_bool()
ACPI / EC: Fix broken 64bit big-endian users of 'global_lock'
When running "mod X" operation, if X is 0 the filter has to be halt.
Add new test cases to cover A = A mod X if X is 0, and A = A mod 1.
CC: Xi Wang <xi.wang@gmail.com>
CC: Zi Shen Lim <zlim.lnx@gmail.com>
Signed-off-by: Yang Shi <yang.shi@linaro.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Zi Shen Lim <zlim.lnx@gmail.com>
Acked-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The value returned from iommu_tbl_range_alloc() (and the one passed
in as a fourth argument to iommu_tbl_range_free) is not a DMA address,
it is rather an index into the IOMMU page table.
Therefore using DMA_ERROR_CODE is not appropriate.
Use a more type matching error code define, IOMMU_ERROR_CODE, and
update all users of this interface.
Reported-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking updates from David Miller:
Changes of note:
1) Allow to schedule ICMP packets in IPVS, from Alex Gartrell.
2) Provide FIB table ID in ipv4 route dumps just as ipv6 does, from
David Ahern.
3) Allow the user to ask for the statistics to be filtered out of
ipv4/ipv6 address netlink dumps. From Sowmini Varadhan.
4) More work to pass the network namespace context around deep into
various packet path APIs, starting with the netfilter hooks. From
Eric W Biederman.
5) Add layer 2 TX/RX checksum offloading to qeth driver, from Thomas
Richter.
6) Use usec resolution for SYN/ACK RTTs in TCP, from Yuchung Cheng.
7) Support Very High Throughput in wireless MESH code, from Bob
Copeland.
8) Allow setting the ageing_time in switchdev/rocker. From Scott
Feldman.
9) Properly autoload L2TP type modules, from Stephen Hemminger.
10) Fix and enable offload features by default in 8139cp driver, from
David Woodhouse.
11) Support both ipv4 and ipv6 sockets in a single vxlan device, from
Jiri Benc.
12) Fix CWND limiting of thin streams in TCP, from Bendik Rønning
Opstad.
13) Fix IPSEC flowcache overflows on large systems, from Steffen
Klassert.
14) Convert bridging to track VLANs using rhashtable entries rather than
a bitmap. From Nikolay Aleksandrov.
15) Make TCP listener handling completely lockless, this is a major
accomplishment. Incoming request sockets now live in the
established hash table just like any other socket too.
From Eric Dumazet.
15) Provide more bridging attributes to netlink, from Nikolay
Aleksandrov.
16) Use hash based algorithm for ipv4 multipath routing, this was very
long overdue. From Peter Nørlund.
17) Several y2038 cures, mostly avoiding timespec. From Arnd Bergmann.
18) Allow non-root execution of EBPF programs, from Alexei Starovoitov.
19) Support SO_INCOMING_CPU as setsockopt, from Eric Dumazet. This
influences the port binding selection logic used by SO_REUSEPORT.
20) Add ipv6 support to VRF, from David Ahern.
21) Add support for Mellanox Spectrum switch ASIC, from Jiri Pirko.
22) Add rtl8xxxu Realtek wireless driver, from Jes Sorensen.
23) Implement RACK loss recovery in TCP, from Yuchung Cheng.
24) Support multipath routes in MPLS, from Roopa Prabhu.
25) Fix POLLOUT notification for listening sockets in AF_UNIX, from Eric
Dumazet.
26) Add new QED Qlogic river, from Yuval Mintz, Manish Chopra, and
Sudarsana Kalluru.
27) Don't fetch timestamps on AF_UNIX sockets, from Hannes Frederic
Sowa.
28) Support ipv6 geneve tunnels, from John W Linville.
29) Add flood control support to switchdev layer, from Ido Schimmel.
30) Fix CHECKSUM_PARTIAL handling of potentially fragmented frames, from
Hannes Frederic Sowa.
31) Support persistent maps and progs in bpf, from Daniel Borkmann.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1790 commits)
sh_eth: use DMA barriers
switchdev: respect SKIP_EOPNOTSUPP flag in case there is no recursion
net: sched: kill dead code in sch_choke.c
irda: Delete an unnecessary check before the function call "irlmp_unregister_service"
net: dsa: mv88e6xxx: include DSA ports in VLANs
net: dsa: mv88e6xxx: disable SA learning for DSA and CPU ports
net/core: fix for_each_netdev_feature
vlan: Invoke driver vlan hooks only if device is present
arcnet/com20020: add LEDS_CLASS dependency
bpf, verifier: annotate verbose printer with __printf
dp83640: Only wait for timestamps for packets with timestamping enabled.
ptp: Change ptp_class to a proper bitmask
dp83640: Prune rx timestamp list before reading from it
dp83640: Delay scheduled work.
dp83640: Include hash in timestamp/packet matching
ipv6: fix tunnel error handling
net/mlx5e: Fix LSO vlan insertion
net/mlx5e: Re-eanble client vlan TX acceleration
net/mlx5e: Return error in case mlx5e_set_features() fails
net/mlx5e: Don't allow more than max supported channels
...
Pull crypto update from Herbert Xu:
"API:
- Add support for cipher output IVs in testmgr
- Add missing crypto_ahash_blocksize helper
- Mark authenc and des ciphers as not allowed under FIPS.
Algorithms:
- Add CRC support to 842 compression
- Add keywrap algorithm
- A number of changes to the akcipher interface:
+ Separate functions for setting public/private keys.
+ Use SG lists.
Drivers:
- Add Intel SHA Extension optimised SHA1 and SHA256
- Use dma_map_sg instead of custom functions in crypto drivers
- Add support for STM32 RNG
- Add support for ST RNG
- Add Device Tree support to exynos RNG driver
- Add support for mxs-dcp crypto device on MX6SL
- Add xts(aes) support to caam
- Add ctr(aes) and xts(aes) support to qat
- A large set of fixes from Russell King for the marvell/cesa driver"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (115 commits)
crypto: asymmetric_keys - Fix unaligned access in x509_get_sig_params()
crypto: akcipher - Don't #include crypto/public_key.h as the contents aren't used
hwrng: exynos - Add Device Tree support
hwrng: exynos - Fix missing configuration after suspend to RAM
hwrng: exynos - Add timeout for waiting on init done
dt-bindings: rng: Describe Exynos4 PRNG bindings
crypto: marvell/cesa - use __le32 for hardware descriptors
crypto: marvell/cesa - fix missing cpu_to_le32() in mv_cesa_dma_add_op()
crypto: marvell/cesa - use memcpy_fromio()/memcpy_toio()
crypto: marvell/cesa - use gfp_t for gfp flags
crypto: marvell/cesa - use dma_addr_t for cur_dma
crypto: marvell/cesa - use readl_relaxed()/writel_relaxed()
crypto: caam - fix indentation of close braces
crypto: caam - only export the state we really need to export
crypto: caam - fix non-block aligned hash calculation
crypto: caam - avoid needlessly saving and restoring caam_hash_ctx
crypto: caam - print errno code when hash registration fails
crypto: marvell/cesa - fix memory leak
crypto: marvell/cesa - fix first-fragment handling in mv_cesa_ahash_dma_last_req()
crypto: marvell/cesa - rearrange handling for sw padded hashes
...
Pull ARM updates from Russell King:
"In this ARM merge, we remove more lines than we add. Changes include:
- Enable imprecise aborts early, so that bus errors aren't masked
until later in the boot. This has the side effect that boot
loaders which provoke these aborts can cause the kernel to crash
early in boot, so we install a handler to report this event around
the site where these are enabled.
- Remove the buggy but impossible to enable cmpxchg syscall code.
- Add unwinding annotations to some assembly code.
- Add support for atomic half-word exchange for ARMv6k+.
- Reduce ioremap() alignment for SMP/LPAE cases where we don't need
the large alignment.
- Addition of an "optimal" 3G configuration for systems with 1G of
RAM.
- Increase vmalloc space by 128M.
- Constify some SMP operations structures, which have never been
writable.
- Improve ARMs dma_mmap() support for mapping DMA coherent mappings
into userspace.
- Fix to the NMI backtrace code in the IPI case on ARM where the
failing CPU gets stuck for 10s waiting for its own IPI to be
delivered.
- Removal of legacy PM support from the AMBA bus driver.
- Another fix for the previous fix of vdsomunge"
* 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: (23 commits)
ARM: 8449/1: fix bug in vdsomunge swab32 macro
arm: add missing of_node_put
ARM: 8447/1: catch pending imprecise abort on unmask
ARM: 8446/1: amba: Remove unused callbacks for legacy system PM
ARM: 8443/1: Adding support for atomic half word exchange
ARM: clean up TWD after previous patch
ARM: 8441/2: twd: Don't set CLOCK_EVT_FEAT_C3STOP unconditionally
ARM: 8440/1: remove obsolete documentation
ARM: make highpte an expert option
ARM: 8433/1: add a VMSPLIT_3G_OPT config option
ARM: 8439/1: Fix backtrace generation when IPI is masked
ARM: 8428/1: kgdb: Fix registers on sleeping tasks
ARM: 8427/1: dma-mapping: add support for offset parameter in dma_mmap()
ARM: 8426/1: dma-mapping: add missing range check in dma_mmap()
ARM: remove user cmpxchg syscall
ARM: 8438/1: Add unwinding to __clear_user_std()
ARM: 8436/1: hw_breakpoint: remove unnecessary header
ARM: 8434/2: Revert "7655/1: smp_twd: make twd_local_timer_of_register() no-op for nosmp"
ARM: 8432/1: move VMALLOC_END from 0xff000000 to 0xff800000
ARM: 8430/1: use default ioremap alignment for SMP or LPAE
...
When the kernel compiled with KASAN=y, GCC adds redzones for each
variable on stack. This enlarges function's stack frame and causes:
'warning: the frame size of X bytes is larger than Y bytes'
The worst case I've seen for now is following:
../net/wireless/nl80211.c: In function `nl80211_send_wiphy':
../net/wireless/nl80211.c:1731:1: warning: the frame size of 5448 bytes is larger than 2048 bytes [-Wframe-larger-than=]
That kind of warning becomes useless with KASAN=y. It doesn't
necessarily indicate that there is some problem in the code, thus we
should turn it off.
(The KASAN=y stack size in increased from 16k to 32k for this reason)
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Abylay Ospan <aospan@netup.ru>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
Cc: Kozlov Sergey <serjk@netup.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The patch fixes the analysis of the input data which contains an off
by one.
The issue is visible when the SGL contains one byte per SG entry.
The code for checking for zero bytes does not operate on the data byte.
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Conflicts:
drivers/net/usb/asix_common.c
net/ipv4/inet_connection_sock.c
net/switchdev/switchdev.c
In the inet_connection_sock.c case the request socket hashing scheme
is completely different in net-next.
The other two conflicts were overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
lib/built-in.o: In function `__bitrev32':
deftree.c:(.text+0x1e799): undefined reference to `byte_rev_table'
deftree.c:(.text+0x1e7a0): undefined reference to `byte_rev_table'
deftree.c:(.text+0x1e7b4): undefined reference to `byte_rev_table'
deftree.c:(.text+0x1e7c1): undefined reference to `byte_rev_table'
Anything which uses bitrevX() has to select BITREVERSE, to grab
lib/bitrev.o.
Reported-by: Jim Davis <jim.epost@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This header contains a few helpers currenly only used by the mpi
implementation, and not default implementation of architecture code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
This patch adds CRC generation and validation support for nx-842.
Add CRC flag so that nx842 coprocessor includes CRC during compression
and validates during decompression.
Also changes in 842 SW compression to append CRC value at the end
of template and checks during decompression.
Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add mpi_read_raw_from_sgl and mpi_write_to_sgl helpers.
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add a prandom_init_once() facility that works on the rnd_state, so that
users that are keeping their own state independent from prandom_u32() can
initialize their taus113 per cpu states.
The motivation here is similar to net_get_random_once(): initialize the
state as late as possible in the hope that enough entropy has been
collected for the seeding. prandom_init_once() makes use of the recently
introduced prandom_seed_full_state() helper and is generic enough so that
it could also be used on fast-paths due to the DO_ONCE().
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Factor out the full reseed handling code that populates the state
through get_random_bytes() and runs prandom_warmup(). The resulting
prandom_seed_full_state() will be used later on in more than the
current __prandom_reseed() user. Fix also two minor whitespace
issues along the way.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make the get_random_once() helper generic enough, so that functions
in general would only be called once, where one user of this is then
net_get_random_once().
The only implementation specific call is to get_random_bytes(), all
the rest of this *_once() facility would be duplicated among different
subsystems otherwise. The new DO_ONCE() helper will be used by prandom()
later on, but might also be useful for other scenarios/subsystems as
well where a one-time initialization in often-called, possibly fast
path code could occur.
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's no good reason why users outside of networking should not
be using this facility, f.e. for initializing their seeds.
Therefore, make it accessible from there as get_random_once().
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's possible that the destination can be shadowed in userspace
(as, for example, the perf buffers are now). So we should take
care not to leak data that could be inspected by userspace.
Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
The section mismatch warning can be easy to miss during the kernel build
process. Allow it to be marked as fatal to be easily caught and prevent
bugs from slipping in.
Setting CONFIG_SECTION_MISMATCH_WARN_ONLY=y causes these warnings to be
non-fatal, since there are a number of section mismatches when using
allmodconfig on some architectures, and we do not want to break these
builds by default.
Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
Change-Id: Ic346706e3297c9f0d790e3552aa94e5cff9897a6
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The iomap[] array has PCIM_IOMAP_MAX (6) elements and not
DEVICE_COUNT_RESOURCE (16). This bug was found using a static checker.
It may be that the "if (!(mask & (1 << i)))" check means we never
actually go past the end of the array in real life.
Fixes: ec04b07584 ('iomap: implement pcim_iounmap_regions()')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull strscpy string copy function implementation from Chris Metcalf.
Chris sent this during the merge window, but I waffled back and forth on
the pull request, which is why it's going in only now.
The new "strscpy()" function is definitely easier to use and more secure
than either strncpy() or strlcpy(), both of which are horrible nasty
interfaces that have serious and irredeemable problems.
strncpy() has a useless return value, and doesn't NUL-terminate an
overlong result. To make matters worse, it pads a short result with
zeroes, which is a performance disaster if you have big buffers.
strlcpy(), by contrast, is a mis-designed "fix" for strlcpy(), lacking
the insane NUL padding, but having a differently broken return value
which returns the original length of the source string. Which means
that it will read characters past the count from the source buffer, and
you have to trust the source to be properly terminated. It also makes
error handling fragile, since the test for overflow is unnecessarily
subtle.
strscpy() avoids both these problems, guaranteeing the NUL termination
(but not excessive padding) if the destination size wasn't zero, and
making the overflow condition very obvious by returning -E2BIG. It also
doesn't read past the size of the source, and can thus be used for
untrusted source data too.
So why did I waffle about this for so long?
Every time we introduce a new-and-improved interface, people start doing
these interminable series of trivial conversion patches.
And every time that happens, somebody does some silly mistake, and the
conversion patch to the improved interface actually makes things worse.
Because the patch is mindnumbing and trivial, nobody has the attention
span to look at it carefully, and it's usually done over large swatches
of source code which means that not every conversion gets tested.
So I'm pulling the strscpy() support because it *is* a better interface.
But I will refuse to pull mindless conversion patches. Use this in
places where it makes sense, but don't do trivial patches to fix things
that aren't actually known to be broken.
* 'strscpy' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
tile: use global strscpy() rather than private copy
string: provide strscpy()
Make asm/word-at-a-time.h available on all architectures
Move EXPORT_SYMBOL() macros in kobject.c from the end of the file
next to the function definitions to which they belong.
Signed-off-by: Gabriel Somlo <somlo@cmu.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Its a bit odd that debugfs_create_bool() takes 'u32 *' as an argument,
when all it needs is a boolean pointer.
It would be better to update this API to make it accept 'bool *'
instead, as that will make it more consistent and often more convenient.
Over that bool takes just a byte.
That required updates to all user sites as well, in the same commit
updating the API. regmap core was also using
debugfs_{read|write}_file_bool(), directly and variable types were
updated for that to be bool as well.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently on ARM when <SysRq-L> is triggered from an interrupt handler
(e.g. a SysRq issued using UART or kbd) the main CPU will wedge for ten
seconds with interrupts masked before issuing a backtrace for every CPU
except itself.
The new backtrace code introduced by commit 96f0e00378 ("ARM: add
basic support for on-demand backtrace of other CPUs") does not work
correctly when run from an interrupt handler because IPI_CPU_BACKTRACE
is used to generate the backtrace on all CPUs but cannot preempt the
current calling context.
This can be fixed by detecting that the calling context cannot be
preempted and issuing the backtrace directly in this case. Issuing
directly leaves us without any pt_regs to pass to nmi_cpu_backtrace()
so we also modify the generic code to call dump_stack() when its
argument is NULL.
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Pull networking fixes from David Miller:
1) When we run a tap on netlink sockets, we have to copy mmap'd SKBs
instead of cloning them. From Daniel Borkmann.
2) When converting classical BPF into eBPF, fix the setting of the
source reg to BPF_REG_X. From Tycho Andersen.
3) Fix igmpv3/mldv2 report parsing in the bridge multicast code, from
Linus Lussing.
4) Fix dst refcounting for ipv6 tunnels, from Martin KaFai Lau.
5) Set NLM_F_REPLACE flag properly when replacing ipv6 routes, from
Roopa Prabhu.
6) Add some new cxgb4 PCI device IDs, from Hariprasad Shenai.
7) Fix headroom tests and SKB leaks in ipv6 fragmentation code, from
Florian Westphal.
8) Check DMA mapping errors in bna driver, from Ivan Vecera.
9) Several 8139cp bug fixes (dev_kfree_skb_any in interrupt context,
misclearing of interrupt status in TX timeout handler, etc.) from
David Woodhouse.
10) In tipc, reset SKB header pointer after skb_linearize(), from Erik
Hugne.
11) Fix autobind races et al. in netlink code, from Herbert Xu with
help from Tejun Heo and others.
12) Missing SET_NETDEV_DEV in sunvnet driver, from Sowmini Varadhan.
13) Fix various races in timewait timer and reqsk_queue_hadh_req, from
Eric Dumazet.
14) Fix array overruns in mac80211, from Johannes Berg and Dan
Carpenter.
15) Fix data race in rhashtable_rehash_one(), from Dmitriy Vyukov.
16) Fix race between poll_one_napi and napi_disable, from Neil Horman.
17) Fix byte order in geneve tunnel port config, from John W Linville.
18) Fix handling of ARP replies over lightweight tunnels, from Jiri
Benc.
19) We can loop when fib rule dumps cross multiple SKBs, fix from Wilson
Kok and Roopa Prabhu.
20) Several reference count handling bug fixes in the PHY/MDIO layer
from Russel King.
21) Fix lockdep splat in ppp_dev_uninit(), from Guillaume Nault.
22) Fix crash in icmp_route_lookup(), from David Ahern.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
net: Fix panic in icmp_route_lookup
net: update docbook comment for __mdiobus_register()
ppp: fix lockdep splat in ppp_dev_uninit()
net: via/Kconfig: GENERIC_PCI_IOMAP required if PCI not selected
phy: marvell: add link partner advertised modes
net: fix net_device refcounting
phy: add phy_device_remove()
phy: fixed-phy: properly validate phy in fixed_phy_update_state()
net: fix phy refcounting in a bunch of drivers
of_mdio: fix MDIO phy device refcounting
phy: add proper phy struct device refcounting
phy: fix mdiobus module safety
net: dsa: fix of_mdio_find_bus() device refcount leak
phy: fix of_mdio_find_bus() device refcount leak
ip6_tunnel: Reduce log level in ip6_tnl_err() to debug
ip6_gre: Reduce log level in ip6gre_err() to debug
fib_rules: fix fib rule dumps across multiple skbs
bnx2x: byte swap rss_key to comply to Toeplitz specs
net: revert "net_sched: move tp->root allocation into fw_init()"
lwtunnel: remove source and destination UDP port config option
...
rhashtable_rehash_one() uses complex logic to update entry->next field,
after INIT_RHT_NULLS_HEAD and NULLS_MARKER expansion:
entry->next = 1 | ((base + off) << 1)
This can be compiled along the lines of:
entry->next = base + off
entry->next <<= 1
entry->next |= 1
Which will break concurrent readers.
NULLS value recomputation is not needed here, so just remove
the complex logic.
The data race was found with KernelThreadSanitizer (KTSAN).
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The check for invoking iommu->lazy_flush() from iommu_tbl_range_alloc()
has to be refactored so that we only call ->lazy_flush() if it is
non-null.
I had a sparc kernel that was crashing when I was trying to process some
very large perf.data files- the crash happens when the scsi driver calls
into dma_4v_map_sg and thus the iommu_tbl_range_alloc().
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some string_get_size() calls (e.g.:
string_get_size(1, 512, STRING_UNITS_10, ..., ...)
string_get_size(15, 64, STRING_UNITS_10, ..., ...)
) result in an infinite loop. The problem is that if size is equal to
divisor[units]/blk_size and is smaller than divisor[units] we'll end
up with size == 0 when we start doing sf_cap calculations:
For string_get_size(1, 512, STRING_UNITS_10, ..., ...) case:
...
remainder = do_div(size, divisor[units]); -> size is 0, remainder is 1
remainder *= blk_size; -> remainder is 512
...
size *= blk_size; -> size is still 0
size += remainder / divisor[units]; -> size is still 0
The caller causing the issue is sd_read_capacity(), the problem was
noticed on Hyper-V, such weird size was reported by host when scanning
collides with device removal. This is probably a separate issue worth
fixing, this patch is intended to prevent the library routine from
infinite looping.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: James Bottomley <JBottomley@Odin.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove bi_reverse() and use generic bitrev32() instead - it should have
better performance on some platforms.
Signed-off-by: yalin wang <yalin.wang2010@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Compare pointer-typed values to NULL rather than 0.
The semantic patch that makes this change is available
in scripts/coccinelle/null/badzero.cocci.
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When loading x86 64bit kernel above 4GiB with patched grub2, got kernel
gunzip error.
| early console in decompress_kernel
| decompress_kernel:
| input: [0x807f2143b4-0x807ff61aee]
| output: [0x807cc00000-0x807f3ea29b] 0x027ea29c: output_len
| boot via startup_64
| KASLR using RDTSC...
| new output: [0x46fe000000-0x470138cfff] 0x0338d000: output_run_size
| decompress: [0x46fe000000-0x47007ea29b] <=== [0x807f2143b4-0x807ff61aee]
|
| Decompressing Linux... gz...
|
| uncompression error
|
| -- System halted
the new buffer is at 0x46fe000000ULL, decompressor_gzip is using
0xffffffb901ffffff as out_len. gunzip in lib/zlib_inflate/inflate.c cap
that len to 0x01ffffff and decompress fails later.
We could hit this problem with crashkernel booting that uses kexec loading
kernel above 4GiB.
We have decompress_* support:
1. inbuf[]/outbuf[] for kernel preboot.
2. inbuf[]/flush() for initramfs
3. fill()/flush() for initrd.
This bug only affect kernel preboot path that use outbuf[].
Add __decompress and take real out_buf_len for gunzip instead of guessing
wrong buf size.
Fixes: 1431574a1c (lib/decompressors: fix "no limit" output buffer length)
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Alexandre Courbot <acourbot@nvidia.com>
Cc: Jon Medhurst <tixy@linaro.org>
Cc: Stephen Warren <swarren@wwwdotorg.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In kmalloc_oob_krealloc_less, I think it is better to test
the size2 boundary.
If we do not call krealloc, the access of position size1 will still cause
out-of-bounds and access of position size2 does not. After call krealloc,
the access of position size2 cause out-of-bounds. So using size2 is more
correct.
Signed-off-by: Wang Long <long.wanglong@huawei.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To further clarify the purpose of the "esc" argument, rename it to "only"
to reflect that it is a limit, not a list of additional characters to
escape.
Signed-off-by: Kees Cook <keescook@chromium.org>
Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The esc argument is used to reduce which characters will be escaped. For
example, using " " with ESCAPE_SPACE will not produce any escaped spaces.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In __bitmap_parselist we can accept whitespaces on head or tail during
every parsing procedure. If input has valid ranges, there is no reason to
reject the user.
For example, bitmap_parselist(" 1-3, 5, ", &mask, nmaskbits). After
separating the string, we get " 1-3", " 5", and " ". It's possible and
reasonable to accept such string as long as the parsing result is correct.
Signed-off-by: Pan Xinhui <xinhuix.pan@intel.com>
Cc: Yury Norov <yury.norov@gmail.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If string end with '-', for exapmle, bitmap_parselist("1,0-",&mask,
nmaskbits), It is not in a valid pattern, so add a check after loop.
Return -EINVAL on such condition.
Signed-off-by: Pan Xinhui <xinhuix.pan@intel.com>
Cc: Yury Norov <yury.norov@gmail.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We can avoid in-loop incrementation of ndigits. Save current totaldigits
to ndigits before loop, and check ndigits against totaldigits after the
loop.
Signed-off-by: Pan Xinhui <xinhuix.pan@intel.com>
Cc: Yury Norov <yury.norov@gmail.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
strtol(3) et al accept "-0", so should we.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The strscpy() API is intended to be used instead of strlcpy(),
and instead of most uses of strncpy().
- Unlike strlcpy(), it doesn't read from memory beyond (src + size).
- Unlike strlcpy() or strncpy(), the API provides an easy way to check
for destination buffer overflow: an -E2BIG error return value.
- The provided implementation is robust in the face of the source
buffer being asynchronously changed during the copy, unlike the
current implementation of strlcpy().
- Unlike strncpy(), the destination buffer will be NUL-terminated
if the string in the source buffer is too long.
- Also unlike strncpy(), the destination buffer will not be updated
beyond the NUL termination, avoiding strncpy's behavior of zeroing
the entire tail end of the destination buffer. (A memset() after
the strscpy() can be used if this behavior is desired.)
- The implementation should be reasonably performant on all
platforms since it uses the asm/word-at-a-time.h API rather than
simple byte copy. Kernel-to-kernel string copy is not considered
to be performance critical in any case.
Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
Merge second patch-bomb from Andrew Morton:
"Almost all of the rest of MM. There was an unusually large amount of
MM material this time"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (141 commits)
zpool: remove no-op module init/exit
mm: zbud: constify the zbud_ops
mm: zpool: constify the zpool_ops
mm: swap: zswap: maybe_preload & refactoring
zram: unify error reporting
zsmalloc: remove null check from destroy_handle_cache()
zsmalloc: do not take class lock in zs_shrinker_count()
zsmalloc: use class->pages_per_zspage
zsmalloc: consider ZS_ALMOST_FULL as migrate source
zsmalloc: partial page ordering within a fullness_list
zsmalloc: use shrinker to trigger auto-compaction
zsmalloc: account the number of compacted pages
zsmalloc/zram: introduce zs_pool_stats api
zsmalloc: cosmetic compaction code adjustments
zsmalloc: introduce zs_can_compact() function
zsmalloc: always keep per-class stats
zsmalloc: drop unused variable `nr_to_migrate'
mm/memblock.c: fix comment in __next_mem_range()
mm/page_alloc.c: fix type information of memoryless node
memory-hotplug: fix comments in zone_spanned_pages_in_node() and zone_spanned_pages_in_node()
...
CMA reserved memory is not part of total reserved memory. Currently
when we print the total reserve memory it considers cma as part of
reserve memory and do minus of totalcma_pages from reserved, which is
wrong. In cases where total reserved is less than cma reserved we will
get negative values & while printing we print as unsigned and we will
get a very large value.
Below is the show mem output on X86 ubuntu based system where CMA
reserved is 100MB (25600 pages) & total reserved is ~40MB(10316 pages).
And reserve memory shows a large value because of this bug.
Before:
[ 127.066430] 898908 pages RAM
[ 127.066432] 671682 pages HighMem/MovableOnly
[ 127.066434] 4294952012 pages reserved
[ 127.066436] 25600 pages cma reserved
After:
[ 44.663129] 898908 pages RAM
[ 44.663130] 671682 pages HighMem/MovableOnly
[ 44.663130] 10316 pages reserved
[ 44.663131] 25600 pages cma reserved
Signed-off-by: Vishnu Pratap Singh <vishnu.ps@samsung.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Danesh Petigara <dpetigara@broadcom.com>
Cc: Laura Abbott <lauraa@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1/ Introduce ZONE_DEVICE and devm_memremap_pages() as a generic
mechanism for adding device-driver-discovered memory regions to the
kernel's direct map. This facility is used by the pmem driver to
enable pfn_to_page() operations on the page frames returned by DAX
('direct_access' in 'struct block_device_operations'). For now, the
'memmap' allocation for these "device" pages comes from "System
RAM". Support for allocating the memmap from device memory will
arrive in a later kernel.
2/ Introduce memremap() to replace usages of ioremap_cache() and
ioremap_wt(). memremap() drops the __iomem annotation for these
mappings to memory that do not have i/o side effects. The
replacement of ioremap_cache() with memremap() is limited to the
pmem driver to ease merging the api change in v4.3. Completion of
the conversion is targeted for v4.4.
3/ Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem
driver, update the VFS DAX implementation and PMEM api to provide
persistence guarantees for kernel operations on a DAX mapping.
4/ Convert the ACPI NFIT 'BLK' driver to map the block apertures as
cacheable to improve performance.
5/ Miscellaneous updates and fixes to libnvdimm including support
for issuing "address range scrub" commands, clarifying the optimal
'sector size' of pmem devices, a clarification of the usage of the
ACPI '_STA' (status) property for DIMM devices, and other minor
fixes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJV6Nx7AAoJEB7SkWpmfYgCWyYQAI5ju6Gvw27RNFtPovHcZUf5
JGnxXejI6/AqeTQ+IulgprxtEUCrXOHjCDA5dkjr1qvsoqK1qxug+vJHOZLgeW0R
OwDtmdW4Qrgeqm+CPoxETkorJ8wDOc8mol81kTiMgeV3UqbYeeHIiTAmwe7VzZ0C
nNdCRDm5g8dHCjTKcvK3rvozgyoNoWeBiHkPe76EbnxDICxCB5dak7XsVKNMIVFQ
NuYlnw6IYN7+rMHgpgpRux38NtIW8VlYPWTmHExejc2mlioWMNBG/bmtwLyJ6M3e
zliz4/cnonTMUaizZaVozyinTa65m7wcnpjK+vlyGV2deDZPJpDRvSOtB0lH30bR
1gy+qrKzuGKpaN6thOISxFLLjmEeYwzYd7SvC9n118r32qShz+opN9XX0WmWSFlA
sajE1ehm4M7s5pkMoa/dRnAyR8RUPu4RNINdQ/Z9jFfAOx+Q26rLdQXwf9+uqbEb
bIeSQwOteK5vYYCstvpAcHSMlJAglzIX5UfZBvtEIJN7rlb0VhmGWfxAnTu+ktG1
o9cqAt+J4146xHaFwj5duTsyKhWb8BL9+xqbKPNpXEp+PbLsrnE/+WkDLFD67jxz
dgIoK60mGnVXp+16I2uMqYYDgAyO5zUdmM4OygOMnZNa1mxesjbDJC6Wat1Wsndn
slsw6DkrWT60CRE42nbK
=o57/
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
"This update has successfully completed a 0day-kbuild run and has
appeared in a linux-next release. The changes outside of the typical
drivers/nvdimm/ and drivers/acpi/nfit.[ch] paths are related to the
removal of IORESOURCE_CACHEABLE, the introduction of memremap(), and
the introduction of ZONE_DEVICE + devm_memremap_pages().
Summary:
- Introduce ZONE_DEVICE and devm_memremap_pages() as a generic
mechanism for adding device-driver-discovered memory regions to the
kernel's direct map.
This facility is used by the pmem driver to enable pfn_to_page()
operations on the page frames returned by DAX ('direct_access' in
'struct block_device_operations').
For now, the 'memmap' allocation for these "device" pages comes
from "System RAM". Support for allocating the memmap from device
memory will arrive in a later kernel.
- Introduce memremap() to replace usages of ioremap_cache() and
ioremap_wt(). memremap() drops the __iomem annotation for these
mappings to memory that do not have i/o side effects. The
replacement of ioremap_cache() with memremap() is limited to the
pmem driver to ease merging the api change in v4.3.
Completion of the conversion is targeted for v4.4.
- Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem
driver, update the VFS DAX implementation and PMEM api to provide
persistence guarantees for kernel operations on a DAX mapping.
- Convert the ACPI NFIT 'BLK' driver to map the block apertures as
cacheable to improve performance.
- Miscellaneous updates and fixes to libnvdimm including support for
issuing "address range scrub" commands, clarifying the optimal
'sector size' of pmem devices, a clarification of the usage of the
ACPI '_STA' (status) property for DIMM devices, and other minor
fixes"
* tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (34 commits)
libnvdimm, pmem: direct map legacy pmem by default
libnvdimm, pmem: 'struct page' for pmem
libnvdimm, pfn: 'struct page' provider infrastructure
x86, pmem: clarify that ARCH_HAS_PMEM_API implies PMEM mapped WB
add devm_memremap_pages
mm: ZONE_DEVICE for "device memory"
mm: move __phys_to_pfn and __pfn_to_phys to asm/generic/memory_model.h
dax: drop size parameter to ->direct_access()
nd_blk: change aperture mapping from WC to WB
nvdimm: change to use generic kvfree()
pmem, dax: have direct_access use __pmem annotation
dax: update I/O path to do proper PMEM flushing
pmem: add copy_from_iter_pmem() and clear_pmem()
pmem, x86: clean up conditional pmem includes
pmem: remove layer when calling arch_has_wmb_pmem()
pmem, x86: move x86 PMEM API to new pmem.h header
libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option
pmem: switch to devm_ allocations
devres: add devm_memremap
libnvdimm, btt: write and validate parent_uuid
...
Pull security subsystem updates from James Morris:
"Highlights:
- PKCS#7 support added to support signed kexec, also utilized for
module signing. See comments in 3f1e1bea.
** NOTE: this requires linking against the OpenSSL library, which
must be installed, e.g. the openssl-devel on Fedora **
- Smack
- add IPv6 host labeling; ignore labels on kernel threads
- support smack labeling mounts which use binary mount data
- SELinux:
- add ioctl whitelisting (see
http://kernsec.org/files/lss2015/vanderstoep.pdf)
- fix mprotect PROT_EXEC regression caused by mm change
- Seccomp:
- add ptrace options for suspend/resume"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (57 commits)
PKCS#7: Add OIDs for sha224, sha284 and sha512 hash algos and use them
Documentation/Changes: Now need OpenSSL devel packages for module signing
scripts: add extract-cert and sign-file to .gitignore
modsign: Handle signing key in source tree
modsign: Use if_changed rule for extracting cert from module signing key
Move certificate handling to its own directory
sign-file: Fix warning about BIO_reset() return value
PKCS#7: Add MODULE_LICENSE() to test module
Smack - Fix build error with bringup unconfigured
sign-file: Document dependency on OpenSSL devel libraries
PKCS#7: Appropriately restrict authenticated attributes and content type
KEYS: Add a name for PKEY_ID_PKCS7
PKCS#7: Improve and export the X.509 ASN.1 time object decoder
modsign: Use extract-cert to process CONFIG_SYSTEM_TRUSTED_KEYS
extract-cert: Cope with multiple X.509 certificates in a single file
sign-file: Generate CMS message as signature instead of PKCS#7
PKCS#7: Support CMS messages also [RFC5652]
X.509: Change recorded SKID & AKID to not include Subject or Issuer
PKCS#7: Check content type and versions
MAINTAINERS: The keyrings mailing list has moved
...
Pull NMI backtrace update from Russell King:
"These changes convert the x86 NMI handling to be a library
implementation which other architectures can make use of. Thomas
Gleixner has reviewed and tested these changes, and wishes me to send
these rather than taking them through the tip tree.
The final patch in the set adds an initial implementation using this
infrastructure to ARM, even though it doesn't send the IPI at "NMI"
level. Patches are in progress to add the ARM equivalent of NMI, but
we still need the IRQ-level fallback for systems where the "NMI" isn't
available due to secure firmware denying access to it"
* 'nmi' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
ARM: add basic support for on-demand backtrace of other CPUs
nmi: x86: convert to generic nmi handler
nmi: create generic NMI backtrace implementation
Pull vfs updates from Al Viro:
"In this one:
- d_move fixes (Eric Biederman)
- UFS fixes (me; locking is mostly sane now, a bunch of bugs in error
handling ought to be fixed)
- switch of sb_writers to percpu rwsem (Oleg Nesterov)
- superblock scalability (Josef Bacik and Dave Chinner)
- swapon(2) race fix (Hugh Dickins)"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (65 commits)
vfs: Test for and handle paths that are unreachable from their mnt_root
dcache: Reduce the scope of i_lock in d_splice_alias
dcache: Handle escaped paths in prepend_path
mm: fix potential data race in SyS_swapon
inode: don't softlockup when evicting inodes
inode: rename i_wb_list to i_io_list
sync: serialise per-superblock sync operations
inode: convert inode_sb_list_lock to per-sb
inode: add hlist_fake to avoid the inode hash lock in evict
writeback: plug writeback at a high level
change sb_writers to use percpu_rw_semaphore
shift percpu_counter_destroy() into destroy_super_work()
percpu-rwsem: kill CONFIG_PERCPU_RWSEM
percpu-rwsem: introduce percpu_rwsem_release() and percpu_rwsem_acquire()
percpu-rwsem: introduce percpu_down_read_trylock()
document rwsem_release() in sb_wait_write()
fix the broken lockdep logic in __sb_start_write()
introduce __sb_writers_{acquired,release}() helpers
ufs_inode_get{frag,block}(): get rid of 'phys' argument
ufs_getfrag_block(): tidy up a bit
...
- An assortment of little fixes, several for minor races only likely
to be hit during testing
- further cluster-md-raid1 development, not ready for real use yet.
- new RAID6 syndrome code for ARM NEON
- fix a race where a write can return before failure of one device
is properly recorded in metadata, so an immediate crash might result
in that write being lost.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJV6rSXAAoJEDnsnt1WYoG5eJIQAJs62+kB3+p87/VEu4hiBgYv
yyaCBlTDn3xxy3WFLtvSIc+cZOamvGe/u/+9/aTA5zq30VpS0fwZlLUxwyR3vB7H
aXh5y0JL8fViCUp6o+SplOpNDMAv4ntcW5NMv7uWhPLxtQxF/IJu6YLsDRcFaJqL
LFCpvKSPgXOQ88ZXHa54xgFgEy+aAh1lxaWQmeqCLtgVc6YhwIsazG00R/vow8Pb
91u3jFioWjBpovTJiRxQO+NGemfOnKrm2EWkR4jzo8taHOouBWOH0RZjh/67dh4p
QX4GjMINhFvYSr1UMGXfPm+Fjp2PRgx1qKyR/XhPeXNuE2xZf7T4aCnmKA8DVUA4
vyEl/l0lAZClExNA+bgE/wCrMpvtb9E4NnklzIffDqsDY79m9JzLwznYqDQcXP7m
0zPlRmf8KQoSOVV960N2O6siwQMwvTyPecG0raAv9BKjwZ+7/M8HLOplZuuMsbzT
BZ6+FnAIDtc0Id0wwJoARUkghG7Nr4IWi4Q8MtyYLgH9KLnYkomjf/I2B5sEooCF
JFIXeg+XX/xKSFHV4TycYdAFMtMEJMJ/pEnbKJ/W7CyAmrHJv+0/U+/gOkA8Mg76
iqYVWqRJHP9ZyWpmaWaaOeGIgFoJqrjM65qFNRcOnzMd/aAi8W63oyM99Lxi+1pm
i8StqQBNtiwzds/w32SI
=n+/k
-----END PGP SIGNATURE-----
Merge tag 'md/4.3' of git://neil.brown.name/md
Pull md updates from Neil Brown:
- an assortment of little fixes, several for minor races only likely to
be hit during testing
- further cluster-md-raid1 development, not ready for real use yet.
- new RAID6 syndrome code for ARM NEON
- fix a race where a write can return before failure of one device is
properly recorded in metadata, so an immediate crash might result in
that write being lost.
* tag 'md/4.3' of git://neil.brown.name/md: (33 commits)
md/raid5: ensure device failure recorded before write request returns.
md/raid5: use bio_list for the list of bios to return.
md/raid10: ensure device failure recorded before write request returns.
md/raid1: ensure device failure recorded before write request returns.
md-cluster: remove inappropriate try_module_get from join()
md: extend spinlock protection in register_md_cluster_operations
md-cluster: Read the disk bitmap sb and check if it needs recovery
md-cluster: only call complete(&cinfo->completion) when node join cluster
md-cluster: add missed lockres_free
md-cluster: remove the unused sb_lock
md-cluster: init suspend_list and suspend_lock early in join
md-cluster: add the error check if failed to get dlm lock
md-cluster: init completion within lockres_init
md-cluster: fix deadlock issue on message lock
md-cluster: transfer the resync ownership to another node
md-cluster: split recover_slot for future code reuse
md-cluster: use %pU to print UUIDs
md: setup safemode_timer before it's being used
md/raid5: handle possible race as reshape completes.
md: sync sync_completed has correct value as recovery finishes.
...