Commit Graph

872358 Commits

Author SHA1 Message Date
Jens Axboe
17f2fe35d0 io_uring: add support for IORING_OP_ACCEPT
This allows an application to call accept4() in an async fashion. Like
other opcodes, we first try a non-blocking accept, then punt to async
context if we have to.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-29 12:43:06 -06:00
Jens Axboe
de2ea4b64b net: add __sys_accept4_file() helper
This is identical to __sys_accept4(), except it takes a struct file
instead of an fd, and it also allows passing in extra file->f_flags
flags. The latter is done to support masking in O_NONBLOCK without
manipulating the original file flags.

No functional changes in this patch.

Cc: netdev@vger.kernel.org
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-29 12:43:06 -06:00
Jens Axboe
fcb323cc53 io_uring: io_uring: add support for async work inheriting files
This is in preparation for adding opcodes that need to add new files
in a process file table, system calls like open(2) or accept4(2).

If an opcode needs this, it must set IO_WQ_WORK_NEEDS_FILES in the work
item. If work that needs to get punted to async context have this
set, the async worker will assume the original task file table before
executing the work.

Note that opcodes that need access to the current files of an
application cannot be done through IORING_SETUP_SQPOLL.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-29 12:43:06 -06:00
Jens Axboe
561fb04a6a io_uring: replace workqueue usage with io-wq
Drop various work-arounds we have for workqueues:

- We no longer need the async_list for tracking sequential IO.

- We don't have to maintain our own mm tracking/setting.

- We don't need a separate workqueue for buffered writes. This didn't
  even work that well to begin with, as it was suboptimal for multiple
  buffered writers on multiple files.

- We can properly cancel pending interruptible work. This fixes
  deadlocks with particularly socket IO, where we cannot cancel them
  when the io_uring is closed. Hence the ring will wait forever for
  these requests to complete, which may never happen. This is different
  from disk IO where we know requests will complete in a finite amount
  of time.

- Due to being able to cancel work interruptible work that is already
  running, we can implement file table support for work. We need that
  for supporting system calls that add to a process file table.

- It gets us one step closer to adding async support for any system
  call.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-29 12:43:06 -06:00
Jens Axboe
771b53d033 io-wq: small threadpool implementation for io_uring
This adds support for io-wq, a smaller and specialized thread pool
implementation. This is meant to replace workqueues for io_uring. Among
the reasons for this addition are:

- We can assign memory context smarter and more persistently if we
  manage the life time of threads.

- We can drop various work-arounds we have in io_uring, like the
  async_list.

- We can implement hashed work insertion, to manage concurrency of
  buffered writes without needing a) an extra workqueue, or b)
  needlessly making the concurrency of said workqueue very low
  which hurts performance of multiple buffered file writers.

- We can implement cancel through signals, for cancelling
  interruptible work like read/write (or send/recv) to/from sockets.

- We need the above cancel for being able to assign and use file tables
  from a process.

- We can implement a more thorough cancel operation in general.

- We need it to move towards a syslet/threadlet model for even faster
  async execution. For that we need to take ownership of the used
  threads.

This list is just off the top of my head. Performance should be the
same, or better, at least that's what I've seen in my testing. io-wq
supports basic NUMA functionality, setting up a pool per node.

io-wq hooks up to the scheduler schedule in/out just like workqueue
and uses that to drive the need for more/less workers.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-29 12:43:00 -06:00
Pavel Begunkov
95a1b3ff9a io_uring: Fix mm_fault with READ/WRITE_FIXED
Commit fb5ccc9878 ("io_uring: Fix broken links with offloading")
introduced a potential performance regression with unconditionally
taking mm even for READ/WRITE_FIXED operations.

Return the logic handling it back. mm-faulted requests will go through
the generic submission path, so honoring links and drains, but will
fail further on req->has_user check.

Fixes: fb5ccc9878 ("io_uring: Fix broken links with offloading")
Cc: stable@vger.kernel.org # v5.4
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-29 10:24:23 -06:00
Pavel Begunkov
fa45622808 io_uring: remove index from sqe_submit
submit->index is used only for inbound check in submission path (i.e.
head < ctx->sq_entries). However, it always will be true, as
1. it's already validated by io_get_sqring()
2. ctx->sq_entries can't be changedd in between, because of held
ctx->uring_lock and ctx->refs.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-29 10:24:21 -06:00
Dmitrii Dolgov
c826bd7a74 io_uring: add set of tracing events
To trace io_uring activity one can get an information from workqueue and
io trace events, but looks like some parts could be hard to identify via
this approach. Making what happens inside io_uring more transparent is
important to be able to reason about many aspects of it, hence introduce
the set of tracing events.

All such events could be roughly divided into two categories:

* those, that are helping to understand correctness (from both kernel
  and an application point of view). E.g. a ring creation, file
  registration, or waiting for available CQE. Proposed approach is to
  get a pointer to an original structure of interest (ring context, or
  request), and then find relevant events. io_uring_queue_async_work
  also exposes a pointer to work_struct, to be able to track down
  corresponding workqueue events.

* those, that provide performance related information. Mostly it's about
  events that change the flow of requests, e.g. whether an async work
  was queued, or delayed due to some dependencies. Another important
  case is how io_uring optimizations (e.g. registered files) are
  utilized.

Signed-off-by: Dmitrii Dolgov <9erthalion6@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-29 10:24:18 -06:00
Jens Axboe
11365043e5 io_uring: add support for canceling timeout requests
We might have cases where the need for a specific timeout is gone, add
support for canceling an existing timeout operation. This works like the
POLL_REMOVE command, where the application passes in the user_data of
the timeout it wishes to cancel in the sqe->addr field.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-29 10:22:50 -06:00
Jens Axboe
a41525ab2e io_uring: add support for absolute timeouts
This is a pretty trivial addition on top of the relative timeouts
we have now, but it's handy for ensuring tighter timing for those
that are building scheduling primitives on top of io_uring.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-29 10:22:48 -06:00
Jackie Liu
ba5290ccb6 io_uring: replace s->needs_lock with s->in_async
There is no function change, just to clean up the code, use s->in_async
to make the code know where it is.

Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-29 10:22:47 -06:00
Jens Axboe
33a107f0a1 io_uring: allow application controlled CQ ring size
We currently size the CQ ring as twice the SQ ring, to allow some
flexibility in not overflowing the CQ ring. This is done because the
SQE life time is different than that of the IO request itself, the SQE
is consumed as soon as the kernel has seen the entry.

Certain application don't need a huge SQ ring size, since they just
submit IO in batches. But they may have a lot of requests pending, and
hence need a big CQ ring to hold them all. By allowing the application
to control the CQ ring size multiplier, we can cater to those
applications more efficiently.

If an application wants to define its own CQ ring size, it must set
IORING_SETUP_CQSIZE in the setup flags, and fill out
io_uring_params->cq_entries. The value must be a power of two.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-29 10:22:46 -06:00
Jens Axboe
c3a31e6056 io_uring: add support for IORING_REGISTER_FILES_UPDATE
Allows the application to remove/replace/add files to/from a file set.
Passes in a struct:

struct io_uring_files_update {
	__u32 offset;
	__s32 *fds;
};

that holds an array of fds, size of array passed in through the usual
nr_args part of the io_uring_register() system call. The logic is as
follows:

1) If ->fds[i] is -1, the existing file at i + ->offset is removed from
   the set.
2) If ->fds[i] is a valid fd, the existing file at i + ->offset is
   replaced with ->fds[i].

For case #2, is the existing file is currently empty (fd == -1), the
new fd is simply added to the array.

Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-29 10:22:44 -06:00
Jens Axboe
08a451739a io_uring: allow sparse fixed file sets
This is in preparation for allowing updates to fixed file sets without
requiring a full unregister+register.

Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-29 10:22:43 -06:00
Jens Axboe
ba816ad61f io_uring: run dependent links inline if possible
Currently any dependent link is executed from a new workqueue context,
which means that we'll be doing a context switch per link in the chain.
If we are running the completion of the current request from our async
workqueue and find that the next request is a link, then run it directly
from the workqueue context instead of forcing another switch.

This improves the performance of linked SQEs, and reduces the CPU
overhead.

Reviewed-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-29 10:22:41 -06:00
Anton Ivanov
d848074b2f um-ubd: Entrust re-queue to the upper layers
Fixes crashes due to ubd requeue logic conflicting with the block-mq
logic. Crash is reproducible in 5.0 - 5.3.

Fixes: 53766defb8 ("um: Clean-up command processing in UML UBD driver")
Cc: stable@vger.kernel.org # v5.0+
Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-29 10:07:41 -06:00
Anton Eidelman
86cccfbf77 nvme-multipath: remove unused groups_only mode in ana log
groups_only mode in nvme_read_ana_log() is no longer used: remove it.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-29 08:55:00 -06:00
Anton Eidelman
af8fd04247 nvme-multipath: fix possible io hang after ctrl reconnect
The following scenario results in an IO hang:
1) ctrl completes a request with NVME_SC_ANA_TRANSITION.
   NVME_NS_ANA_PENDING bit in ns->flags is set and ana_work is triggered.
2) ana_work: nvme_read_ana_log() tries to get the ANA log page from the ctrl.
   This fails because ctrl disconnects.
   Therefore nvme_update_ns_ana_state() is not called
   and NVME_NS_ANA_PENDING bit in ns->flags is not cleared.
3) ctrl reconnects: nvme_mpath_init(ctrl,...) calls
   nvme_read_ana_log(ctrl, groups_only=true).
   However, nvme_update_ana_state() does not update namespaces
   because nr_nsids = 0 (due to groups_only mode).
4) scan_work calls nvme_validate_ns() finds the ns and re-validates OK.

Result:
The ctrl is now live but NVME_NS_ANA_PENDING bit in ns->flags is still set.
Consequently ctrl will never be considered a viable path by __nvme_find_path().
IO will hang if ctrl is the only or the last path to the namespace.

More generally, while ctrl is reconnecting, its ANA state may change.
And because nvme_mpath_init() requests ANA log in groups_only mode,
these changes are not propagated to the existing ctrl namespaces.
This may result in a mal-function or an IO hang.

Solution:
nvme_mpath_init() will nvme_read_ana_log() with groups_only set to false.
This will not harm the new ctrl case (no namespaces present),
and will make sure the ANA state of namespaces gets updated after reconnect.

Note: Another option would be for nvme_mpath_init() to invoke
nvme_parse_ana_log(..., nvme_set_ns_ana_state) for each existing namespace.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-29 08:55:00 -06:00
Jens Axboe
044c1ab399 io_uring: don't touch ctx in setup after ring fd install
syzkaller reported an issue where it looks like a malicious app can
trigger a use-after-free of reading the ctx ->sq_array and ->rings
value right after having installed the ring fd in the process file
table.

Defer ring fd installation until after we're done reading those
values.

Fixes: 75b28affdd ("io_uring: allocate the two rings together")
Reported-by: syzbot+6f03d895a6cd0d06187f@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-28 09:15:33 -06:00
Pavel Begunkov
7b20238d28 io_uring: Fix leaked shadow_req
io_queue_link_head() owns shadow_req after taking it as an argument.
By not freeing it in case of an error, it can leak the request along
with taken ctx->refs.

Reviewed-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-27 21:29:18 -06:00
Linus Torvalds
d6d5df1db6 Linux 5.4-rc5 2019-10-27 13:19:19 -04:00
Linus Torvalds
153a971ff5 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "Two fixes for the VMWare guest support:

   - Unbreak VMWare platform detection which got wreckaged by converting
     an integer constant to a string constant.

   - Fix the clang build of the VMWAre hypercall by explicitely
     specifying the ouput register for INL instead of using the short
     form"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu/vmware: Fix platform detection VMWARE_PORT macro
  x86/cpu/vmware: Use the full form of INL in VMWARE_HYPERCALL, for clang/llvm
2019-10-27 07:14:40 -04:00
Linus Torvalds
2b776b54bc Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
 "A small set of fixes for time(keeping):

   - Add a missing include to prevent compiler warnings.

   - Make the VDSO implementation of clock_getres() POSIX compliant
     again. A recent change dropped the NULL pointer guard which is
     required as NULL is a valid pointer value for this function.

   - Fix two function documentation typos"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  posix-cpu-timers: Fix two trivial comments
  timers/sched_clock: Include local timekeeping.h for missing declarations
  lib/vdso: Make clock_getres() POSIX compliant again
2019-10-27 07:04:22 -04:00
Linus Torvalds
a8a31fdcca Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
 "A set of perf fixes:

  kernel:

   - Unbreak the tracking of auxiliary buffer allocations which got
     imbalanced causing recource limit failures.

   - Fix the fallout of splitting of ToPA entries which missed to shift
     the base entry PA correctly.

   - Use the correct context to lookup the AUX event when unmapping the
     associated AUX buffer so the event can be stopped and the buffer
     reference dropped.

  tools:

   - Fix buildiid-cache mode setting in copyfile_mode_ns() when copying
     /proc/kcore

   - Fix freeing id arrays in the event list so the correct event is
     closed.

   - Sync sched.h anc kvm.h headers with the kernel sources.

   - Link jvmti against tools/lib/ctype.o to have weak strlcpy().

   - Fix multiple memory and file descriptor leaks, found by coverity in
     perf annotate.

   - Fix leaks in error handling paths in 'perf c2c', 'perf kmem', found
     by a static analysis tool"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/aux: Fix AUX output stopping
  perf/aux: Fix tracking of auxiliary trace buffer allocation
  perf/x86/intel/pt: Fix base for single entry topa
  perf kmem: Fix memory leak in compact_gfp_flags()
  tools headers UAPI: Sync sched.h with the kernel
  tools headers kvm: Sync kvm.h headers with the kernel sources
  tools headers kvm: Sync kvm headers with the kernel sources
  tools headers kvm: Sync kvm headers with the kernel sources
  perf c2c: Fix memory leak in build_cl_output()
  perf tools: Fix mode setting in copyfile_mode_ns()
  perf annotate: Fix multiple memory and file descriptor leaks
  perf tools: Fix resource leak of closedir() on the error paths
  perf evlist: Fix fix for freed id arrays
  perf jvmti: Link against tools/lib/ctype.h to have weak strlcpy()
2019-10-27 06:59:34 -04:00
Linus Torvalds
1e1ac1cb65 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
 "Two fixes for interrupt controller drivers:

   - Skip IRQ_M_EXT entries in the device tree when initializing the
     RISCV PLIC controller to avoid a double init attempt.

   - Use the correct ITS list when issuing the VMOVP synchronization
     command so the operation works only on the ITS instances which are
     associated to a VM"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/sifive-plic: Skip contexts except supervisor in plic_init()
  irqchip/gic-v3-its: Use the exact ITSList for VMOVP
2019-10-27 06:55:55 -04:00
Linus Torvalds
c9a2e4a829 Merge tag '5.4-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
 "Seven cifs/smb3 fixes, including three for stable"

* tag '5.4-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: Fix cifsInodeInfo lock_sem deadlock when reconnect occurs
  CIFS: Fix use after free of file info structures
  CIFS: Fix retry mid list corruption on reconnects
  cifs: Fix missed free operations
  CIFS: avoid using MID 0xFFFF
  cifs: clarify comment about timestamp granularity for old servers
  cifs: Handle -EINPROGRESS only when noblockcnt is set
2019-10-27 06:41:52 -04:00
Linus Torvalds
6995a6a5a5 Merge tag 'riscv/for-v5.4-rc5-b' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Paul Walmsley:
 "Several minor fixes and cleanups for v5.4-rc5:

   - Three build fixes for various SPARSEMEM-related kernel
     configurations

   - Two cleanup patches for the kernel bug and breakpoint trap handler
     code"

* tag 'riscv/for-v5.4-rc5-b' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: cleanup do_trap_break
  riscv: cleanup <asm/bug.h>
  riscv: Fix undefined reference to vmemmap_populate_basepages
  riscv: Fix implicit declaration of 'page_to_section'
  riscv: fix fs/proc/kcore.c compilation with sparsemem enabled
2019-10-27 06:36:57 -04:00
Linus Torvalds
5a1e843c66 Merge tag 'mips_fixes_5.4_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS fixes from Paul Burton:
 "A few MIPS fixes:

   - Fix VDSO time-related function behavior for systems where we need
     to fall back to syscalls, but were instead returning bogus results.

   - A fix to TLB exception handlers for Cavium Octeon systems where
     they would inadvertently clobber the $1/$at register.

   - A build fix for bcm63xx configurations.

   - Switch to using my @kernel.org email address"

* tag 'mips_fixes_5.4_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MIPS: tlbex: Fix build_restore_pagemask KScratch restore
  MIPS: bmips: mark exception vectors as char arrays
  mips: vdso: Fix __arch_get_hw_counter()
  MAINTAINERS: Use @kernel.org address for Paul Burton
2019-10-26 19:43:12 -04:00
Linus Torvalds
2976895459 Merge tag 'tty-5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial driver fix from Greg KH:
 "Here is a single tty/serial driver fix for 5.4-rc5 that resolves a
  reported issue.

  It has been in linux-next for a while with no problems"

* tag 'tty-5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  8250-men-mcb: fix error checking when get_num_ports returns -ENODEV
2019-10-26 16:40:04 -04:00
Linus Torvalds
228bd62434 Merge tag 'staging-5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fix from Greg KH:
 "Here is a single staging driver fix, for the wlan-ng driver, that
  resolves a reported issue.

  It is been in linux-next for a while with no reported issues"

* tag 'staging-5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: wlan-ng: fix exit return when sme->key_idx >= NUM_WEPKEYS
2019-10-26 16:36:47 -04:00
Linus Torvalds
13fa692e3f Merge tag 'driver-core-5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fix from Greg KH:
 "Here is a single sysfs fix for 5.4-rc5.

  It resolves an error if you actually try to use the __BIN_ATTR_WO()
  macro, seems I never tested it properly before :(

  This has been in linux-next for a while with no reported issues"

* tag 'driver-core-5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  sysfs: Fixes __BIN_ATTR_WO() macro
2019-10-26 15:23:08 -04:00
Linus Torvalds
a03885d596 Merge tag 'char-misc-5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull binder fix from Greg KH:
 "This is a single binder fix to resolve a reported issue by Jann. It's
  been in linux-next for a while with no reported issues"

* tag 'char-misc-5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  binder: Don't modify VMA bounds in ->mmap handler
2019-10-26 15:17:54 -04:00
Linus Torvalds
0ecdd78c75 Merge tag 'usb-5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
 "Here are a number of small USB driver fixes for 5.4-rc5.

  More "fun" with some of the misc USB drivers as found by syzbot, and
  there are a number of other small bugfixes in here for reported
  issues.

  All have been in linux-next for a while with no reported issues"

* tag 'usb-5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: cdns3: Error out if USB_DR_MODE_UNKNOWN in cdns3_core_init_role()
  USB: ldusb: fix read info leaks
  USB: serial: ti_usb_3410_5052: clean up serial data access
  USB: serial: ti_usb_3410_5052: fix port-close races
  USB: usblp: fix use-after-free on disconnect
  usb: udc: lpc32xx: fix bad bit shift operation
  usb: cdns3: Fix dequeue implementation.
  USB: legousbtower: fix a signedness bug in tower_probe()
  USB: legousbtower: fix memleak on disconnect
  USB: ldusb: fix memleak on disconnect
2019-10-26 15:14:55 -04:00
Linus Torvalds
992cb107e1 Merge branch 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
 "A few driver fixes for the I2C subsystem"

* 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: stm32f7: remove warning when compiling with W=1
  i2c: stm32f7: fix a race in slave mode with arbitration loss irq
  i2c: stm32f7: fix first byte to send in slave mode
  i2c: mt65xx: fix NULL ptr dereference
  i2c: aspeed: fix master pending state handling
2019-10-26 15:06:58 -04:00
Linus Torvalds
acf913b7fb Merge tag 'for-linus-2019-10-26' of git://git.kernel.dk/linux-block
Pull block and io_uring fixes from Jens Axboe:
 "A bit bigger than usual at this point in time, mostly due to some good
  bug hunting work by Pavel that resulted in three io_uring fixes from
  him and two from me. Anyway, this pull request contains:

   - Revert of the submit-and-wait optimization for io_uring, it can't
     always be done safely. It depends on commands always making
     progress on their own, which isn't necessarily the case outside of
     strict file IO. (me)

   - Series of two patches from me and three from Pavel, fixing issues
     with shared data and sequencing for io_uring.

   - Lastly, two timeout sequence fixes for io_uring (zhangyi)

   - Two nbd patches fixing races (Josef)

   - libahci regulator_get_optional() fix (Mark)"

* tag 'for-linus-2019-10-26' of git://git.kernel.dk/linux-block:
  nbd: verify socket is supported during setup
  ata: libahci_platform: Fix regulator_get_optional() misuse
  nbd: handle racing with error'ed out commands
  nbd: protect cmd->status with cmd->lock
  io_uring: fix bad inflight accounting for SETUP_IOPOLL|SETUP_SQTHREAD
  io_uring: used cached copies of sq->dropped and cq->overflow
  io_uring: Fix race for sqes with userspace
  io_uring: Fix broken links with offloading
  io_uring: Fix corrupted user_data
  io_uring: correct timeout req sequence when inserting a new entry
  io_uring : correct timeout req sequence when waiting timeout
  io_uring: revert "io_uring: optimize submit_and_wait API"
2019-10-26 14:59:51 -04:00
Linus Torvalds
f877bee5ea Merge tag 's390-5.4-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:

 - Add R_390_GLOB_DAT relocation type support. This fixes boot problem
   on linux-next.

 - Fix memory leak in zcrypt

* tag 's390-5.4-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/kaslr: add support for R_390_GLOB_DAT relocation type
  s390/zcrypt: fix memleak at release
2019-10-26 06:35:46 -04:00
Linus Torvalds
4fac2407f8 Merge tag 'for-linus-5.4-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixlet from Juergen Gross:
 "Just one patch for issuing a deprecation warning for 32-bit Xen pv
  guests"

* tag 'for-linus-5.4-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: issue deprecation warning for 32-bit pv guest
2019-10-26 06:32:12 -04:00
Linus Torvalds
964f9cfaae Merge tag 'dma-mapping-5.4-2' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fix from Christoph Hellwig:
 "Fix a regression in the intel-iommu get_required_mask conversion
  (Arvind Sankar)"

* tag 'dma-mapping-5.4-2' of git://git.infradead.org/users/hch/dma-mapping:
  iommu/vt-d: Return the correct dma mask when we are bypassing the IOMMU
2019-10-26 06:29:04 -04:00
Linus Torvalds
485fc4b69c Merge tag 'dax-fix-5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull dax fix from Dan Williams:
 "Fix a performance regression that followed from a fix to the
  conversion of the fsdax implementation to the xarray. v5.3 users
  report that they stop seeing huge page mappings on an application +
  filesystem layout that was seeing huge pages previously on v5.2"

* tag 'dax-fix-5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  fs/dax: Fix pmd vs pte conflict detection
2019-10-26 06:26:04 -04:00
Linus Torvalds
1c4e395cf7 Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
 "Nine changes, eight to drivers (qla2xxx, hpsa, lpfc, alua, ch,
  53c710[x2], target) and one core change that tries to close a race
  between sysfs delete and module removal"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: lpfc: remove left-over BUILD_NVME defines
  scsi: core: try to get module before removing device
  scsi: hpsa: add missing hunks in reset-patch
  scsi: target: core: Do not overwrite CDB byte 1
  scsi: ch: Make it possible to open a ch device multiple times again
  scsi: fix kconfig dependency warning related to 53C700_LE_ON_BE
  scsi: sni_53c710: fix compilation error
  scsi: scsi_dh_alua: handle RTPG sense code correctly during state transitions
  scsi: qla2xxx: fix a potential NULL pointer dereference
2019-10-25 20:11:33 -04:00
Christoph Hellwig
e8f44c50df riscv: cleanup do_trap_break
If we always compile the get_break_insn_length inline function we can
remove the ifdefs and let dead code elimination take care of the warn
branch that is now unreadable because the report_bug stub always
returns BUG_TRAP_TYPE_BUG.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2019-10-25 16:32:38 -07:00
Linus Torvalds
b4b61b224d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fix from Dmitry Torokhov:
 "A fix for st1232 driver to properly report coordinates for 2nd and
  subsequent fingers when more than one is on the surface"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: st1232 - fix reporting multitouch coordinates
2019-10-25 17:31:53 -04:00
Mike Christie
cf1b2326b7 nbd: verify socket is supported during setup
nbd requires socket families to support the shutdown method so the nbd
recv workqueue can be woken up from its sock_recvmsg call. If the socket
does not support the callout we will leave recv works running or get hangs
later when the device or module is removed.

This adds a check during socket connection/reconnection to make sure the
socket being passed in supports the needed callout.

Reported-by: syzbot+24c12fa8d218ed26011a@syzkaller.appspotmail.com
Fixes: e9e006f5fc ("nbd: fix max number of supported devs")
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-25 14:37:21 -06:00
Mark Brown
962399bb7f ata: libahci_platform: Fix regulator_get_optional() misuse
This driver is using regulator_get_optional() to handle all the supplies
that it handles, and only ever enables and disables all supplies en masse
without ever doing any other configuration of the device to handle missing
power. These are clear signs that the API is being misused - it should only
be used for supplies that may be physically absent from the system and in
these cases the hardware usually needs different configuration if the
supply is missing. Instead use normal regualtor_get(), if the supply is
not described in DT then the framework will substitute a dummy regulator in
so no special handling is needed by the consumer driver.

In the case of the PHY regulator the handling in the driver is a hack to
deal with integrated PHYs; the supplies are only optional in the sense
that that there's some confusion in the code about where they're bound to.
From a code point of view they function exactly as normal supplies so can
be treated as such. It'd probably be better to model this by instantiating
a PHY object for integrated PHYs.

Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-25 14:22:20 -06:00
Josef Bacik
7ce23e8e0a nbd: handle racing with error'ed out commands
We hit the following warning in production

print_req_error: I/O error, dev nbd0, sector 7213934408 flags 80700
------------[ cut here ]------------
refcount_t: underflow; use-after-free.
WARNING: CPU: 25 PID: 32407 at lib/refcount.c:190 refcount_sub_and_test_checked+0x53/0x60
Workqueue: knbd-recv recv_work [nbd]
RIP: 0010:refcount_sub_and_test_checked+0x53/0x60
Call Trace:
 blk_mq_free_request+0xb7/0xf0
 blk_mq_complete_request+0x62/0xf0
 recv_work+0x29/0xa1 [nbd]
 process_one_work+0x1f5/0x3f0
 worker_thread+0x2d/0x3d0
 ? rescuer_thread+0x340/0x340
 kthread+0x111/0x130
 ? kthread_create_on_node+0x60/0x60
 ret_from_fork+0x1f/0x30
---[ end trace b079c3c67f98bb7c ]---

This was preceded by us timing out everything and shutting down the
sockets for the device.  The problem is we had a request in the queue at
the same time, so we completed the request twice.  This can actually
happen in a lot of cases, we fail to get a ref on our config, we only
have one connection and just error out the command, etc.

Fix this by checking cmd->status in nbd_read_stat.  We only change this
under the cmd->lock, so we are safe to check this here and see if we've
already error'ed this command out, which would indicate that we've
completed it as well.

Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-25 14:20:03 -06:00
Josef Bacik
de6346ecbc nbd: protect cmd->status with cmd->lock
We already do this for the most part, except in timeout and clear_req.
For the timeout case we take the lock after we grab a ref on the config,
but that isn't really necessary because we're safe to touch the cmd at
this point, so just move the order around.

For the clear_req cause this is initiated by the user, so again is safe.

Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-10-25 14:20:01 -06:00
Linus Torvalds
9e2dd2ca85 Merge tag 'modules-for-v5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux
Pull modules fixes from Jessica Yu:

 - Revert __ksymtab_$namespace.$symbol naming scheme back to
   __ksymtab_$symbol, as it was causing issues with depmod.

   Instead, have modpost extract a symbol's namespace from __kstrtabns
   and __ksymtab_strings.

 - Fix `make nsdeps` for out of tree kernel builds (make O=...) caused
   by unescaped '/'.

   Use a different sed delimiter to avoid this problem.

* tag 'modules-for-v5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
  scripts/nsdeps: use alternative sed delimiter
  symbol namespaces: revert to previous __ksymtab name scheme
  modpost: make updating the symbol namespace explicit
  modpost: delegate updating namespaces to separate function
2019-10-25 16:11:55 -04:00
Linus Torvalds
63cbb3b364 Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Olof Johansson:
 "A slightly larger set of fixes have accrued in the last two weeks.
  Mostly a collection of the usual smaller fixes:

   - Marvell Armada: USB phy setup issues on Turris Mox

   - Broadcom: GPIO/pinmux DT mapping corrections for Stingray, MMC bus
     width fix for RPi Zero W, GPIO LED removal for RPI CM3. Also some
     maintainer updates.

   - OMAP: Fixlets for display config, interrupt settings for wifi, some
     clock/PM pieces. Also IOMMU regression fix and a ti-sysc
     no-watchdog regression fix.

   - i.MX: A few fixes around PM/settings, some devicetree fixlets and
     catching up with config option changes in DRM

   - Rockchip: RockRro64 misc DT fixups, Hugsun X99 USB-C, Kevin display
     panel settings

  ... and some smaller fixes for Davinci (backlight, McBSP DMA),
  Allwinner (phy regulators, PMU removal on A64, etc)"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (42 commits)
  ARM: dts: stm32: relax qspi pins slew-rate for stm32mp157
  MAINTAINERS: Update the Spreadtrum SoC maintainer
  MAINTAINERS: Remove Gregory and Brian for ARCH_BRCMSTB
  ARM: dts: bcm2837-rpi-cm3: Avoid leds-gpio probing issue
  bus: ti-sysc: Fix watchdog quirk handling
  ARM: OMAP2+: Add pdata for OMAP3 ISP IOMMU
  ARM: OMAP2+: Plug in device_enable/idle ops for IOMMUs
  ARM: davinci_all_defconfig: enable GPIO backlight
  ARM: davinci: dm365: Fix McBSP dma_slave_map entry
  ARM: dts: bcm2835-rpi-zero-w: Fix bus-width of sdhci
  ARM: imx_v6_v7_defconfig: Enable CONFIG_DRM_MSM
  arm64: dts: imx8mn: Use correct clock for usdhc's ipg clk
  arm64: dts: imx8mm: Use correct clock for usdhc's ipg clk
  arm64: dts: imx8mq: Use correct clock for usdhc's ipg clk
  ARM: dts: imx7s: Correct GPT's ipg clock source
  ARM: dts: vf610-zii-scu4-aib: Specify 'i2c-mux-idle-disconnect'
  ARM: dts: imx6q-logicpd: Re-Enable SNVS power key
  arm64: dts: lx2160a: Correct CPU core idle state name
  mailmap: Add Simon Arlott (replacement for expired email address)
  arm64: dts: rockchip: Fix override mode for rk3399-kevin panel
  ...
2019-10-25 16:00:47 -04:00
Linus Torvalds
8c123380b3 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
 "Bugfixes for ARM, PPC and x86, plus selftest improvements"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: nVMX: Don't leak L1 MMIO regions to L2
  KVM: SVM: Fix potential wrong physical id in avic_handle_ldr_update
  kvm: clear kvmclock MSR on reset
  KVM: x86: fix bugon.cocci warnings
  KVM: VMX: Remove specialized handling of unexpected exit-reasons
  selftests: kvm: fix sync_regs_test with newer gccs
  selftests: kvm: vmx_dirty_log_test: skip the test when VMX is not supported
  selftests: kvm: consolidate VMX support checks
  selftests: kvm: vmx_set_nested_state_test: don't check for VMX support twice
  KVM: Don't shrink/grow vCPU halt_poll_ns if host side polling is disabled
  selftests: kvm: synchronize .gitignore to Makefile
  kvm: x86: Expose RDPID in KVM_GET_SUPPORTED_CPUID
  KVM: arm64: pmu: Reset sample period on overflow handling
  KVM: arm64: pmu: Set the CHAINED attribute before creating the in-kernel event
  arm64: KVM: Handle PMCR_EL0.LC as RES1 on pure AArch64 systems
  KVM: arm64: pmu: Fix cycle counter truncation
  KVM: PPC: Book3S HV: XIVE: Ensure VP isn't already in use
2019-10-25 15:52:05 -04:00
Linus Torvalds
8caacaad78 Merge tag 'drm-fixes-2019-10-25' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
 "Quiet week this week, which I suspect means some people just didn't
  get around to sending me fixes pulls in time. This has 2 komeda and a
  bunch of amdgpu fixes in it:

  komeda:
   - typo fixes
   - flushing pipes fix

  amdgpu:
   - Fix suspend/resume issue related to multi-media engines
   - Fix memory leak in user ptr code related to hmm conversion
   - Fix possible VM faults when allocating page table memory
   - Fix error handling in bo list ioctl"

* tag 'drm-fixes-2019-10-25' of git://anongit.freedesktop.org/drm/drm:
  drm/komeda: Fix typos in komeda_splitter_validate
  drm/komeda: Don't flush inactive pipes
  drm/amdgpu/vce: fix allocation size in enc ring test
  drm/amdgpu: fix error handling in amdgpu_bo_list_create
  drm/amdgpu: fix potential VM faults
  drm/amdgpu: user pages array memory leak fix
  drm/amdgpu/vcn: fix allocation size in enc ring test
  drm/amdgpu/uvd7: fix allocation size in enc ring test (v2)
  drm/amdgpu/uvd6: fix allocation size in enc ring test (v2)
2019-10-25 15:41:14 -04:00