linux/drivers/nvme/host
Sagi Grimberg 5c11f7d9f8 nvme-tcp: Fix possible race of io_work and direct send
We may send a request (with or without its data) from two paths:

  1. From our I/O context, nvme_tcp_io_work, which is triggered from:
    - queue_rq
    - r2t reception
    - socket data_ready and write_space callbacks
  2. Directly from queue_rq if the send_list is empty (because we want to
     save the context switch associated with scheduling our io_work); see
     the sketch below.
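
For reference, a minimal sketch of this two-path dispatch, loosely
modeled on nvme_tcp_queue_request() as it looked before this fix
(simplified; not the exact upstream code):

  static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
                                            bool sync)
  {
          struct nvme_tcp_queue *queue = req->queue;
          bool empty;

          /* queue the request; empty == nothing else was pending */
          empty = llist_add(&req->lentry, &queue->req_list) &&
                  list_empty(&queue->send_list) && !queue->request;

          if (sync && empty && mutex_trylock(&queue->send_mutex)) {
                  /* path 2: send directly, saving a context switch */
                  nvme_tcp_try_send(queue);
                  mutex_unlock(&queue->send_mutex);
          } else {
                  /* path 1: defer to our I/O context, io_work */
                  queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
          }
  }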

However, given that we now have the send_mutex, we may run into a race
condition where none of these contexts sends the pending payload to the
controller. Both the io_work send path and the queue_rq send path
opportunistically attempt to acquire the send_mutex; however, queue_rq
only attempts to send a single request, and if the io_work context fails
to acquire the send_mutex it completes without rescheduling itself.
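
The io_work half of that race, sketched (heavily simplified from
nvme_tcp_io_work(); the receive side and error handling are omitted):

  static void nvme_tcp_io_work(struct work_struct *w)
  {
          struct nvme_tcp_queue *queue =
                  container_of(w, struct nvme_tcp_queue, io_work);
          unsigned long deadline = jiffies + msecs_to_jiffies(1);

          do {
                  bool pending = false;

                  /* opportunistic: if someone else holds the mutex,
                   * assume they are doing the sending */
                  if (mutex_trylock(&queue->send_mutex)) {
                          if (nvme_tcp_try_send(queue) > 0)
                                  pending = true;
                          mutex_unlock(&queue->send_mutex);
                  }

                  if (!pending)
                          return; /* completes without rescheduling itself */
          } while (!time_after(jiffies, deadline));

          queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
  }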

The race can trigger with the following sequence:

  1. queue_rq sends the request (no in-capsule data) and blocks
  2. RX path receives r2t - prepares data PDU to send, adds h2cdata PDU
     to the send_list and schedules io_work
  3. io_work triggers and cannot acquire the send_mutex because of (1);
     it ends without rescheduling itself
  4. queue_rq finishes its send and returns

==> no context will send the h2cdata PDU - the request times out.

Fix this by having queue_rq send as much as it can from the send_list,
such that if anything is left over, it is because the socket buffer is
full and the socket write_space callback will trigger, thus guaranteeing
that a context will be scheduled to send the h2cdata PDU.
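
A sketch of that approach: the direct-send path drains the send_list in
a loop instead of issuing a single nvme_tcp_try_send() call (simplified,
following the shape of the sketch above):

  /*
   * Send until there is nothing left to send or until the socket
   * buffer is full (nvme_tcp_try_send() then returns 0, and whatever
   * remains on the send_list is picked up via write_space).
   */
  static inline void nvme_tcp_send_all(struct nvme_tcp_queue *queue)
  {
          int ret;

          do {
                  ret = nvme_tcp_try_send(queue);
          } while (ret > 0);
  }

In the queue_rq direct-send branch, nvme_tcp_send_all(queue) then
replaces the single nvme_tcp_try_send(queue) call, so a leftover
h2cdata PDU can only mean a full socket buffer, for which write_space
is guaranteed to fire.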

Fixes: db5ad6b7f8 ("nvme-tcp: try to send request in queue_rq context")
Reported-by: Potnuri Bharat Teja <bharat@chelsio.com>
Reported-by: Samuel Jones <sjones@kalrayinc.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-01-06 10:30:36 +01:00
core.c for-5.11/drivers-2020-12-14 2020-12-16 13:09:32 -08:00
fabrics.c nvme-fabrics: reject I/O to offline device 2020-12-01 20:36:37 +01:00
fabrics.h nvme-fabrics: reject I/O to offline device 2020-12-01 20:36:37 +01:00
fault_inject.c nvme: enable to inject errors into admin commands 2019-06-21 11:15:50 +02:00
fc.c nvme-fc: avoid calling _nvme_fc_abort_outstanding_ios from interrupt context 2021-01-06 10:30:36 +01:00
fc.h nvme-fc: Update header and host for common definitions for LS handling 2020-05-09 16:18:33 -06:00
hwmon.c nvme: return errors for hwmon init 2020-09-22 17:49:55 +02:00
Kconfig nvme-tcp: fix kconfig dependency warning when !CRYPTO 2020-09-15 07:58:49 +02:00
lightnvm.c nvme: split nvme_alloc_request() 2020-12-01 20:36:35 +01:00
Makefile nvme: support for zoned namespaces 2020-07-08 16:16:20 +02:00
multipath.c for-5.11/drivers-2020-12-14 2020-12-16 13:09:32 -08:00
nvme.h for-5.11/drivers-2020-12-14 2020-12-16 13:09:32 -08:00
pci.c nvme-pci: mark Samsung PM1725a as IGNORE_DEV_SUBNQN 2021-01-06 10:30:36 +01:00
rdma.c RDMA 5.11 pull request 2020-12-16 13:42:26 -08:00
tcp.c nvme-tcp: Fix possible race of io_work and direct send 2021-01-06 10:30:36 +01:00
trace.c nvme: trace: parse Get LBA Status command in detail 2019-08-29 12:55:01 -07:00
trace.h nvme-trace: print result and status in hex format 2019-06-21 11:12:37 +02:00
zns.c nvme: export zoned namespaces without Zone Append support read-only 2020-12-01 20:36:38 +01:00