Commit Graph

996532 Commits

Author SHA1 Message Date
Chaitanya Kulkarni
295a39f5a5 nvmet: remove else at the end of the function
The function nvmet_parse_io_cmd() returns the value from
nvmet_file_parse_io_cmd() or nvmet_bdev_parse_io_cmd() based on which
backend is set for the request. Remove the else and just return the
value from nvmet_bdev_parse_io_cmd().
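
As a rough illustration of the cleanup described above, the following
userspace sketch contrasts the two shapes. All names here (enum backend,
file_parse_io_cmd(), bdev_parse_io_cmd(), parse_io_cmd()) are hypothetical
stand-ins, not the nvmet functions themselves.

```c
/* Hypothetical stand-ins for the two backends; not the nvmet API. */
enum backend { BACKEND_FILE, BACKEND_BDEV };

static int file_parse_io_cmd(void) { return 1; }
static int bdev_parse_io_cmd(void) { return 2; }

/* Before: the final branch is wrapped in a redundant else. */
static int parse_io_cmd_with_else(enum backend b)
{
	if (b == BACKEND_FILE)
		return file_parse_io_cmd();
	else
		return bdev_parse_io_cmd();
}

/* After: the early return above makes the else unnecessary. */
static int parse_io_cmd(enum backend b)
{
	if (b == BACKEND_FILE)
		return file_parse_io_cmd();
	return bdev_parse_io_cmd();
}
```

Both functions return the same value for every input; only the control
flow at the end of the function differs.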

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10 16:38:06 +01:00
Chaitanya Kulkarni
20c2c3bb83 nvmet: add nvmet_req_subsys() helper
Just like what we have to get the passthru ctrl from the req, add a
helper to get the subsystem associated with the nvmet_req() instead
of open coding the chain of structures.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10 16:38:05 +01:00
Chaitanya Kulkarni
d86481e924 nvmet: use min of device_path and disk len
In function __assign_req_name(), instead of using DISK_NAME_LEN as the
bound in strncpy(), use the min of DISK_NAME_LEN and
strlen(req->ns->device_path).

This is needed to silence the following warnings:

In file included from drivers/nvme/target/core.c:14:
In function ‘__assign_req_name’,
    inlined from ‘trace_event_raw_event_nvmet_req_init’ at drivers/nvme/target/./trace.h:58:1:
drivers/nvme/target/trace.h:52:3: warning: ‘strncpy’ specified bound 32 equals destination size [-Wstringop-truncation]
   strncpy(name, req->ns->device_path, DISK_NAME_LEN);
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In function ‘__assign_req_name’,
    inlined from ‘perf_trace_nvmet_req_complete’ at drivers/nvme/target/./trace.h:100:1:
drivers/nvme/target/trace.h:52:3: warning: ‘strncpy’ specified bound 32 equals destination size [-Wstringop-truncation]
   strncpy(name, req->ns->device_path, DISK_NAME_LEN);
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In function ‘__assign_req_name’,
    inlined from ‘perf_trace_nvmet_req_init’ at drivers/nvme/target/./trace.h:58:1:
drivers/nvme/target/trace.h:52:3: warning: ‘strncpy’ specified bound 32 equals destination size [-Wstringop-truncation]
   strncpy(name, req->ns->device_path, DISK_NAME_LEN);
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In function ‘__assign_req_name’,
    inlined from ‘trace_event_raw_event_nvmet_req_complete’ at drivers/nvme/target/./trace.h:100:1:
drivers/nvme/target/trace.h:52:3: warning: ‘strncpy’ specified bound 32 equals destination size [-Wstringop-truncation]
   strncpy(name, req->ns->device_path, DISK_NAME_LEN);
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
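
The bound described above can be sketched in userspace roughly as follows.
This is a minimal illustration, not the kernel code: assign_name() is a
hypothetical stand-in for __assign_req_name(), DISK_NAME_LEN is set to 32
to match the bound reported in the warnings, and memcpy() is used in place
of strncpy() so the sketch itself does not trip string-bound warnings.

```c
#include <stddef.h>
#include <string.h>

#define DISK_NAME_LEN 32 /* assumed from the "bound 32" in the warnings */

/*
 * Hypothetical stand-in for __assign_req_name(): copy at most
 * min(DISK_NAME_LEN, strlen(device_path)) bytes into a fixed-size,
 * zero-filled buffer.
 */
static void assign_name(char name[DISK_NAME_LEN], const char *device_path)
{
	size_t len = strlen(device_path);

	if (len > DISK_NAME_LEN)
		len = DISK_NAME_LEN;

	memset(name, 0, DISK_NAME_LEN);
	memcpy(name, device_path, len);
}
```

Note that a path of DISK_NAME_LEN bytes or more fills the buffer without a
terminating NUL, so a consumer has to treat the field as fixed-size rather
than as a C string.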

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10 16:38:05 +01:00
Chaitanya Kulkarni
07116ea50f nvmet: use invalid cmd opcode helper
In the NVMeOF block device backend, file backend, and passthru backend
we reject and report the commands if opcode is not handled.

Use the previously introduced helper in the passthru backend to make the
error message uniform.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10 16:38:05 +01:00
Chaitanya Kulkarni
1c2c761368 nvmet: use invalid cmd opcode helper
In the NVMeOF block device backend, file backend, and passthru backend
we reject and report the commands if opcode is not handled.

Use the previously introduced helper in file backend to reduce the
duplicate code and make the error message uniform.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10 16:38:05 +01:00
Chaitanya Kulkarni
d81d57cf1b nvmet: add helper to report invalid opcode
In the NVMeOF block device backend, file backend, and passthru backend
we reject and report the commands if opcode is not handled.

Add a helper and use it in block device backend to keep the code
and error message uniform.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10 16:38:05 +01:00
Chaitanya Kulkarni
3999434b6c nvmet: remove extra variable in id-ns handler
In nvmet_execute_identify_ns() local variable ctrl is accessed only in
one place, remove that and directly use it from nvmet_req->sq->ctrl.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10 16:38:04 +01:00
Chaitanya Kulkarni
3a1f7c79ae nvmet: make nvmet_find_namespace() req based
The six callers of nvmet_find_namespace() duplicate the error log page
update and status setting code for each call on failure.

All callers are nvmet request based functions, so we can pass req
to nvmet_find_namespace() and derive ctrl from req, which allows us
to update the error log page in nvmet_find_namespace(). Now that we
pass the request we can also get rid of the local variable in
nvmet_find_namespace(), use req->ns, and return the error code.

Replace the ctrl parameter with nvmet_req for nvmet_find_namespace(),
centralize the error log page update for non-allocated namespaces, and
return a uniform error for a non-allocated namespace.

nvmet_find_namespace() takes an nsid parameter which comes from NVMe
command structures such as get_log_page, identify, rw and common. All
these commands have the same offset for the nsid field.

Derive nsid from req->cmd->common.nsid and remove the extra parameter
from nvmet_find_namespace().

Lastly, now that we associate the ns with the req parameter that we pass
to nvmet_find_namespace(), rename nvmet_find_namespace() to
nvmet_req_find_ns().

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10 16:38:04 +01:00
Chaitanya Kulkarni
aa0aff604a nvmet: return uniform error for invalid ns
For the nvmet_find_namespace() error case we have inconsistent error code
mapping in the functions nvmet_get_smart_log_nsid() and
nvmet_set_feat_write_protect().

There is no point in retrying from the host side for an invalid
namespace. Set the error code to NVME_SC_INVALID_NS | NVME_SC_DNR, which
matches what we have in nvmet_execute_identify_desclist().

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10 16:38:04 +01:00
Chaitanya Kulkarni
40244ad36b nvmet: set status to 0 in case for invalid nsid
For unallocated namespace in nvmet_execute_identify_ns() don't set the
status to NVME_SC_INVALID_NS, set it to zero.

Fixes: bffcd50778 ("nvmet: set right status on error in id-ns handler")
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10 16:38:04 +01:00
Christoph Hellwig
b5df8e79a2 nvmet-fc: add a missing __rcu annotation to nvmet_fc_tgt_assoc.queues
Make sparse happy after the recent conversion to RCU lookups.

Fixes: 4e2f02bf77 ("nvmet-fc: use RCU proctection for assoc_list")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
2021-02-10 16:38:04 +01:00
Keith Busch
73a1a2298f nvme-multipath: set nr_zones for zoned namespaces
The bio based drivers only require that the request_queue's nr_zones be
set, so set this field in the head if the namespace path is zoned.

Fixes: 240e6ee272 ("nvme: support for zoned namespaces")
Reported-by: Minwoo Im <minwoo.im.dev@gmail.com>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10 16:38:04 +01:00
Sagi Grimberg
0fbcfb089a nvmet-tcp: fix potential race of tcp socket closing accept_work
When we accept a TCP connection and allocate an nvmet-tcp queue we should
make sure not to fully establish it or reference it as the connection may
be already closing, which triggers queue release work, which does not
fence against queue establishment.

In order to address such a race, we make sure to check the sk_state and
contain the queue reference to be done underneath the sk_callback_lock
such that the queue release work correctly fences against it.

Fixes: 872d26a391 ("nvmet-tcp: add NVMe over TCP target driver")
Reported-by: Elad Grupi <elad.grupi@dell.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10 16:38:03 +01:00
Sagi Grimberg
fda871c0ba nvmet-tcp: fix receive data digest calculation for multiple h2cdata PDUs
When a host sends multiple h2cdata PDUs for a single command, we
should verify the data digest calculation per PDU and not
per command.

Fixes: 872d26a391 ("nvmet-tcp: add NVMe over TCP target driver")
Reported-by: Narayan Ayalasomayajula <Narayan.Ayalasomayajula@wdc.com>
Tested-by: Narayan Ayalasomayajula <Narayan.Ayalasomayajula@wdc.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10 16:38:03 +01:00
Chao Leng
62eca39722 nvme-rdma: handle nvme_rdma_post_send failures better
nvme_rdma_post_send failing is a path related error and should bounce
to another path when using nvme-multipath.  Call nvme_host_path_error
when nvme_rdma_post_send returns -EIO to ensure nvme_complete_rq gets
invoked to fail over to another path if there is one.

Signed-off-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10 16:38:03 +01:00
Chao Leng
ea5e5f42cd nvme-fabrics: avoid double completions in nvmf_fail_nonready_command
When reconnecting, the request may be completed with
NVME_SC_HOST_PATH_ERROR in nvmf_fail_nonready_command, which currently
sets the state of the request to MQ_RQ_IN_FLIGHT before calling
nvme_complete_rq.  When this happens for a request that is freed by
the caller, such as nvme_submit_user_cmd, in the worst case the request
could be completed again in the teardown process.

Instead of calling blk_mq_start_request from nvmf_fail_nonready_command,
just use the new nvme_host_path_error helper to complete the command
without starting it.

Signed-off-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10 16:38:03 +01:00
Chao Leng
dda3248e7f nvme: introduce a nvme_host_path_error helper
When using nvme native multipathing, if a path related error occurs
during ->queue_rq, the request needs to be completed with
NVME_SC_HOST_PATH_ERROR so that the request can be failed over.

Introduce a helper to complete the command from ->queue_rq in a way
that invokes nvme_complete_rq.

Signed-off-by: Chao Leng <lengchao@huawei.com>
[hch: renamed, added a return value to clean up the callers a bit]
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10 16:38:03 +01:00
Chao Leng
83fba8c811 blk-mq: introduce blk_mq_set_request_complete
nvme drivers need to set the state of a request to MQ_RQ_COMPLETE when
directly completing the request in queue_rq.
So add blk_mq_set_request_complete.

Signed-off-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10 16:38:02 +01:00
Jiapeng Chong
f720a8edbc nvme: convert sysfs sprintf/snprintf family to sysfs_emit
Fix the following coccicheck warning:

./drivers/nvme/host/core.c:3580:8-16: WARNING: use scnprintf or sprintf.
./drivers/nvme/host/core.c:3570:8-16: WARNING: use scnprintf or sprintf.
./drivers/nvme/host/core.c:3560:8-16: WARNING: use scnprintf or sprintf.
./drivers/nvme/host/core.c:3526:8-16: WARNING: use scnprintf or sprintf.
./drivers/nvme/host/core.c:2833:8-16: WARNING: use scnprintf or sprintf.

Reported-by: Abaci Robot<abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10 16:38:02 +01:00
Andy Lutomirski
ca24728378 x86/fault: Don't run fixups for SMAP violations
A SMAP-violating kernel access is not a recoverable condition.  Imagine
kernel code that, outside of a uaccess region, dereferences a pointer to
the user range by accident.  If SMAP is on, this will reliably generate
as an intentional user access.  This makes it easy for bugs to be
overlooked if code is inadequately tested both with and without SMAP.

This was discovered because BPF can generate invalid accesses to user
memory, but those warnings only got printed if SMAP was off. Make it so
that this type of error will be discovered with SMAP on as well.

 [ bp: Massage commit message. ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/66a02343624b1ff46f02a838c497fc05c1a871b3.1612924255.git.luto@kernel.org
2021-02-10 16:27:57 +01:00
Christoph Hellwig
f885056a48 mm: simplify swapdev_block
Open code the parts of map_swap_entry that were actually used by
swapdev_block, and remove the now unused map_swap_entry function.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-10 08:23:04 -07:00
Joe Perches
6751c1e3cf bcache: Avoid comma separated statements
Use semicolons and braces.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-10 08:06:00 -07:00
Kai Krakow
afe78ab46f bcache: Move journal work to new flush wq
This is potentially long running and not latency sensitive, let's get
it out of the way of other latency sensitive events.

As observed in the previous commit, the `system_wq` easily becomes
congested by bcache, and this fixes a few more stalls I was observing
every once in a while.

Let's not make this `WQ_MEM_RECLAIM` as it was shown to reduce performance
of boot and file system operations in my tests. Also, without
`WQ_MEM_RECLAIM`, I no longer see desktop stalls. This matches the
previous behavior as `system_wq` also does no memory reclaim:

> // workqueue.c:
> system_wq = alloc_workqueue("events", 0, 0);

Cc: Coly Li <colyli@suse.de>
Cc: stable@vger.kernel.org # 5.4+
Signed-off-by: Kai Krakow <kai@kaishome.de>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-10 08:06:00 -07:00
Kai Krakow
d797bd9897 bcache: Give btree_io_wq correct semantics again
Before killing `btree_io_wq`, the queue was allocated using
`create_singlethread_workqueue()` which has `WQ_MEM_RECLAIM`. After
killing it, it no longer had this property but `system_wq` is not
single threaded.

Let's combine both worlds and make it multi threaded but able to
reclaim memory.

Cc: Coly Li <colyli@suse.de>
Cc: stable@vger.kernel.org # 5.4+
Signed-off-by: Kai Krakow <kai@kaishome.de>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-10 08:06:00 -07:00
Kai Krakow
9f233ffe02 Revert "bcache: Kill btree_io_wq"
This reverts commit 56b30770b2.

With the btree using the `system_wq`, I seem to see a lot more desktop
latency than I should.

After some more investigation, it looks like the original assumption
of 56b3077 no longer is true, and bcache has a very high potential of
congesting the `system_wq`. In turn, this introduces laggy desktop
performance, IO stalls (at least with btrfs), and input events may be
delayed.

So let's revert this. It's important to note that the semantics of
using `system_wq` previously mean that `btree_io_wq` should be created
before and destroyed after other bcache wqs to keep the same
assumptions.

Cc: Coly Li <colyli@suse.de>
Cc: stable@vger.kernel.org # 5.4+
Signed-off-by: Kai Krakow <kai@kaishome.de>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-10 08:06:00 -07:00
Kai Krakow
d7fae7b4fa bcache: Fix register_device_aync typo
Should be `register_device_async`.

Cc: Coly Li <colyli@suse.de>
Signed-off-by: Kai Krakow <kai@kaishome.de>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-10 08:06:00 -07:00
dongdong tao
71dda2a562 bcache: consider the fragmentation when update the writeback rate
The current way to calculate the writeback rate only considers the
dirty sectors. This usually works fine when the fragmentation
is not high, but it gives an unreasonably small rate when
very few dirty sectors consume a lot of dirty buckets. In some
cases, the dirty buckets can reach CUTOFF_WRITEBACK_SYNC while the
dirty data (sectors) has not even reached writeback_percent; the
writeback rate will still be the minimum value (4k), which causes all
writes to be stuck in a non-writeback mode because of the slow
writeback.

We accelerate the rate in 3 stages with different aggressiveness.
The first stage starts when the dirty buckets percent rises above
BCH_WRITEBACK_FRAGMENT_THRESHOLD_LOW (50), the second at
BCH_WRITEBACK_FRAGMENT_THRESHOLD_MID (57), and the third at
BCH_WRITEBACK_FRAGMENT_THRESHOLD_HIGH (64). By default
the first stage tries to write back the amount of dirty data
in one bucket (on average) in (1 / (dirty_buckets_percent - 50)) second,
the second stage tries to write back the amount of dirty data in one bucket
in (1 / (dirty_buckets_percent - 57)) * 100 millisecond, and the third
stage tries to write back the amount of dirty data in one bucket in
(1 / (dirty_buckets_percent - 64)) millisecond.

The initial rate at each stage can be controlled by 3 configurable
parameters, writeback_rate_fp_term_{low|mid|high}, which default to
1, 10, 1000. The IO throughput that these values are trying
to achieve is described in the paragraph above. I chose these
defaults based on testing and production data; some details
follow:

A. When it comes to the low stage, we are still some way from the 70
   threshold, so we only want to give it a small push by setting the
   term to 1. This means the initial rate will be 170 if the fragment is 6;
   it is calculated by bucket_size/fragment. This rate is very small,
   but still much more reasonable than the minimum 8.
   For a production bcache with a light workload, if the cache device
   is bigger than 1 TB, it may take hours to consume 1% of the buckets,
   so it is very possible to reclaim enough dirty buckets in this stage
   and thus avoid entering the next stage.

B. If the dirty buckets ratio didn't turn around during the first stage,
   it comes to the mid stage. The mid stage needs to be more
   aggressive than the low stage, so I chose an initial rate
   10 times that of the low stage, which means 1700 as the initial
   rate if the fragment is 6. This is a normal rate
   we usually see for a normal workload when writeback happens
   because of writeback_percent.

C. If the dirty buckets ratio didn't turn around during the low and mid
   stages, it comes to the third stage, which is the last chance
   to turn around and avoid the horrible cutoff writeback sync issue.
   We choose to be 100 times more aggressive than the mid stage, which
   means 170000 as the initial rate if the fragment is 6. This is also
   inferred from a production bcache: I've got one week's writeback rate
   data from a production bcache which has quite heavy workloads.
   Again, the writeback is triggered by the writeback percent, and
   the highest rate area is around 100000 to 240000, so I believe this
   kind of aggressiveness at this stage is reasonable for production.
   It should be mostly enough because the hint is trying to reclaim
   1000 buckets per second, and that heavy production env
   consumes 50 buckets per second on average in one week's data.

Option writeback_consider_fragment controls whether
this feature is on or off; it's on by default.
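
The stage thresholds and the example rates quoted above (170, 1700 and
170000 for a fragment of 6) can be reproduced with a small userspace
sketch. This is only an arithmetic illustration, not the bcache code:
fragment_stage() and initial_rate() are hypothetical helpers, and the
bucket size of 1024 sectors used in the example is an assumption inferred
from 1024 / 6 = 170.

```c
#include <stdint.h>

/* Thresholds quoted in the commit message (dirty buckets percent). */
#define THRESHOLD_LOW  50
#define THRESHOLD_MID  57
#define THRESHOLD_HIGH 64

/* Hypothetical helper: 0 = no boost, 1 = low, 2 = mid, 3 = high stage. */
static int fragment_stage(int dirty_buckets_percent)
{
	if (dirty_buckets_percent > THRESHOLD_HIGH)
		return 3;
	if (dirty_buckets_percent > THRESHOLD_MID)
		return 2;
	if (dirty_buckets_percent > THRESHOLD_LOW)
		return 1;
	return 0;
}

/*
 * Hypothetical helper: initial rate as bucket_size / fragment, scaled
 * by the per-stage term (defaults 1, 10, 1000). Integer division is
 * applied first, which is what yields 170 for 1024 / 6.
 */
static int64_t initial_rate(int64_t bucket_size, int64_t fragment,
			    int64_t fp_term)
{
	return (bucket_size / fragment) * fp_term;
}
```

With the assumed bucket size of 1024 sectors and a fragment of 6,
initial_rate() gives 170, 1700 and 170000 for the terms 1, 10 and 1000,
matching the figures in items A to C.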

Lastly, below is the performance data for all the testing results,
including the data from the production env:
https://docs.google.com/document/d/1AmbIEa_2MhB9bqhC3rfga9tp7n9YX9PLn0jSUxscVW0/edit?usp=sharing

Signed-off-by: dongdong tao <dongdong.tao@canonical.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-10 08:05:59 -07:00
Stefan Ursella
1ebe718bb4 usb: quirks: add quirk to start video capture on ELMO L-12F document camera reliable
Without this quirk starting a video capture from the device often fails with

kernel: uvcvideo: Failed to set UVC probe control : -110 (exp. 34).

Signed-off-by: Stefan Ursella <stefan.ursella@wolfvision.net>
Link: https://lore.kernel.org/r/20210210140713.18711-1-stefan.ursella@wolfvision.net
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-10 15:59:27 +01:00
Greg Kroah-Hartman
c85bfed171 USB-serial updates for 5.12-rc1
Merge tag 'usb-serial-5.12-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-next

Johan writes:

USB-serial updates for 5.12-rc1

Here are the USB-serial updates for 5.12-rc1, including:

 - a line-speed fix for newer pl2303 devices
 - a line-speed fix for FTDI FT-X devices
 - a new xr_serial driver for MaxLinear/Exar devices (non-ACM mode)
 - a cdc-acm blacklist entry for when the xr_serial driver is enabled
 - cp210x support for software flow control
 - various cp210x modem-control fixes
 - an updated ZTE P685M modem entry to stop claiming the QMI interface
 - an update to drop the port_remove() driver-callback return value

Included are also various clean ups.

All have been in linux-next with no reported issues.

* tag 'usb-serial-5.12-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial: (41 commits)
  USB: serial: drop bogus to_usb_serial_port() checks
  USB: serial: make remove callback return void
  USB: serial: drop if with an always false condition
  USB: serial: option: update interface mapping for ZTE P685M
  USB: serial: ftdi_sio: restore divisor-encoding comments
  USB: serial: ftdi_sio: fix FTX sub-integer prescaler
  USB: serial: cp210x: clean up auto-RTS handling
  USB: serial: cp210x: fix RTS handling
  USB: serial: cp210x: clean up printk zero padding
  USB: serial: cp210x: clean up flow-control debug message
  USB: serial: cp210x: drop shift macros
  USB: serial: cp210x: fix modem-control handling
  USB: serial: cp210x: suppress modem-control errors
  USB: serial: mos7720: fix error code in mos7720_write()
  USB: serial: xr: fix B0 handling
  USB: serial: xr: fix pin configuration
  USB: serial: xr: fix gpio-mode handling
  USB: serial: xr: simplify line-speed logic
  USB: serial: xr: clean up line-settings handling
  USB: serial: xr: document vendor-request recipient
  ...
2021-02-10 15:58:04 +01:00
Damien Le Moal
78e1663f19 sd_zbc: clear zone resources for non-zoned case
For a host-aware ZBC disk, setting the device zoned model to BLK_ZONED_HA
using blk_queue_set_zoned() in sd_read_block_characteristics() may
result in the block device effective zoned model being "none"
(BLK_ZONED_NONE) if partitions are present on the device. In this case,
sd_zbc_read_zones() should not set up the zone related queue limits for
the disk so that the device limits and configuration are consistent with
a regular disk and resources are not uselessly allocated (e.g. the zone
write pointer tracking array for zone append emulation).

Furthermore, if the disk zoned model changes at run time due to the
creation of a partition by the user, the zone related resources can be
released.

Fix both problems by introducing the function sd_zbc_clear_zone_info()
to reset the scsi disk zone information and free resources and by
returning early in sd_zbc_read_zones() for a block device that has a
zoned model equal to BLK_ZONED_NONE.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@edc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-10 07:44:41 -07:00
Damien Le Moal
508aebb805 block: introduce blk_queue_clear_zone_settings()
Introduce the internal function blk_queue_clear_zone_settings() to
clean up all limits and resources related to zoned block devices. This
new function is called from blk_queue_set_zoned() when a disk zoned
model is set to BLK_ZONED_NONE. This particular case can happen when a
partition is created on a host-aware scsi disk.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@edc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-10 07:44:41 -07:00
Damien Le Moal
0f1ba5f5d8 zonefs: use zone write granularity as block size
Zoned block devices have different granularity constraints for write
operations into sequential zones. E.g. ZBC and ZAC devices require that
writes be aligned to the device physical block size while NVMe ZNS
devices allow logical block size aligned write operations. To correctly
handle such difference, use the device zone write granularity limit to
set the block size of a zonefs volume, thus allowing the smallest
possible write unit for all zoned device types.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@edc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-10 07:44:41 -07:00
Damien Le Moal
a805a4fa4f block: introduce zone_write_granularity limit
Per ZBC and ZAC specifications, host-managed SMR hard-disks mandate that
all writes into sequential write required zones be aligned to the device
physical block size. However, NVMe ZNS does not have this constraint and
allows write operations into sequential zones to be aligned to the
device logical block size. This inconsistency does not help with
software portability across device types.

To solve this, introduce the zone_write_granularity queue limit to
indicate the alignment constraint, in bytes, of write operations into
zones of a zoned block device. This new limit is exported as a
read-only sysfs queue attribute and the helper
blk_queue_zone_write_granularity() is introduced for drivers to set this
limit.

The function blk_queue_set_zoned() is modified to set this new limit to
the device logical block size by default. NVMe ZNS devices as well as
zoned nullb devices use this default value as is. The scsi disk driver
is modified to execute the blk_queue_zone_write_granularity() helper to
set the zone write granularity of host-managed SMR disks to the disk
physical block size.

The accessor functions queue_zone_write_granularity() and
bdev_zone_write_granularity() are also introduced.
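
To illustrate what an alignment constraint in bytes means for a caller,
here is a minimal userspace sketch. zone_write_is_aligned() is a
hypothetical helper, not one of the block layer functions named above, and
the 512/4096 byte sizes in the usage note are example values only.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: a write into a sequential zone satisfies the
 * constraint when both its byte offset and its byte length are
 * multiples of the zone write granularity.
 */
static bool zone_write_is_aligned(uint64_t offset, uint64_t len,
				  uint32_t granularity)
{
	if (granularity == 0)
		return false; /* no valid granularity reported */

	return offset % granularity == 0 && len % granularity == 0;
}
```

For example, a 512-byte write at offset 4096 passes with a granularity of
512 (logical block size, as described for NVMe ZNS) but fails with a
granularity of 4096 (physical block size, as described for host-managed
SMR disks).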

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@edc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-10 07:44:40 -07:00
Damien Le Moal
eafc63a9f7 block: use blk_queue_set_zoned in add_partition()
When changing the zoned model of host-aware zoned block devices, use
blk_queue_set_zoned() instead of directly assigning the gendisk queue
zoned limit.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@edc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-10 07:44:40 -07:00
Damien Le Moal
5752dc78a1 nullb: use blk_queue_set_zoned() to setup zoned devices
Use blk_queue_set_zoned() to set a nullb device zone model instead of
directly assigning the device queue zoned limit. This initialization of
the device zoned model as well as the setup of the queue flag
QUEUE_FLAG_ZONE_RESETALL and of the device queue elevator feature are
moved from null_init_zoned_dev() to null_register_zoned_dev() so that
the initialization of the queue limits is done when the gendisk of the
nullb device is available.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@edc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-10 07:44:40 -07:00
Damien Le Moal
73d90386b5 nvme: cleanup zone information initialization
For a zoned namespace, in nvme_update_ns_info(), call
nvme_update_zone_info() after executing nvme_update_disk_info() so that
the namespace queue logical and physical block size limits are set.
This allows setting the namespace queue max_zone_append_sectors limit
in nvme_update_zone_info() instead of nvme_revalidate_zones(),
simplifying this function. Also use blk_queue_set_zoned() to set the
namespace zoned model.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@edc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-10 07:44:40 -07:00
Damien Le Moal
f1836426ce block: document zone_append_max_bytes attribute
The description of the zone_append_max_bytes sysfs queue attribute is
missing from Documentation/block/queue-sysfs.rst. Add it.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@edc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-10 07:44:40 -07:00
Jens Axboe
26bfa89e25 io_uring: place ring SQ/CQ arrays under memcg memory limits
Instead of imposing rlimit memlock limits for the rings themselves,
ensure that we account them properly under memcg with __GFP_ACCOUNT.
We retain rlimit memlock for registered buffers, this is just for the
ring arrays themselves.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-10 07:33:15 -07:00
Jens Axboe
91f245d5d5 io_uring: enable kmemcg account for io_uring requests
This puts io_uring under the memory cgroups accounting and limits for
requests.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-10 07:33:15 -07:00
Jens Axboe
c7dae4ba46 io_uring: enable req cache for IRQ driven IO
This is the last class of requests that cannot utilize the req alloc
cache. Add a per-ctx req cache that is protected by the completion_lock,
and refill our submit side cache when it gets over our batch count.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-10 07:33:12 -07:00
Hao Xu
ed670c3f90 io_uring: fix possible deadlock in io_uring_poll
Abaci reported the following issue:

[   30.615891] ======================================================
[   30.616648] WARNING: possible circular locking dependency detected
[   30.617423] 5.11.0-rc3-next-20210115 #1 Not tainted
[   30.618035] ------------------------------------------------------
[   30.618914] a.out/1128 is trying to acquire lock:
[   30.619520] ffff88810b063868 (&ep->mtx){+.+.}-{3:3}, at: __ep_eventpoll_poll+0x9f/0x220
[   30.620505]
[   30.620505] but task is already holding lock:
[   30.621218] ffff88810e952be8 (&ctx->uring_lock){+.+.}-{3:3}, at: __x64_sys_io_uring_enter+0x3f0/0x5b0
[   30.622349]
[   30.622349] which lock already depends on the new lock.
[   30.622349]
[   30.623289]
[   30.623289] the existing dependency chain (in reverse order) is:
[   30.624243]
[   30.624243] -> #1 (&ctx->uring_lock){+.+.}-{3:3}:
[   30.625263]        lock_acquire+0x2c7/0x390
[   30.625868]        __mutex_lock+0xae/0x9f0
[   30.626451]        io_cqring_overflow_flush.part.95+0x6d/0x70
[   30.627278]        io_uring_poll+0xcb/0xd0
[   30.627890]        ep_item_poll.isra.14+0x4e/0x90
[   30.628531]        do_epoll_ctl+0xb7e/0x1120
[   30.629122]        __x64_sys_epoll_ctl+0x70/0xb0
[   30.629770]        do_syscall_64+0x2d/0x40
[   30.630332]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   30.631187]
[   30.631187] -> #0 (&ep->mtx){+.+.}-{3:3}:
[   30.631985]        check_prevs_add+0x226/0xb00
[   30.632584]        __lock_acquire+0x1237/0x13a0
[   30.633207]        lock_acquire+0x2c7/0x390
[   30.633740]        __mutex_lock+0xae/0x9f0
[   30.634258]        __ep_eventpoll_poll+0x9f/0x220
[   30.634879]        __io_arm_poll_handler+0xbf/0x220
[   30.635462]        io_issue_sqe+0xa6b/0x13e0
[   30.635982]        __io_queue_sqe+0x10b/0x550
[   30.636648]        io_queue_sqe+0x235/0x470
[   30.637281]        io_submit_sqes+0xcce/0xf10
[   30.637839]        __x64_sys_io_uring_enter+0x3fb/0x5b0
[   30.638465]        do_syscall_64+0x2d/0x40
[   30.638999]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   30.639643]
[   30.639643] other info that might help us debug this:
[   30.639643]
[   30.640618]  Possible unsafe locking scenario:
[   30.640618]
[   30.641402]        CPU0                    CPU1
[   30.641938]        ----                    ----
[   30.642664]   lock(&ctx->uring_lock);
[   30.643425]                                lock(&ep->mtx);
[   30.644498]                                lock(&ctx->uring_lock);
[   30.645668]   lock(&ep->mtx);
[   30.646321]
[   30.646321]  *** DEADLOCK ***
[   30.646321]
[   30.647642] 1 lock held by a.out/1128:
[   30.648424]  #0: ffff88810e952be8 (&ctx->uring_lock){+.+.}-{3:3}, at: __x64_sys_io_uring_enter+0x3f0/0x5b0
[   30.649954]
[   30.649954] stack backtrace:
[   30.650592] CPU: 1 PID: 1128 Comm: a.out Not tainted 5.11.0-rc3-next-20210115 #1
[   30.651554] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[   30.652290] Call Trace:
[   30.652688]  dump_stack+0xac/0xe3
[   30.653164]  check_noncircular+0x11e/0x130
[   30.653747]  ? check_prevs_add+0x226/0xb00
[   30.654303]  check_prevs_add+0x226/0xb00
[   30.654845]  ? add_lock_to_list.constprop.49+0xac/0x1d0
[   30.655564]  __lock_acquire+0x1237/0x13a0
[   30.656262]  lock_acquire+0x2c7/0x390
[   30.656788]  ? __ep_eventpoll_poll+0x9f/0x220
[   30.657379]  ? __io_queue_proc.isra.88+0x180/0x180
[   30.658014]  __mutex_lock+0xae/0x9f0
[   30.658524]  ? __ep_eventpoll_poll+0x9f/0x220
[   30.659112]  ? mark_held_locks+0x5a/0x80
[   30.659648]  ? __ep_eventpoll_poll+0x9f/0x220
[   30.660229]  ? _raw_spin_unlock_irqrestore+0x2d/0x40
[   30.660885]  ? trace_hardirqs_on+0x46/0x110
[   30.661471]  ? __io_queue_proc.isra.88+0x180/0x180
[   30.662102]  ? __ep_eventpoll_poll+0x9f/0x220
[   30.662696]  __ep_eventpoll_poll+0x9f/0x220
[   30.663273]  ? __ep_eventpoll_poll+0x220/0x220
[   30.663875]  __io_arm_poll_handler+0xbf/0x220
[   30.664463]  io_issue_sqe+0xa6b/0x13e0
[   30.664984]  ? __lock_acquire+0x782/0x13a0
[   30.665544]  ? __io_queue_proc.isra.88+0x180/0x180
[   30.666170]  ? __io_queue_sqe+0x10b/0x550
[   30.666725]  __io_queue_sqe+0x10b/0x550
[   30.667252]  ? __fget_files+0x131/0x260
[   30.667791]  ? io_req_prep+0xd8/0x1090
[   30.668316]  ? io_queue_sqe+0x235/0x470
[   30.668868]  io_queue_sqe+0x235/0x470
[   30.669398]  io_submit_sqes+0xcce/0xf10
[   30.669931]  ? xa_load+0xe4/0x1c0
[   30.670425]  __x64_sys_io_uring_enter+0x3fb/0x5b0
[   30.671051]  ? lockdep_hardirqs_on_prepare+0xde/0x180
[   30.671719]  ? syscall_enter_from_user_mode+0x2b/0x80
[   30.672380]  do_syscall_64+0x2d/0x40
[   30.672901]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   30.673503] RIP: 0033:0x7fd89c813239
[   30.673962] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05  3d 01 f0 ff ff 73 01 c3 48 8b 0d 27 ec 2c 00 f7 d8 64 89 01 48
[   30.675920] RSP: 002b:00007ffc65a7c628 EFLAGS: 00000217 ORIG_RAX: 00000000000001aa
[   30.676791] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd89c813239
[   30.677594] RDX: 0000000000000000 RSI: 0000000000000014 RDI: 0000000000000003
[   30.678678] RBP: 00007ffc65a7c720 R08: 0000000000000000 R09: 0000000003000000
[   30.679492] R10: 0000000000000000 R11: 0000000000000217 R12: 0000000000400ff0
[   30.680282] R13: 00007ffc65a7c840 R14: 0000000000000000 R15: 0000000000000000

This can happen if we do epoll_wait on a uring fd while reading/writing
the former epoll fd in an sqe in the former uring instance.
So don't flush the cqring overflow list; just do a simple check.
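
The idea of replacing a flush that takes a lock with a plain check can be
sketched in user space roughly as follows. This is only an illustrative
model, not the kernel change itself: struct ring_model, its fields,
ring_poll_mask() and POLL_READABLE are all hypothetical names. The point is
that the poll path reads state without acquiring any mutex, so it cannot
take part in a lock-order cycle with the caller's lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define POLL_READABLE 0x1u  /* hypothetical "data available" bit */
static_assert(POLL_READABLE != 0, "mask bit must be non-zero");

/* Hypothetical model of a completion ring; not the kernel's io_ring_ctx. */
struct ring_model {
    atomic_uint cq_head;           /* consumer position */
    atomic_uint cq_tail;           /* producer position */
    atomic_bool overflow_pending;  /* completions are waiting on an overflow list */
};

/*
 * Poll callback sketch: report readiness from lock-free reads only.
 * It deliberately does NOT flush the overflow list, because flushing
 * would need the ring mutex and the poller may already hold a lock
 * that is ordered after it elsewhere.
 */
static unsigned int ring_poll_mask(struct ring_model *r)
{
    unsigned int mask = 0;

    if (atomic_load(&r->cq_tail) != atomic_load(&r->cq_head))
        mask |= POLL_READABLE;
    if (atomic_load(&r->overflow_pending))
        mask |= POLL_READABLE;
    return mask;
}
```

In this model the overflow list would be flushed later, from a path that is
already allowed to take the ring mutex.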

Reported-by: Abaci <abaci@linux.alibaba.com>
Fixes: 6c503150ae ("io_uring: patch up IOPOLL overflow_flush sync")
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-10 07:28:44 -07:00
Pavel Begunkov
e5d1bc0a91 io_uring: defer flushing cached reqs
While there are requests in the allocation cache, use them; only when
those run out, go for the stashed memory in comp.free_list. As list
manipulations are generally heavy and not good for caches, flush them
all, or as many as possible, in one go.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
[axboe: return success/failure from io_flush_cached_reqs()]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-10 07:28:43 -07:00
Pavel Begunkov
c5eef2b944 io_uring: take comp_state from ctx
__io_queue_sqe() is always called with a non-NULL comp_state, which is
taken directly from context. Don't pass it around but infer from ctx.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-10 07:28:43 -07:00
Jens Axboe
65453d1efb io_uring: enable req cache for task_work items
task_work is run without utilizing the req alloc cache, so any deferred
items don't get to take advantage of either the alloc or free side of it.
With task_work now being wrapped by io_uring, we can use the ctx
completion state to both use the req cache and the completion flush
batching.

With this, the only request type that cannot take advantage of the req
cache is IRQ driven IO for regular files / block devices. Anything else,
including IOPOLL polled IO to those same types, will take advantage of it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-10 07:28:43 -07:00
Jens Axboe
7cbf1722d5 io_uring: provide FIFO ordering for task_work
task_work is a LIFO list, due to how it's implemented as a lockless
list. For long chains of task_work, this can be problematic as the
first entry added is the last one processed. Similarly, we'd waste
a lot of CPU cycles reversing this list.

Wrap the task_work so we have a single task_work entry per task per
ctx, and use that to run it in the right order.
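
The ordering problem can be illustrated with a small user-space model. This
is a sketch under assumptions, not the io_uring code: struct work,
struct work_queue and the helper names are hypothetical. Pushing at the head
(as a lock-less stack does) yields LIFO order, while a queue with a tail
pointer hands entries back in the order they were added, with no reversal
pass.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical work item; not the kernel's struct callback_head. */
struct work {
    int id;
    struct work *next;
};

/* LIFO: push at the head, the way a lock-less stack inserts entries. */
static void lifo_push(struct work **head, struct work *w)
{
    w->next = *head;
    *head = w;
}

/* FIFO: remember where the next entry has to be linked in. */
struct work_queue {
    struct work *first;
    struct work **last;  /* address of the 'next' slot to fill */
};

static void fifo_init(struct work_queue *q)
{
    q->first = NULL;
    q->last = &q->first;
}

static void fifo_add(struct work_queue *q, struct work *w)
{
    assert(q->last != NULL);  /* fifo_init() must have been called */
    w->next = NULL;
    *q->last = w;
    q->last = &w->next;
}

/* Copy ids in list order into out[]; returns the number written. */
static size_t drain_ids(const struct work *w, int *out, size_t cap)
{
    size_t n = 0;

    for (; w != NULL && n < cap; w = w->next)
        out[n++] = w->id;
    return n;
}
```

Adding items 1, 2, 3 with lifo_push() and walking the list visits 3, 2, 1;
adding the same items with fifo_add() visits 1, 2, 3.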

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-10 07:28:43 -07:00
Jens Axboe
1b4c351f6e io_uring: use persistent request cache
Now that we have the submit_state in the ring itself, we can have io_kiocb
allocations that are persistent across invocations. This reduces the time
spent doing slab allocations and frees.

[sil: rebased]
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-10 07:28:43 -07:00
Pavel Begunkov
6ff119a6e4 io_uring: feed reqs back into alloc cache
Make io_req_free_batch(), which is used for inline executed requests and
IOPOLL, return requests to the allocation cache, avoiding most
kmalloc()/kfree() calls for those cases.
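
A minimal user-space sketch of this kind of allocation cache is shown below.
It is an assumption-laden model, not the io_uring implementation: struct req,
struct req_cache, REQ_CACHE_MAX and the helper names are hypothetical, and
malloc()/free() stand in for the kernel allocator. Freed requests are parked
on a free list and handed out again by the next allocation.

```c
#include <assert.h>
#include <stdlib.h>

#define REQ_CACHE_MAX 32  /* arbitrary cap chosen for this sketch */

/* Hypothetical request object; not the kernel's struct io_kiocb. */
struct req {
    struct req *next;
    int opcode;
};

struct req_cache {
    struct req *free_list;  /* recycled requests */
    unsigned int nr;        /* entries on free_list */
};

/* Take a request from the cache if possible, else fall back to malloc(). */
static struct req *req_alloc(struct req_cache *c)
{
    struct req *r = c->free_list;

    if (r != NULL) {
        assert(c->nr > 0);
        c->free_list = r->next;
        c->nr--;
        return r;
    }
    return malloc(sizeof(*r));
}

/* Return a request to the cache; only free it when the cache is full. */
static void req_free(struct req_cache *c, struct req *r)
{
    if (c->nr < REQ_CACHE_MAX) {
        r->next = c->free_list;
        c->free_list = r;
        c->nr++;
        return;
    }
    free(r);
}

/* Release everything still held by the cache. */
static void req_cache_destroy(struct req_cache *c)
{
    while (c->free_list != NULL) {
        struct req *r = c->free_list;

        c->free_list = r->next;
        free(r);
    }
    c->nr = 0;
}
```

A request that is freed and then immediately reallocated comes back from the
free list without touching the allocator.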

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-10 07:28:43 -07:00
Pavel Begunkov
bf019da7fc io_uring: persistent req cache
Don't free batch-allocated requests across syscalls.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-10 07:28:43 -07:00
Pavel Begunkov
9ae7246321 io_uring: count ctx refs separately from reqs
Currently, batch free handles freeing request memory and putting ctx refs
together. Separate them and use different counters; this will be needed
for reusing request memory.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-10 07:28:43 -07:00
Pavel Begunkov
3893f39f22 io_uring: remove fallback_req
Remove fallback_req for now, it gets in the way of other changes.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-10 07:28:43 -07:00