linux

Author	SHA1	Message	Date
Chaitanya Kulkarni	76574f37bf	nvmet: add interface to update error-log page This patch adds nvmet_req based interface to the nvmet-core so that we can update the error log page. We update error log page in the request completion path when status is not set to NVME_SC_SUCCESS. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-12-13 09:59:03 +01:00
Chaitanya Kulkarni	e4a976254e	nvmet: add error-log definitions This patch adds necessary fields in the target data structures to support error log page. For a target controller, we add a new error log field to maintain the error log, at any given point we maintain error entries equal to NVMET_ERROR_LOG_SLOTS for each controller. In the following patch, we also update the error log page entry in the I/O completion path so we introduce a spinlock for synchronization of the log. For nvmet_req, we add a new field error_loc to hold the location of the error in the command when the actual error occurs for each request and a starting LBA if applicable. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-12-13 09:59:02 +01:00
Chaitanya Kulkarni	b34de7cee0	nvme: add error log page slot definition This patch adds the NVMe error slot definition from the spec. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-12-13 09:59:01 +01:00
Chaitanya Kulkarni	b7c8f3663d	nvme: remove nvme_common command cdw10 array This is a preparation patch which removes the nvme common command cdw10 array and replace with individual fields. This is needed for the nvmet error log page implementation make is error log page entry offset assignment easier. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-12-13 09:59:01 +01:00
Sagi Grimberg	16d3a280d4	nvmet: remove unused variable Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-12-13 09:59:00 +01:00
Jens Axboe	cb5b7262b0	nvme: provide fallback for discard alloc failure When boxes are run near (or to) OOM, we have a problem with the discard page allocation in nvme. If we fail allocating the special page, we return busy, and it'll get retried. But since ordering is honored for dispatch requests, we can keep retrying this same IO and failing. Behind that IO could be requests that want to free memory, but they never get the chance. Allocate a fixed discard page per controller for a safe fallback, and use that if the initial allocation fails. Signed-off-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-12-13 09:59:00 +01:00
Chengguang Xu	8eb5d89f48	nvme: add __exit annotation Add __exit annotation to cleanup helper which is only called once in the module. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-12-13 09:58:59 +01:00
Sagi Grimberg	3f2304f8c6	nvme-tcp: add NVMe over TCP host driver This patch implements the NVMe over TCP host driver. It can be used to connect to remote NVMe over Fabrics subsystems over good old TCP/IP. The driver implements the TP 8000 of how nvme over fabrics capsules and data are encapsulated in nvme-tcp pdus and exchaged on top of a TCP byte stream. nvme-tcp header and data digest are supported as well. To connect to all NVMe over Fabrics controllers reachable on a given taget port over TCP use the following command: nvme connect-all -t tcp -a $IPADDR This requires the latest version of nvme-cli with TCP support. Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com> Signed-off-by: Roy Shterman <roys@lightbitslabs.com> Signed-off-by: Solganik Alexander <sashas@lightbitslabs.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-12-13 09:58:58 +01:00
Sagi Grimberg	ad4f530e95	nvmet: allow configfs tcp trtype configuration Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-12-13 09:58:58 +01:00
Sagi Grimberg	872d26a391	nvmet-tcp: add NVMe over TCP target driver This patch implements the TCP transport driver for the NVMe over Fabrics target stack. This allows exporting NVMe over Fabrics functionality over good old TCP/IP. The driver implements the TP 8000 of how nvme over fabrics capsules and data are encapsulated in nvme-tcp pdus and exchaged on top of a TCP byte stream. nvme-tcp header and data digest are supported as well. Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com> Signed-off-by: Roy Shterman <roys@lightbitslabs.com> Signed-off-by: Solganik Alexander <sashas@lightbitslabs.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-12-13 09:58:57 +01:00
Sagi Grimberg	fc221d0544	nvme-tcp: Add protocol header Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-12-13 09:58:57 +01:00
Sagi Grimberg	20d44e8632	nvme-fabrics: allow user passing data digest Data digest is a nvme-tcp specific feature, but nothing prevents other transports reusing the concept so do not associate with tcp transport solely. Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-12-13 09:58:56 +01:00
Sagi Grimberg	3b49fa8072	nvme-fabrics: allow user passing header digest Header digest is a nvme-tcp specific feature, but nothing prevents other transports reusing the concept so do not associate with tcp transport solely. Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-12-13 09:58:56 +01:00
Sagi Grimberg	1672ddb8d6	nvmet: Add install_queue callout nvmet-tcp will implement it to allocate queue commands which are only known at nvmf connect time (sq size). Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-12-13 09:58:55 +01:00
Sagi Grimberg	65d69e2505	datagram: introduce skb_copy_and_hash_datagram_iter helper Introduce a helper to copy datagram into an iovec iterator but also update a predefined hash. This is useful for consumers of skb_copy_datagram_iter to also support inflight data digest without having to finish to copy and only then traverse the iovec and calculate the digest hash. Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-12-13 09:58:55 +01:00
Sagi Grimberg	d05f443554	iov_iter: introduce hash_and_copy_to_iter helper Allow consumers that want to use iov iterator helpers and also update a predefined hash calculation online when copying data. This is useful when copying incoming network buffers to a local iterator and calculate a digest on the incoming stream. nvme-tcp host driver that will be introduced in following patches is the first consumer via skb_copy_and_hash_datagram_iter. Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-12-13 09:58:54 +01:00
Sagi Grimberg	950fcaecd5	datagram: consolidate datagram copy to iter helpers skb_copy_datagram_iter and skb_copy_and_csum_datagram are essentialy the same but with a couple of differences: The first is the copy operation used which either a simple copy or a csum_and_copy, and the second are the behavior on the "short copy" path where simply copy needs to return the number of bytes successfully copied while csum_and_copy needs to fault immediately as the checksum is partial. Introduce __skb_datagram_iter that additionally accepts: 1. copy operation function pointer 2. private data that goes with the copy operation 3. fault_short flag to indicate the action on short copy Suggested-by: David S. Miller <davem@davemloft.net> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-12-13 09:58:53 +01:00
Sagi Grimberg	cb002d074d	iov_iter: pass void csum pointer to csum_and_copy_to_iter The single caller to csum_and_copy_to_iter is skb_copy_and_csum_datagram and we are trying to unite its logic with skb_copy_datagram_iter by passing a callback to the copy function that we want to apply. Thus, we need to make the checksum pointer private to the function. Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-12-13 09:58:53 +01:00
Sagi Grimberg	0fc07791bc	datagram: open-code copy_page_to_iter This will be useful to consolidate skb_copy_and_hash_datagram_iter and skb_copy_and_csum_datagram to a single code path. Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-12-13 09:58:52 +01:00
Sagi Grimberg	3152a97467	ath6kl: add ath6kl_ prefix to crypto_type Prevent a namespace conflict as in following patches as skbuff.h will include the crypto API. Acked-by: David S. Miller <davem@davemloft.net> Cc: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-12-13 09:58:52 +01:00
Dennis Zhou	0273ac349f	blkcg: handle dying request_queue when associating a blkg Between v3 [1] and v4 [2] of the blkg association series, the association point moved from generic_make_request_checks(), which is called after the request enters the queue, to bio_set_dev(), which is when the bio is formed before submit_bio(). When the request_queue goes away, the blkgs supporting the request_queue are destroyed and then the q->root_blkg is set to %NULL. This patch adds a %NULL check to blkg_tryget_closest() to prevent the NPE caused by the above. It also adds a guard to see if the request_queue is dying when creating a blkg to prevent creating a blkg for a dead request_queue. [1] https://lore.kernel.org/lkml/20180911184137.35897-1-dennisszhou@gmail.com/ [2] https://lore.kernel.org/lkml/20181126211946.77067-1-dennis@kernel.org/ Fixes: `5cdf2e3fea` ("blkcg: associate blkg when associating a device") Reported-and-tested-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Dennis Zhou <dennis@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-12-12 17:43:33 -07:00
Ming Lei	544fbd16a4	block: deactivate blk_stat timer in wbt_disable_default() rwb_enabled() can't be changed when there is any inflight IO. wbt_disable_default() may set rwb->wb_normal as zero, however the blk_stat timer may still be pending, and the timer function will update wrb->wb_normal again. This patch introduces blk_stat_deactivate() and applies it in wbt_disable_default(), then the following IO hang triggered when running parted & switching io scheduler can be fixed: [ 369.937806] INFO: task parted:3645 blocked for more than 120 seconds. [ 369.938941] Not tainted 4.20.0-rc6-00284-g906c801e5248 #498 [ 369.939797] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 369.940768] parted D 0 3645 3239 0x00000000 [ 369.941500] Call Trace: [ 369.941874] ? __schedule+0x6d9/0x74c [ 369.942392] ? wbt_done+0x5e/0x5e [ 369.942864] ? wbt_cleanup_cb+0x16/0x16 [ 369.943404] ? wbt_done+0x5e/0x5e [ 369.943874] schedule+0x67/0x78 [ 369.944298] io_schedule+0x12/0x33 [ 369.944771] rq_qos_wait+0xb5/0x119 [ 369.945193] ? karma_partition+0x1c2/0x1c2 [ 369.945691] ? wbt_cleanup_cb+0x16/0x16 [ 369.946151] wbt_wait+0x85/0xb6 [ 369.946540] __rq_qos_throttle+0x23/0x2f [ 369.947014] blk_mq_make_request+0xe6/0x40a [ 369.947518] generic_make_request+0x192/0x2fe [ 369.948042] ? submit_bio+0x103/0x11f [ 369.948486] ? __radix_tree_lookup+0x35/0xb5 [ 369.949011] submit_bio+0x103/0x11f [ 369.949436] ? blkg_lookup_slowpath+0x25/0x44 [ 369.949962] submit_bio_wait+0x53/0x7f [ 369.950469] blkdev_issue_flush+0x8a/0xae [ 369.951032] blkdev_fsync+0x2f/0x3a [ 369.951502] do_fsync+0x2e/0x47 [ 369.951887] __x64_sys_fsync+0x10/0x13 [ 369.952374] do_syscall_64+0x89/0x149 [ 369.952819] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 369.953492] RIP: 0033:0x7f95a1e729d4 [ 369.953996] Code: Bad RIP value. [ 369.954456] RSP: 002b:00007ffdb570dd48 EFLAGS: 00000246 ORIG_RAX: 000000000000004a [ 369.955506] RAX: ffffffffffffffda RBX: 000055c2139c6be0 RCX: 00007f95a1e729d4 [ 369.956389] RDX: 0000000000000001 RSI: 0000000000001261 RDI: 0000000000000004 [ 369.957325] RBP: 0000000000000002 R08: 0000000000000000 R09: 000055c2139c6ce0 [ 369.958199] R10: 0000000000000000 R11: 0000000000000246 R12: 000055c2139c0380 [ 369.959143] R13: 0000000000000004 R14: 0000000000000100 R15: 0000000000000008 Cc: stable@vger.kernel.org Cc: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-12-12 06:47:51 -07:00
Jens Axboe	b2dbff1bb8	sbitmap: flush deferred clears for resize and shallow gets We're missing a deferred clear off the shallow get, which can cause a hang. Additionally, when we resize the sbitmap, we should also flush deferred clears for good measure. Ensure we have full coverage on batch clears, even for paths where we would not be doing deferred clear. This makes it less error prone for future additions. Reported-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-12-11 18:39:41 -07:00
Igor Konopko	2c4d5356e6	lightnvm: pblk: do not overwrite ppa list with meta list Ehen using pblk with 0 sized metadata both ppa list and meta list points to the same memory since pblk_dma_meta_size() returns 0 in that case. This patch fix that issue by ensuring that pblk_dma_meta_size() always returns space equal to sizeof(struct pblk_sec_meta) and thus ppa list and meta list points to different memory address. Even that in that case drive does not really care about meta_list pointer, this is the easiest way to fix that issue without introducing changes in many places in the code just for 0 sized metadata case. The same approach needs to be also done for pblk_get_sec_meta() since we also cannot point to the same memory address in meta buffer when we are using it for pblk recovery process Reported-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Tested-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-12-11 12:22:35 -07:00
Igor Konopko	55d8ec3539	lightnvm: pblk: support packed metadata pblk performs recovery of open lines by storing the LBA in the per LBA metadata field. Recovery therefore only works for drives that has this field. This patch adds support for packed metadata, which store l2p mapping for open lines in last sector of every write unit and enables drives without per IO metadata to recover open lines. After this patch, drives with OOB size <16B will use packed metadata and metadata size larger than16B will continue to use the device per IO metadata. Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-12-11 12:22:35 -07:00
Igor Konopko	a16816b9e4	lightnvm: disable interleaved metadata Currently pblk only check the size of I/O metadata and does not take into account if this metadata is in a separate buffer or interleaved in a single metadata buffer. In reality only the first scenario is supported, where second mode will break pblk functionality during any IO operation. This patch prevents pblk to be instantiated in case device only supports interleaved metadata. Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-12-11 12:22:35 -07:00
Igor Konopko	24828d0536	lightnvm: dynamic DMA pool entry size Currently lightnvm and pblk uses single DMA pool, for which the entry size always is equal to PAGE_SIZE. The contents of each entry allocated from the DMA pool consists of a PPA list (8bytes * 64), leaving 56bytes * 64 space for metadata. Since the metadata field can be bigger, such as 128 bytes, the static size does not cover this use-case. This patch adds support for I/O metadata above 56 bytes by changing DMA pool size based on device meta size and allows pblk to use OOB metadata >=16B. Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-12-11 12:22:35 -07:00
Igor Konopko	faa79f27f0	lightnvm: pblk: add helpers for OOB metadata pblk currently assumes that size of OOB metadata on drive is always equal to size of pblk_sec_meta struct. This commit add helpers which will allow to handle different sizes of OOB metadata on drive in the future. After this patch only OOB metadata equal to 16 bytes is supported. Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-12-11 12:22:35 -07:00
Igor Konopko	dd439496df	lightnvm: pblk: move lba list to partial read context Currently DMA allocated memory is reused on partial read for lba_list_mem and lba_list_media arrays. In preparation for dynamic DMA pool sizes we need to move this arrays into pblk_pr_ctx structures. Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Igor Konopko <igor.j.konopko@intel.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-12-11 12:22:34 -07:00
Javier González	42bd0384d7	lightnvm: pblk: avoid ref warning on cache creation The current kref implementation around pblk global caches triggers a false positive on refcount_inc_checked() (when called) as the kref is initialized to 0. Instead of usint kref_inc() on a 0 reference, which is in principle correct, use kref_init() to avoid the check. This is also more explicit about what actually happens on cache creation. In the process, do a small refactoring to use kref helpers. Fixes: `1864de94ec` "lightnvm: pblk: stop recreating global caches" Signed-off-by: Javier González <javier@cnexlabs.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-12-11 12:22:34 -07:00
Matias Bjørling	85136c0102	lightnvm: simplify geometry enumeration Currently the geometry of an OCSSD is enumerated using a two step approach: First, nvm_register is called, the OCSSD identify command is issued, and second the geometry sos and csecs values are read either from the OCSSD identify if it is a 1.2 drive, or from the NVMe namespace data structure if it is a 2.0 device. This patch recombines it into a single step, such that nvm_register can use the csecs and sos fields independent of which version is used. This enables one to dynamically size the lightnvm subsystem dma pool. Reviewed-by: Igor Konopko <igor.j.konopko@intel.com> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-12-11 12:22:34 -07:00
Javier González	361d889f83	lightnvm: pblk: add comments wrt locking in recovery path pblk's recovery path is single threaded and therefore a number of assumptions regarding concurrency can be made. To avoid confusion, make this explicit with a couple of comments in the code. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-12-11 12:22:34 -07:00
Hua Su	fde201a466	lightnvm: pblk: add lock protection to list operations Protect the list_add on the pblk_line_init_bb() error path in case this code is used for some other purpose in the future. Signed-off-by: Hua Su <suhua.tanke@gmail.com> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-12-11 12:22:34 -07:00
Hua Su	6e82f0ba00	lightnvm: pblk: fix spelling in comment Signed-off-by: Hua Su <suhua.tanke@gmail.com> Updated description. Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-12-11 12:22:34 -07:00
Hans Holmberg	e698d9f4e6	lightnvm: pblk: remove dead code in pblk_recov_l2p Remove the call to pblk_line_replace_data as it returns directly because we have not set l_mg->data_next yet. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-12-11 12:22:34 -07:00
Hans Holmberg	0934ce87b5	lightnvm: pblk: fix pblk_lines_init error handling path The chunk metadata is allocated with vmalloc, so we need to use vfree to free it. Fixes: `090ee26fd5` ("lightnvm: use internal allocation for chunk log page") Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-12-11 12:22:34 -07:00
Hans Holmberg	c9a1d640d5	lightnvm: pblk: remove unused macro ADDR_POOL_SIZE is not used anymore, so remove the macro. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-12-11 12:22:34 -07:00
Hans Holmberg	3bcebc5bac	lightnvm: pblk: set conservative threshold for user writes In a worst-case scenario (random writes), OP% of sectors in each line will be invalid, and we will then need to move data out of 100/OP% lines to free a single line. So, to prevent the possibility of running out of lines, temporarily block user writes when there is less than 100/OP% free lines. Also ensure that pblk creation does not produce instances with insufficient over provisioning. Insufficient over-provising is not a problem on real hardware, but often an issue when running QEMU simulations (with few lines). 100 lines is enough to create a sane instance with the standard (11%) over provisioning. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-12-11 12:22:33 -07:00
Hans Holmberg	525f7bb2c9	lightnvm: pblk: stop writes gracefully when running out of lines If mapping fails (i.e. when running out of lines), handle the error and stop writing. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-12-11 12:22:33 -07:00
Hans Holmberg	ab3887be1e	lightnvm: pblk: account for write error sectors in emeta Lines inflicted with write errors lines might be recovered if they have not been recycled after write error garbage collection. Ensure that the emeta accounting of valid lbas is correct for such lines to avoid recovery inconsistencies. Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-12-11 12:22:33 -07:00
Hans Holmberg	c12fa401ac	lightnvm: pblk: fix resubmission of overwritten write err lbas Make sure we only look up valid lba addresses on the resubmission path. If an lba is invalidated in the write buffer, that sector will be submitted to disk (as it is already mapped to a ppa), and that write might fail, resulting in a crash when trying to look up the lba in the mapping table (as the lba is marked as invalid). Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-12-11 12:22:33 -07:00
Hans Holmberg	96076f7dde	lightnvm: pblk: fix chunk close trace event check The check for chunk closes suffers from an off-by-one issue, leading to chunk close events not being traced. Fixes: `4c44abf43d` ("lightnvm: pblk: add trace events for chunk states") Signed-off-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-12-11 12:22:33 -07:00
Geert Uytterhoeven	55e58c5e78	lightnvm: Fix uninitialized return value in nvm_get_chunk_meta() With gcc 4.1: drivers/lightnvm/core.c: In function ‘nvm_get_bb_meta’: drivers/lightnvm/core.c:977: warning: ‘ret’ may be used uninitialized in this function and drivers/nvme/host/lightnvm.c: In function ‘nvme_nvm_get_chk_meta’: drivers/nvme/host/lightnvm.c:580: warning: ‘ret’ may be used uninitialized in this function Indeed, if (for the former) the number of channels or LUNs is zero, or (for both) the passed number of chunks is zero, ret will be returned uninitialized. Fix this by preinitializing ret to zero. Fixes: `aff3fb18f9` ("lightnvm: move bad block and chunk state logic to core") Fixes: `a294c19945` ("lightnvm: implement get log report chunk helpers") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-12-11 12:22:33 -07:00
Zhoujie Wu	f40a62d267	lightnvm: pblk: ignore the smeta oob area scan The smeta area l2p mapping is empty, and actually the recovery procedure only need to restore data sector's l2p mapping. So ignore the smeta oob scan. Signed-off-by: Zhoujie Wu <zjwu@marvell.com> Reviewed-by: Javier González <javier@javigon.com> Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com> Signed-off-by: Matias Bjørling <mb@lightnvm.io> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-12-11 12:22:33 -07:00
Mike Snitzer	c4576aed8d	dm: fix request-based dm's use of dm_wait_for_completion The md->wait waitqueue is used by both bio-based and request-based DM. Commit `dbd3bbd291` ("dm rq: leverage blk_mq_queue_busy() to check for outstanding IO") lost sight of the requirement that dm_wait_for_completion() must work with all types of DM devices. Fix md_in_flight() to call the blk-mq or bio-based method accordingly. Fixes: `dbd3bbd291` ("dm rq: leverage blk_mq_queue_busy() to check for outstanding IO") Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-12-11 07:40:02 -07:00
Jens Axboe	6451fe73fa	nvme: fix irq vs io_queue calculations Guenter reported an boot hang issue on HPPA after we default to 0 poll queues. We have two issues in the queue count calculations: 1) We don't separate the poll queues from the read/write queues. This is important, since the former doesn't need interrupts. 2) The adjust logic is broken. Adjust the poll queue count before doing nvme_calc_io_queues(). The poll queue count is only limited by the IO queue count we were able to get from the controller, not failures in the IRQ allocation loop. This leaves nvme_calc_io_queues() just adjusting the read/write queue map. Reported-by: Reported-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-12-11 06:27:46 -07:00
Jens Axboe	b7934ba414	dm: fix inflight IO check After switching to percpu inflight counters, the inflight check is totally buggy. It's perfectly valid for some counters to be non-zero while having a total inflight IO count of 0, that's how these kinds of counters work (inc on one CPU, dec on another). Fix the md_in_flight() check to sum all counters before returning a false positive, potentially. While at it, remove the inflight read for IO completion. We don't need it, just wake anyone that's waiting for the IO count to drop to zero. The caller needs to re-check that value anyway when woken, which it does. Fixes: `6f75723190` ("dm: remove the pending IO accounting") Acked-by: Mike Snitzer <snitzer@redhat.com> Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-12-10 18:10:34 -07:00
Jens Axboe	4ba09f69e2	mtip32xx: use BLK_STS_DEV_RESOURCE for device resources For cases where we can only fail with IO in-flight, we should be using BLK_STS_DEV_RESOURCE instead of BLK_STS_RESOURCE. The latter refers to system wide resource constraints. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-12-10 14:45:19 -07:00
Arnd Bergmann	e4025e46f0	mtip32xx: avoid using semaphores The "cmd_slot_unal" semaphore is never used in a blocking way but only as an atomic counter. Change the code to using atomic_dec_if_positive() as a better API. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-12-10 14:44:56 -07:00
Mikulas Patocka	6f75723190	dm: remove the pending IO accounting Remove the "pending" atomic counters, that duplicate block-core's in_flight counters, and update md_in_flight() to look at percpu in_flight counters. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-12-10 08:30:38 -07:00

1 2 3 4 5 ...

798577 Commits