linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-19 17:41:29 +00:00

Author	SHA1	Message	Date
Javier González	0f9248cf1e	lightnvm: pblk: remove redundant check on read path A partial read I/O in pblk is an I/O where some sectors reside in the write buffer in main memory and some are persisted on the device. Such an I/O must at least contain 2 lbas, therefore checking for the case where a single lba is mapped is not necessary. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-10-13 08:34:57 -06:00
Javier González	7bd4d370db	lightnvm: pblk: guarantee line integrity on reads When a line is recycled during garbage collection, reads can still be issued to the line. If the line is freed in the middle of this process, data corruption might occur. This patch guarantees that lines are not freed in the middle of reads that target them (lines). Specifically, we use the existing line reference to decide when a line is eligible for being freed after the recycle process. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-10-13 08:34:57 -06:00
Javier González	a4809fee4e	lightnvm: pblk: check lba sanity on read path As part of pblk's recovery scheme, we store the lba mapped to each physical sector on the device's out-of-bound (OOB) area. On the read path, we can use this information to validate that the data being delivered to the upper layers corresponds to the lba being requested. The cost of this check is an extra copy on the DMA region on the device and an extra comparison in the host, given that (i) the OOB area is being read together with the data in the media, and (ii) the DMA region allocated for the ppa list can be reused for the metadata stored on the OOB area. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-10-13 08:34:57 -06:00
Javier González	26532ee52b	lightnvm: pblk: use rqd->end_io for completion For consistency with the rest of pblk, use rqd->end_io to point to the function taking care of ending the request on the completion path. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-10-13 08:34:57 -06:00
Javier González	67bf26a322	lightnvm: pblk: refactor rqd alloc/free Refactor the rqd allocation and free functions so that all I/O types can use these helper functions. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-10-13 08:34:57 -06:00
Javier González	e2cddf2082	lightnvm: pblk: improve naming for internal req. Each request type sent to the LightNVM subsystem requires different metadata. Until now, we have tailored this metadata based on write, read and erase commands. However, pblk uses different metadata for internal writes that do not hit the write buffer. Instead of abusing the metadata for reads, create a new request type - internal write to improve code readability. In the process, create internal values for each I/O type instead of abusing the READ/WRITE macros, as suggested by Christoph. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-10-13 08:34:57 -06:00
Javier González	875d94f3a4	lightnvm: pblk: allocate bio size more accurately Wait until we know the exact number of ppas to be sent to the device, before allocating the bio. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-10-13 08:34:57 -06:00
Javier González	6ca2f71f3e	lightnvm: pblk: simplify path on REQ_PREFLUSH On REQ_PREFLUSH, directly tag the I/O context flags to signal a flush in the write to cache path, instead of finding the correct entry context and imposing a memory barrier. This simplifies the code and might potentially prevent race conditions when adding functionality to the write path. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-10-13 08:34:57 -06:00
Javier González	55e836d401	lightnvm: pblk: put bio on bio completion Simplify put bio by doing it on bio end_io instead of manually putting it on the completion path. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-10-13 08:34:57 -06:00
Javier González	2a19b10d42	lightnvm: pblk: refactor read path on GC Simplify the part of the garbage collector where data is read from the line being recycled and moved into an internal queue before being copied to the memory buffer. This allows to get rid of a dedicated function, which introduces an unnecessary dependency on the code. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-10-13 08:34:57 -06:00
Javier González	d340121eb7	lightnvm: pblk: simplify data validity check on GC When a line is selected for recycling by the garbage collector (GC), the line state changes and the invalid bitmap is frozen, preventing invalidations from happening. Throughout the GC, the L2P map is checked to verify that not data being recycled has been updated. The last check is done before the new map is being stored on the L2P table. Though this algorithm works, it requires a number of corner cases to be checked each time the L2P table is being updated. This complicates readability and is error prone in case that the recycling algorithm is modified. Instead, this patch makes the invalid bitmap accessible even when the line is being recycled. When recycled data is being remapped, it is enough to check the invalid bitmap for the line before updating the L2P table. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-10-13 08:34:57 -06:00
Javier González	84454e6de5	lightnvm: pblk: refactor read lba sanity check Refactor lba sanity check on read path to avoid code duplication. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-10-13 08:34:57 -06:00
Javier González	9f6cb13bb4	lightnvm: pblk: normalize ppa namings Normalize the way we name ppa variables to improve code readability. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-10-13 08:34:57 -06:00
Javier González	3627896a4b	lightnvm: pblk: use constant for GC max inflight Use a constant to set the maximum number of inflight GC requests allowed. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-10-13 08:34:57 -06:00
Javier González	2942f50fa3	lightnvm: pblk: remove checks on mempool alloc. As part of the mempool audit on pblk, remove unnecessary mempool allocation checks on mempools. Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-10-13 08:34:57 -06:00
Javier González	e72ec1d31b	lightnvm: pblk: do not use a mempool for line bitmaps pblk holds two sector bitmaps: one to keep track of the mapped sectors while the line is active and another one to keep track of the invalid sectors. The latter is kept during the whole live of the line, until it is recycled. Since we cannot guarantee forward progress for the mempool in this case, get rid of the mempool and simply allocate memory through kmalloc. Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-10-13 08:34:57 -06:00
Javier González	0d880398cb	lightnvm: pblk: decouple read/erase mempools Since read and erase paths offer different guarantees for inflight I/Os, separate the mempools to set the right min_nr for each on creation. Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-10-13 08:34:57 -06:00
Javier González	b84ae4a8b8	lightnvm: pblk: simplify work_queue mempool In pblk, we have a mempool to allocate a generic structure that we pass along workqueues. This is heavily used in the GC path in order to have enough inflight reads and fully utilize the GC bandwidth. However, the current GC path copies data to the host memory and puts it back into the write buffer. This requires a vmalloc allocation for the data and a memory copy. Thus, guaranteeing the allocation by using a mempool for the structure in itself does not give us much. Until we implement support for vector copy to avoid moving data through the host, just allocate the workqueue structure using kmalloc. This allows us to have a much smaller mempool. Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-10-13 08:34:57 -06:00
Javier González	bd43241768	lightnvm: pblk: fix min size for page mempool pblk uses an internal page mempool for allocating pages on internal bios. The main two users of this memory pool are partial reads (reads with some sectors in cache and some on media) and padded writes, which need to add dummy pages to an existing bio already containing valid data (and with a large enough bioset allocated). In both cases, the maximum number of pages per bio is defined by the maximum number of physical sectors supported by the underlying device. This patch fixes a bad mempool allocation, where the min_nr of elements on the pool was fixed (to 16), which is lower than the maximum number of sectors supported by NVMe (as of the time for this patch). Instead, use the maximum number of allowed sectors reported by the device. Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-10-13 08:34:57 -06:00
Javier González	da67e68fb9	lightnvm: pblk: avoid deadlock on low LUN config On low LUN configurations, make sure not to send bios that are bigger than the buffer size. Fixes: `a4bd217b43` ("lightnvm: physical block device (pblk) target") Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-10-13 08:34:57 -06:00
Javier González	e0e12a707f	lightnvm: pblk: fix write I/O sync stat Fix stat counter to collect the right number of I/Os being synced on the completion path. Fixes: `0880a9aa2d` ("lightnvm: pblk: delete redundant buffer pointer") Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-10-13 08:34:57 -06:00
Javier González	cd8ddbf7a5	lightnvm: pblk: free padded entries in write buffer When a REQ_FLUSH reaches pblk, the bio cannot be directly completed. Instead, data on the write buffer is flushed and the bio is completed on the completion pah. This might require some sectors to be padded in order to guarantee a successful write. This patch fixes a memory leak on the padded pages. A consequence of this bad free was that internal bios not containing data (only a flush) were not being completed. Fixes: `a4bd217b43` ("lightnvm: physical block device (pblk) target") Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-10-13 08:34:57 -06:00
Javier González	7d327a9ed6	lightnvm: pblk: use right flag for GC allocation The data buffer for the GC path allocates virtual memory through vmalloc. When this change was introduced, a flag signaling kmalloc'ed memory was wrongly introduced. Use the right flag when creating a bio from this buffer. Fixes: `de54e703a4` ("lightnvm: pblk: use vmalloc for GC data buffer") Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-10-13 08:34:57 -06:00
Javier González	a1121176ff	lightnvm: pblk: initialize debug stat counter Initialize the stat counter for garbage collected reads. Fixes: `a4bd217b43` ("lightnvm: physical block device (pblk) target") Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-10-13 08:34:57 -06:00
Rakesh Pandit	32825ebb06	lightnvm: pblk: reuse pblk_gc_should_kick This is a trivial change which reuses pblk_gc_should_kick instead of repeating it again in pblk_rl_free_lines_inc. Signed-off-by: Rakesh Pandit <rakesh@tuxera.com> Made it apply to the common case. Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-10-13 08:34:57 -06:00
Rakesh Pandit	c79819bc08	lightnvm: pblk: print incompatible line version correctly Correct it by converting little endian to cpu endian and also define a macro for line version so that maintenance is easy. Signed-off-by: Rakesh Pandit <rakesh@tuxera.com> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-10-13 08:34:57 -06:00
Rakesh Pandit	c5493845b7	lightnvm: pblk: improve error message if down_timeout fails The two pr_err messages are useless as they don't differentiate error code. Signed-off-by: Rakesh Pandit <rakesh@tuxera.com> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-10-13 08:34:57 -06:00
Rakesh Pandit	4e76af53e1	lightnvm: pblk: fix message if L2P MAP is in device This usually happens if we are developing with qemu and ll2pmode has default value. Improve description. Signed-off-by: Rakesh Pandit <rakesh@tuxera.com> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-10-13 08:34:57 -06:00
Rakesh Pandit	e57903fd97	lightnvm: pblk: protect line bitmap while submitting meta io It seems pblk_dealloc_page would race against pblk_alloc_pages for line bitmap for sector allocation.The chances are very low but might as well protect the bitmap properly. Signed-off-by: Rakesh Pandit <rakesh@tuxera.com> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-10-13 08:34:57 -06:00
Rakesh Pandit	32c662c58a	lightnvm: include NVM Express driver if OCSSD is selected for build Because NVM needs BLK_DEV_NVME, select it automatically if we mark NVM in config file before building kernel. Also append PCI to depends as select doesn't automatically add dependencies. Signed-off-by: Rakesh Pandit <rakesh@tuxera.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-10-13 08:34:57 -06:00
Rakesh Pandit	c9d84b350f	lightnvm: pblk: fix error path in pblk_lines_alloc_metadata Use appropriate memory free calls based on allocation type used and also fix number of times free is called if kmalloc fails. Signed-off-by: Rakesh Pandit <rakesh@tuxera.com> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-10-13 08:34:57 -06:00
Rakesh Pandit	a96d50fa0c	lightnvm: remove already calculated nr_chnls Remove repeated calculation for number of channels while creating a target device. Signed-off-by: Rakesh Pandit <rakesh@tuxera.com> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-10-13 08:34:57 -06:00
Rakesh Pandit	88d31ea267	lightnvm: protect target type list with correct locks nvm_tgt_types list was protected by wrong lock for NVM_INFO ioctl call and can race with addition or removal of target types. Also unregistering target type was not protected correctly. Fixes: `5cd907853` ("lightnvm: remove nested lock conflict with mm") Signed-off-by: Rakesh Pandit <rakesh@tuxera.com> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-10-13 08:34:57 -06:00
Rakesh Pandit	bb6aa6f082	lightnvm: prevent bd removal if busy When a virtual block device is formatted and mounted after creating with "nvme lnvm create... -t pblk", a removal from "nvm lnvm remove" would result in this: 446416.309757] bdi-block not registered [446416.309773] ------------[ cut here ]------------ [446416.309780] WARNING: CPU: 3 PID: 4319 at fs/fs-writeback.c:2159 __mark_inode_dirty+0x268/0x340 Ideally removal should return -EBUSY as block device is mounted after formatting. This patch tries to address this checking if whole device or any partition of it already mounted or not before removal. Whole device is checked using "bd_super" member of block device. This member is always set once block device has been mounted using a filesystem. Another member "bd_part_count" takes care of checking any if any partitions are under use. "bd_part_count" is only updated under locks when partitions are opened or closed (first open and last release). This at least does take care sending -EBUSY if removal is being attempted while whole block device or any partition is mounted. Signed-off-by: Rakesh Pandit <rakesh@tuxera.com> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-10-13 08:34:57 -06:00
Rakesh Pandit	900148296b	lightnvm: prevent target type module removal when in use If target type module e.g. pblk here is unloaded (rmmod) while module is in use (after creating target) system crashes. We fix this by using module API refcnt. Signed-off-by: Rakesh Pandit <rakesh@tuxera.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-10-13 08:34:57 -06:00
Javier González	75cb8e939c	lightnvm: pblk: advance bio according to lba index When a lba either hits the cache or corresponds to an empty entry in the L2P table, we need to advance the bio according to the position in which the lba is located. Otherwise, we will copy data in the wrong page, thus causing data corruption for the application. In case of a cache hit, we assumed that bio->bi_iter.bi_idx would contain the correct index, but this is no necessarily true. Instead, use the local bio advance counter and iterator. This guarantees that lbas hitting the cache are copied into the right bv_page. In case of an empty L2P entry, we omitted to advance the bio. In the cases when the same I/O also contains a cache hit, data corresponding to this lba will be copied to the wrong bv_page. Fix this by advancing the bio as we do in the case of a cache hit. Fixes: `a4bd217b43` lightnvm: physical block device (pblk) target Signed-off-by: Javier González <javier@javigon.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-07-28 08:06:00 -06:00
Javier González	56c76417ad	lightnvm: pblk: remove unnecessary checks Remove unnecessary checks when freeing dma memory in the completion path. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-07-07 13:17:36 -06:00
Javier González	3eaa11e278	lightnvm: pblk: control I/O flow also on tear down When removing a pblk instance, control the write I/O flow to the controller as we do in the fast path. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-07-07 13:17:34 -06:00
Javier González	a84ebb837b	lightnvm: pblk: set line bitmap check under debug Do bitmap checks only when debug mode is enable. The line bitmap used for mapping to physical addresses is fairly large (~512KB) and it is expensive to do this checks on the fast path. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-30 11:08:18 -06:00
Javier González	076984669d	lightnvm: pblk: verify that cache read is still valid When a read is directed to the cache, we risk that the lba has been updated during the time we made the L2P table lookup and the time we are actually reading form the cache. We intentionally not hold the L2P lock not to block other threads. While strict ordering is not a guarantee at this level (unless REQ_FLUSH has been previously issued), we have experience that some databases that have recently implemented direct I/O support, issue metadata reads very close to the writes, without issuing a fsync in the middle. An easy way to support them while they is to make an extra effort and check the L2P map right before reading the cache. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-30 11:08:18 -06:00
Javier González	b5e063a286	lightnvm: pblk: add initialization check Add a sanity check to the pblk initialization sequence in order to ensure that enough LUNs have been allocated to store the line metadata. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-30 11:08:18 -06:00
Javier González	ee8d5c1ad5	lightnvm: pblk: remove target using async. I/Os When removing a pblk instance, pad the current line using asynchronous I/O. This reduces the removal time from ~1 minute in the worst case to a couple of seconds. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-30 11:08:18 -06:00
Javier González	de54e703a4	lightnvm: pblk: use vmalloc for GC data buffer For now, we allocate a per I/O buffer for GC data. Since the potential size of the buffer is 256KB and GC is not in the fast path, do this allocation with vmalloc. This puts lets pressure on the memory allocator at no performance cost. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-30 11:08:18 -06:00
Javier González	8224cbd80b	lightnvm: pblk: use right metadata buffer for recovery Fix bad metadata buffer assignations introduced when refactoring the medatada write path. Fixes: `dd2a434373` lightnvm: pblk: sched. metadata on write thread Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-30 11:08:18 -06:00
Javier González	1088812978	lightnvm: pblk: schedule if data is not ready When user threads place data into the write buffer, they reserve space and do the memory copy out of the lock. As a consequence, when the write thread starts persisting data, there is a chance that it is not copied yet. In this case, avoid polling, and schedule before retrying. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-30 11:08:18 -06:00
Javier González	653cbb8472	lightnvm: pblk: remove unused return variable Remove unused variable. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-30 11:08:18 -06:00
Javier González	2950e7e610	lightnvm: pblk: fix double-free on pblk init Prevent pblk->lines being double freed in case of an error during pblk initialization. Fixes: `dd2a434373`: "lightnvm: pblk: sched. metadata on write thread" Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-30 11:08:18 -06:00
Javier González	f417aa0bd8	lightnvm: pblk: fix bad le64 assignations Use the right types and conversions on le64 variables. Reported by sparse. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-30 11:08:18 -06:00
Rakesh Pandit	12e9a6d622	lightnvm: if LUNs are already allocated fix return While creating new device with NVM_DEV_CREATE if LUNs are already allocated ioctl would return -ENOMEM which is wrong. This patch propagates -EBUSY from nvm_reserve_luns which is correct response. Fixes: `ade69e243` ("lightnvm: merge gennvm with core") Reviewed-by: Frans Klaver <fransklaver@gmail.com> Signed-off-by: Rakesh Pandit <rakesh@tuxera.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-27 08:22:09 -06:00
Javier González	588726d3ec	lightnvm: pblk: fail gracefully on irrec. error Due to user writes being decoupled from media writes because of the need of an intermediate write buffer, irrecoverable media write errors lead to pblk stalling; user writes fill up the buffer and end up in an infinite retry loop. In order to let user writes fail gracefully, it is necessary for pblk to keep track of its own internal state and prevent further writes from being placed into the write buffer. This patch implements a state machine to keep track of internal errors and, in case of failure, fail further user writes in an standard way. Depending on the type of error, pblk will do its best to persist buffered writes (which are already acknowledged) and close down on a graceful manner. This way, data might be recovered by re-instantiating pblk. Such state machine paves out the way for a state-based FTL log. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-26 16:27:39 -06:00
Javier González	ef5764946b	lightnvm: pblk: set mempool and workqueue params. Make constants to define sizes for internal mempools and workqueues. In this process, adjust the values to be more meaningful given the internal constrains of the FTL. In order to do this for workqueues, separate the current auxiliary workqueue into two dedicated workqueues to manage lines being closed and bad blocks. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-26 16:27:39 -06:00
Javier González	b20ba1bc74	lightnvm: pblk: redesign GC algorithm At the moment, in order to get enough read parallelism, we have recycled several lines at the same time. This approach has proven not to work well when reaching capacity, since we end up mixing valid data from all lines, thus not maintaining a sustainable free/recycled line ratio. The new design, relies on a two level workqueue mechanism. In the first level, we read the metadata for a number of lines based on the GC list they reside on (this is governed by the number of valid sectors in each line). In the second level, we recycle a single line at a time. Here, we issue reads in parallel, while a single GC write thread places data in the write buffer. This design allows to (i) only move data from one line at a time, thus maintaining a sane free/recycled ration and (ii) maintain the GC writer busy with recycled data. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-26 16:27:39 -06:00
Javier González	476118c981	lightnvm: pblk: add lock assertions on helpers Add lockdep assertions on helper functions. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-26 16:27:39 -06:00
Javier González	0c0ea8817e	lightnvm: pblk: cleanup unnecessary code Cleanup unnecessary headers and code lines. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-26 16:27:39 -06:00
Javier González	63e3809cf7	lightnvm: pblk: set metadata list for all I/Os Set a dma area for all I/Os in order to read/write from/to the metadata stored on the per-sector out-of-bound area. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-26 16:27:39 -06:00
Javier González	d45ebd470b	lightnvm: pblk: choose optimal victim GC line At the moment, we separate the closed lines on three different list based on their number of valid sectors. GC recycles lines from each list based on capacity. Lines from each list are taken in a FIFO fashion. Since the number of lines is limited (it corresponds to the number of blocks in a LUN, which is somewhere between 1000-2000), we can afford scanning the lists to choose the optimal line to be recycled. This helps specially in lines with a high number of valid sectors. If the number of blocks per LUN increases, we will consider a more efficient policy. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-26 16:27:39 -06:00
Javier González	dffdd960ee	lightnvm: pblk: decouple bad block from line alloc Decouple bad block discovery from line allocation logic. This allows to return meaningful error codes in case of bad block discovery failure. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-26 16:27:39 -06:00
Javier González	f680f19aa6	lightnvm: pblk: simplify meta. memory allocation smeta size will always be suitable for a kmalloc allocation. Simplify the code and leave the vmalloc fallback only for emeta, where the pblk configuration has an impact. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-26 16:27:39 -06:00
Javier González	f9c101523d	lightnvm: pblk: issue multiplane reads if possible If a read request is sequential and its size aligns with a multi-plane page size, use the multi-plane hint to process the I/O in parallel in the controller. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-26 16:27:39 -06:00
Javier González	0880a9aa2d	lightnvm: pblk: delete redundant buffer pointer After refactoring the metadata path, the backpointer controlling synced I/Os in a line becomes unnecessary; metadata is scheduled on the write thread, thus we know when the end of the line is reached and act on it directly. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-26 16:27:39 -06:00
Javier González	fd1b0158f5	lightnvm: pblk: delete redundant debug line stat Remove a legacy variable that helped verifying the consistency of the run-time metadata for the free line list. With the new metadata layout, this check is no longer necessary. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-26 16:27:39 -06:00
Javier González	dd2a434373	lightnvm: pblk: sched. metadata on write thread At the moment, line metadata is persisted on a separate work queue, that is kicked each time that a line is closed. The assumption when designing this was that freeing the write thread from creating a new write request was better than the potential impact of writes colliding on the media (user I/O and metadata I/O). Experimentation has proven that this assumption is wrong; collision can cause up to 25% of bandwidth and introduce long tail latencies on the write thread, which potentially cause user write threads to spend more time spinning to get a free entry on the write buffer. This patch moves the metadata logic to the write thread. When a line is closed, remaining metadata is written in memory and is placed on a metadata queue. The write thread then takes the metadata corresponding to the previous line, creates the write request and schedules it to minimize collisions on the media. Using this approach, we see that we can saturate the media's bandwidth, which helps reducing both write latencies and the spinning time for user writer threads. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-26 16:27:39 -06:00
Javier González	084ec9ba07	lightnvm: pblk: rename read request pool Read requests allocate some extra memory to store its per I/O context. Instead of requiring yet another memory pool for other type of requests, generalize this context allocation (and change naming accordingly). Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-26 16:27:13 -06:00
Javier González	d624f371d5	lightnvm: pblk: generalize erase path Erase I/Os are scheduled with the following goals in mind: (i) minimize LUNs collisions with write I/Os, and (ii) even out the price of erasing on every write, instead of putting all the burden on when garbage collection runs. This works well on the current design, but is specific to the default mapping algorithm. This patch generalizes the erase path so that other mapping algorithms can select an arbitrary line to be erased instead. It also gets rid of the erase semaphore since it creates jittering for user writes. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-26 16:24:53 -06:00
Javier González	c2e9f5d457	lightnvm: pblk: expose max sec per write on sysfs Allow to configure the number of maximum sectors per write command through sysfs. This makes it easier to tune write command sizes for different controller configurations. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-26 16:24:53 -06:00
Javier González	db7ada33cd	lightnvm: pblk: add debug stat for read cache hits Add a new debug counter to measure cache hits on the read path Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-26 16:24:53 -06:00
Javier González	caa69fa560	lightnvm: pblk: spare double cpu_to_le64 calc. Spare a double calculation on the fast write path. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-26 16:24:53 -06:00
Javier González	3e505afb45	lightnvm: re-convert ppa format on I/O failure In case of a failure when submitting a request, convert the ppa_list addresses to the target format so that it can interpret ppas for recovery Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-26 16:24:53 -06:00
NeilBrown	b25d52379a	lightnvm/pblk-read: use bio_clone_fast() pblk_submit_read() uses bio_clone_bioset() but doesn't change the io_vec, so bio_clone_fast() is a better choice. It also uses fs_bio_set which is intended for filesystems. Using it in a device driver can deadlock. So allocate a new bioset, and and use bio_clone_fast(). Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Javier González <javier@cnexlabs.com> Tested-by: Javier González <javier@cnexlabs.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-18 12:40:59 -06:00
NeilBrown	af67c31fba	blk: remove bio_set arg from blk_queue_split() blk_queue_split() is always called with the last arg being q->bio_split, where 'q' is the first arg. Also blk_queue_split() sometimes uses the passed-in 'bs' and sometimes uses q->bio_split. This is inconsistent and unnecessary. Remove the last arg and always use q->bio_split inside blk_queue_split() Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Credit-to: Javier González <jg@lightnvm.io> (Noticed that lightnvm was missed) Reviewed-by: Javier González <javier@cnexlabs.com> Tested-by: Javier González <javier@cnexlabs.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-18 12:40:59 -06:00
Christoph Hellwig	4e4cbee93d	block: switch bios to blk_status_t Replace bi_error with a new bi_status to allow for a clear conversion. Note that device mapper overloaded bi_error with a private value, which we'll have to keep arround at least for now and thus propagate to a proper blk_status_t value. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-06-09 09:27:32 -06:00
Javier González	507f7d68fe	lightnvm: fix bad back free on error path Free memory correctly when an allocation fails on a loop and we free backwards previously successful allocations. Signed-off-by: Javier González <javier@cnexlabs.com> Reviewed-by: Matias Bjørling <matias@cnexlabs.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-05-04 07:53:04 -06:00
Wei Yongjun	5136a4fd58	lightnvm: fix possible memory leak in pblk_bb_discovery() 'blks' is malloced in pblk_bb_discovery() and should be freed before leaving from the nvm_get_tgt_bb_tbl() error handling cases, otherwise it will cause memory leak. Also skip assign blks to rlun->bb_list when error. Fixes: `a4bd217b43` ("lightnvm: physical block device (pblk) target") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Javier GonzÃ¡lez <javier@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-25 10:44:29 -06:00
Javier González	a44f53faf4	lightnvm: pblk: fix erase counters on error fail When block erases fail, these blocks are marked bad. The number of valid blocks in the line was not updated, which could cause an infinite loop on the erase path. Fix this atomic counter and, in order to avoid taking an irq lock on the interrupt context, make the erase counters atomic too. Also, in the case that a significant number of blocks become bad in a line, the result is the double shared metadata buffer (emeta) to stop the pipeline until all metadata is flushed to the media. Increase the number of metadata lines from 2 to 4 to avoid this case. Fixes: `a4bd217b43` "lightnvm: physical block device (pblk) target" Signed-off-by: Javier González <javier@cnexlabs.com> Reviewed-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-23 16:57:52 -06:00
Javier González	be388d9fbd	lightnvm: pblk: free metadata on line alloc failure When a line allocation fails, for example, due to having too many bad blocks, free its metadata correctly. Fixes: `a4bd217b43` "lightnvm: physical block device (pblk) target" Signed-off-by: Javier González <javier@cnexlabs.com> Reviewed-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-23 16:57:52 -06:00
Javier González	33db9fd46e	lightnvm: pblk: fix memory leak on error path When write recovery fails, Free memory for the recovery structure. Fixes: `a4bd217b43` "lightnvm: physical block device (pblk) target" Signed-off-by: Javier González <javier@cnexlabs.com> Reviewed-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-23 16:57:52 -06:00
Javier González	f3236cef5a	lightnvm: pblk: fix bad error check Fix bad error check Fixes: `a4bd217b43` "lightnvm: physical block device (pblk) target" Signed-off-by: Javier González <javier@cnexlabs.com> Reviewed-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-23 16:57:52 -06:00
Javier González	3dc001f343	lightnvm: pblk: fix race condition on line retry When a pblk line fails (or is recovered), make sure to take the line management lock. Fixes: `a4bd217b43` "lightnvm: physical block device (pblk) target" Signed-off-by: Javier González <javier@cnexlabs.com> Reviewed-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-23 16:57:52 -06:00
Dan Carpenter	659226eb63	lightnvm: don't print a warning for ADDR_EMPTY Reading from ADDR_EMPTY is out of bounds. The current code generates a static checker warning because we check for out of bounds "lba" before we check for ADDR_EMPTY, so the second check is always false. It looks like we intended ADDR_EMPTY to be a no-op without printing a warning. Fixes: `a4bd217b43` ("lightnvm: physical block device (pblk) target") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-21 16:49:56 -06:00
Dan Carpenter	5bf1e1ee62	lightnvm: potential underflow in pblk_read_rq() This is a static checker fix, and perhaps not a real bug. The static checker thinks that nr_secs could be negative. It would result in zeroing more memory than intended. Anyway, even if it's not a bug, changing this variable to unsigned makes the code easier to audit. Fixes: `a4bd217b43` ("lightnvm: physical block device (pblk) target") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Javier González <javier@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-21 16:48:40 -06:00
Rakesh Pandit	8d77bb8276	lightnvm: propagate pblk_init return to userspace From userspace calling ioctl(NVM_DEV_CREATE) was returning ENOMEM for invalid arguments even though pblk (pblk_init) was returning correctly -EINVAL to nvm_create_tgt inside core. This patch propagates the correct return value to userspace. Because pblk was introduced recently this only needs to go in 4.12. Fixes: `a4bd217b43` ("lightnvm: physical block device (pblk) target") Signed-off-by: Rakesh Pandit <rakesh@tuxera.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-21 12:40:41 -06:00
Rakesh Pandit	75ba4ada82	ligtnvm: fix double blk_put_queue on same queue On an error path in NVM_DEV_CREATE ioctl blk_put_queue is being called twice: one via blk_cleanup_queue and another via put_disk. Straight fix seems to remove queue pointer so that disk_release never ends up caling blk_put_queue again. [ 391.808827] WARNING: CPU: 1 PID: 1250 at lib/refcount.c:128 refcount_sub_and_test+0x70/0x80 [ 391.808830] refcount_t: underflow; use-after-free. [ 391.808832] Modules linked in: nf_conntrack_netbios_ns............ [ 391.809052] CPU: 1 PID: 1250 Comm: nvme Not tainted......... [ 391.809057] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 [ 391.809060] Call Trace: [ 391.809079] dump_stack+0x63/0x86 [ 391.809094] __warn+0xcb/0xf0 [ 391.809103] warn_slowpath_fmt+0x5f/0x80 [ 391.809118] refcount_sub_and_test+0x70/0x80 [ 391.809125] refcount_dec_and_test+0x11/0x20 [ 391.809136] kobject_put+0x1f/0x60 [ 391.809149] blk_put_queue+0x15/0x20 [ 391.809159] disk_release+0xae/0xf0 [ 391.809172] device_release+0x32/0x90 [ 391.809184] kobject_release+0x6a/0x170 [ 391.809196] kobject_put+0x2f/0x60 [ 391.809206] put_disk+0x17/0x20 [ 391.809219] nvm_ioctl_dev_create.isra.16+0x897/0xa30 [ 391.809236] nvm_ctl_ioctl+0x23c/0x4c0 [ 391.809248] do_vfs_ioctl+0xa3/0x5f0 [ 391.809258] SyS_ioctl+0x79/0x90 [ 391.809271] entry_SYSCALL_64_fastpath+0x1a/0xa9 [ 391.809280] RIP: 0033:0x7f5d3ef363c7 [ 391.809286] RSP: 002b:00007ffc72ed8d78 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 [ 391.809296] RAX: ffffffffffffffda RBX: 00007ffc72edb552 RCX: 00007f5d3ef363c7 [ 391.809301] RDX: 00007ffc72ed8d90 RSI: 0000000040804c22 RDI: 0000000000000003 [ 391.809306] RBP: 0000000000000001 R08: 0000000000000020 R09: 0000000000000001 [ 391.809311] R10: 000000000000053f R11: 0000000000000206 R12: 0000000000000000 [ 391.809316] R13: 0000000000000000 R14: 00007ffc72edb58d R15: 00007ffc72edb581 Signed-off-by: Rakesh Pandit <rakesh@tuxera.com> Reviewed-by: Matias Bjørling <matias@cnexlabs.com> Fixes: `7d1ef2f408` "lightnvm: fix cleanup order of disk on init error" Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-20 08:17:47 -06:00
Arnd Bergmann	ef697902a1	lightnvm: assume 64-bit lba numbers The driver uses both u64 and sector_t to refer to offsets, and assigns between the two. This causes one harmless warning when sector_t is 32-bit: drivers/lightnvm/pblk-rb.c: In function 'pblk_rb_write_entry_gc': include/linux/lightnvm.h:215:20: error: large integer implicitly truncated to unsigned type [-Werror=overflow] drivers/lightnvm/pblk-rb.c:324:22: note: in expansion of macro 'ADDR_EMPTY' As the driver is already doing this inconsistently, changing the type won't make it worse and is an easy way to avoid the warning. Fixes: `a4bd217b43` ("lightnvm: physical block device (pblk) target") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-19 12:07:28 -06:00
Dan Carpenter	1c6286f263	lightnvm: fix some error code in pblk-init.c There were a bunch of places in pblk_lines_init() where we didn't set an error code. And in pblk_writer_init() we accidentally return 1 instead of a correct error code, which would result in a Oops later. Fixes: 11a5d6fdf919 ("lightnvm: physical block device (pblk) target") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-16 10:06:34 -06:00
Dan Carpenter	2a79efd833	lightnvm: fix some WARN() messages WARN_ON() takes a condition, not an error message. I slightly tweaked some conditions so hopefully it's more clear. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-16 10:06:34 -06:00
Dan Carpenter	503ec94eca	lightnvm: pblk-gc: fix an error pointer dereference in init These labels are reversed so we could end up dereferencing an error pointer or leaking. Fixes: 7f347ba6bb3a ("lightnvm: physical block device (pblk) target") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-16 10:06:34 -06:00
Javier González	a4bd217b43	lightnvm: physical block device (pblk) target This patch introduces pblk, a host-side translation layer for Open-Channel SSDs to expose them like block devices. The translation layer allows data placement decisions, and I/O scheduling to be managed by the host, enabling users to optimize the SSD for their specific workloads. An open-channel SSD has a set of LUNs (parallel units) and a collection of blocks. Each block can be read in any order, but writes must be sequential. Writes may also fail, and if a block requires it, must also be reset before new writes can be applied. To manage the constraints, pblk maintains a logical to physical address (L2P) table, write cache, garbage collection logic, recovery scheme, and logic to rate-limit user I/Os versus garbage collection I/Os. The L2P table is fully-associative and manages sectors at a 4KB granularity. Pblk stores the L2P table in two places, in the out-of-band area of the media and on the last page of a line. In the cause of a power failure, pblk will perform a scan to recover the L2P table. The user data is organized into lines. A line is data striped across blocks and LUNs. The lines enable the host to reduce the amount of metadata to maintain besides the user data and makes it easier to implement RAID or erasure coding in the future. pblk implements multi-tenant support and can be instantiated multiple times on the same drive. Each instance owns a portion of the SSD - both regarding I/O bandwidth and capacity - providing I/O isolation for each case. Finally, pblk also exposes a sysfs interface that allows user-space to peek into the internals of pblk. The interface is available at /dev/block//pblk/ where is the block device name exposed. This work also contains contributions from: Matias Bjørling <matias@cnexlabs.com> Simon A. F. Lund <slund@cnexlabs.com> Young Tack Jin <youngtack.jin@gmail.com> Huaicheng Li <huaicheng@cs.uchicago.edu> Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-16 10:06:33 -06:00
Javier González	6eb082452d	lightnvm: convert sprintf into strlcpy Convert sprintf calls to strlcpy in order to make possible buffer overflow more obvious. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-16 10:06:25 -06:00
Javier González	d788c59be5	lightnvm: fix type checks on rrpc sector_t is always unsigned, therefore avoid < 0 checks on it. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-16 10:06:25 -06:00
Javier González	5d30f3bd55	lightnvm: clean unused variable Clean unused variable on lightnvm core. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-16 10:06:25 -06:00
Javier González	46b160ceda	lightnvm: make nvm_free static Prefix the nvm_free static function with a missing static keyword. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-16 10:06:25 -06:00
Javier González	4af3f75d79	lightnvm: allow to init targets on factory mode Target initialization has two responsibilities: creating the target partition and instantiating the target. This patch enables to create a factory partition (e.g., do not trigger recovery on the given target). This is useful for target development and for being able to restore the device state at any moment in time without requiring a full-device erase. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-16 10:06:25 -06:00
Javier González	7d1ef2f408	lightnvm: fix cleanup order of disk on init error Reorder disk allocation such that the disk structure can be put safely. Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-16 10:06:25 -06:00
Javier González	edee1bdd66	lightnvm: double-clear of dev->lun_map on target init error The dev->lun_map bits are cleared twice if an target init error occurs. First in the target clean routine, and then next in the nvm_tgt_create error function. Make sure that it is only cleared once by extending nvm_remove_tgt_devi() with a clear bit, such that clearing of bits can ignored when cleaning up a successful initialized target. Signed-off-by: Javier González <javier@cnexlabs.com> Fix style. Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-16 10:06:25 -06:00
NeilBrown	b0e0306ce1	lightnvm: don't check for failure from mempool_alloc() mempool_alloc() cannot fail if the gfp flags allow it to sleep, and both GFP_KERNEL and GFP_NOIO allows for sleeping. So rrpc_move_valid_pages() and rrpc_make_rq() don't need to test the return value. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-16 10:06:25 -06:00
Javier González	7a3de2b33f	lightnvm: free reverse device map Free the reverse mapping table correctly on target tear down Signed-off-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-16 10:06:25 -06:00
Javier González	17912c49ed	lightnvm: submit erases using the I/O path Until now erases have been submitted as synchronous commands through a dedicated erase function. In order to enable targets implementing asynchronous erases, refactor the erase path so that it uses the normal async I/O submission functions. If a target requires sync I/O, it can implement it internally. Also, adapt rrpc to use the new erase path. Signed-off-by: Javier González <javier@cnexlabs.com> Fixed spelling error. Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-16 10:06:25 -06:00
Christophe JAILLET	654a01b788	lightnvm: Fix error handling According to error handling in this function, it is likely that going to 'out' was expected here. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-16 10:06:25 -06:00
Matias Bjørling	6732c74010	lightnvm: set default lun range when no luns are specified The create target ioctl takes a lun begin and lun end parameter, which defines the range of luns to initialize a target with. If the user does not set the parameters, it default to only using lun 0. Instead, defaults to use all luns in the OCSSD, as it is the usual behaviour users want. Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-02-15 08:27:21 -07:00
Matias Bjørling	0e5ffd1cb5	lightnvm: fix off-by-one error on target initialization If one specifies the end lun id to be the absolute number of luns, without taking zero indexing into account, the lightnvm core will pass the off-by-one end lun id to target creation, which then panics during nvm_ioctl_dev_create. Signed-off-by: Matias Bjørling <matias@cnexlabs.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-02-15 08:27:19 -07:00

1 2 3 4 5 ...

296 Commits