linux

Author	SHA1	Message	Date
Mikulas Patocka	adc0daad36	dm: report suspended device during destroy The function dm_suspended returns true if the target is suspended. However, when the target is being suspended during unload, it returns false. An example where this is a problem: the test "!dm_suspended(wc->ti)" in writecache_writeback is not sufficient, because dm_suspended returns zero while writecache_suspend is in progress. As is, without an enhanced dm_suspended, simply switching from flush_workqueue to drain_workqueue still emits warnings: workqueue writecache-writeback: drain_workqueue() isn't complete after 10 tries workqueue writecache-writeback: drain_workqueue() isn't complete after 100 tries workqueue writecache-writeback: drain_workqueue() isn't complete after 200 tries workqueue writecache-writeback: drain_workqueue() isn't complete after 300 tries workqueue writecache-writeback: drain_workqueue() isn't complete after 400 tries writecache_suspend calls flush_workqueue(wc->writeback_wq), which flushes the current work. However, the work item may re-queue itself, and flush_workqueue doesn't wait for re-queued works to finish. Because of this, the function writecache_writeback continues execution after the device was suspended and then runs concurrently with writecache_dtr, causing a crash in writecache_writeback. We must use drain_workqueue, which waits until the work and all re-queued works finish. As a prereq for switching to drain_workqueue, this commit fixes dm_suspended to return true after the presuspend hook and before the postsuspend hook, just like during a normal suspend. It allows simplifying the dm-integrity and dm-writecache targets so that they don't have to maintain suspended flags on their own. With this change, drain_workqueue() can be used effectively. This change was tested with the lvm2 testsuite and cryptsetup testsuite and there are no regressions. 
Fixes: `48debafe4f` ("dm: add writecache target") Cc: stable@vger.kernel.org # 4.18+ Reported-by: Corey Marthaler <cmarthal@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-02-27 16:40:58 -05:00
Theodore Ts'o	3918e0667b	dm thin metadata: fix lockdep complaint [ 3934.173244] ====================================================== [ 3934.179572] WARNING: possible circular locking dependency detected [ 3934.185884] 5.4.21-xfstests #1 Not tainted [ 3934.190151] ------------------------------------------------------ [ 3934.196673] dmsetup/8897 is trying to acquire lock: [ 3934.201688] ffffffffbce82b18 (shrinker_rwsem){++++}, at: unregister_shrinker+0x22/0x80 [ 3934.210268] but task is already holding lock: [ 3934.216489] ffff92a10cc5e1d0 (&pmd->root_lock){++++}, at: dm_pool_metadata_close+0xba/0x120 [ 3934.225083] which lock already depends on the new lock. [ 3934.564165] Chain exists of: shrinker_rwsem --> &journal->j_checkpoint_mutex --> &pmd->root_lock For a more detailed lockdep report, please see: https://lore.kernel.org/r/20200220234519.GA620489@mit.edu We shouldn't need to hold the lock while we are just tearing down and freeing the whole metadata pool structure. Fixes: `44d8ebf436` ("dm thin metadata: use pool locking at end of dm_pool_metadata_close") Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-02-27 12:00:53 -05:00
Mikulas Patocka	7cdf6a0aae	dm cache: fix a crash due to incorrect work item cancelling The crash can be reproduced by running the lvm2 testsuite test lvconvert-thin-external-cache.sh for several minutes, e.g.: while :; do make check T=shell/lvconvert-thin-external-cache.sh; done The crash happens in this call chain: do_waker -> policy_tick -> smq_tick -> end_hotspot_period -> clear_bitset -> memset -> __memset -- which accesses an invalid pointer in the vmalloc area. The work entry on the workqueue is executed even after the bitmap was freed. The problem is that cancel_delayed_work doesn't wait for the running work item to finish, so the work item can continue running and re-submitting itself even after cache_postsuspend. In order to make sure that the work item won't be running, we must use cancel_delayed_work_sync. Also, change flush_workqueue to drain_workqueue, so that if some work item submits itself or another work item, we are properly waiting for both of them. Fixes: `c6b4fcbad0` ("dm: add cache target") Cc: stable@vger.kernel.org # v3.9 Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-02-27 12:00:52 -05:00
Mikulas Patocka	7fc2e47f40	dm integrity: fix invalid table returned due to argument count mismatch If the flag SB_FLAG_RECALCULATE is present in the superblock, but it was not specified on the command line (i.e. ic->recalculate_flag is false), dm-integrity would return an invalid table line: the reported number of arguments would not match the real number. Fixes: `468dfca38b` ("dm integrity: add a bitmap mode") Cc: stable@vger.kernel.org # v5.2+ Reported-by: Ondrej Kozina <okozina@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-02-25 12:06:08 -05:00
Mikulas Patocka	53770f0ec5	dm integrity: fix a deadlock due to offloading to an incorrect workqueue If we need to perform synchronous I/O in dm_integrity_map_continue(), we must make sure that we are not in the map function - in order to avoid the deadlock due to bio queuing in generic_make_request. To avoid the deadlock, we offload the request to metadata_wq. However, metadata_wq also processes metadata updates for write requests. If there are too many requests that get offloaded to metadata_wq at the beginning of dm_integrity_map_continue, the workqueue metadata_wq becomes clogged and the system is incapable of processing any metadata updates. This causes a deadlock because all the requests that need to do metadata updates wait for metadata_wq to proceed and metadata_wq waits inside wait_and_add_new_range until some existing request releases its range lock (which doesn't happen because the range lock is released after metadata update). In order to fix the deadlock, we create a new workqueue offload_wq and offload requests to it - so that processing of offload_wq is independent from processing of metadata_wq. Fixes: `7eada909bf` ("dm: add integrity target") Cc: stable@vger.kernel.org # v4.12+ Reported-by: Heinz Mauelshagen <heinzm@redhat.com> Tested-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-02-25 12:06:07 -05:00
Mikulas Patocka	d5bdf66108	dm integrity: fix recalculation when moving from journal mode to bitmap mode If we resume a device in bitmap mode and the on-disk format is in journal mode, we must recalculate anything above ic->sb->recalc_sector. Otherwise, there would be non-recalculated blocks which would cause I/O errors. Fixes: `468dfca38b` ("dm integrity: add a bitmap mode") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-02-25 12:06:06 -05:00
Mike Snitzer	47ace7e012	dm: fix potential for q->make_request_fn NULL pointer Move blk_queue_make_request() to dm.c:alloc_dev() so that q->make_request_fn is never NULL during the lifetime of a DM device (even one that is created without a DM table). Otherwise generic_make_request() will crash simply by running `dmsetup create -n test` followed by `mount /dev/dm-N /mnt`. While at it, move ->congested_data initialization out of dm.c:alloc_dev() and into the bio-based specific init method. Reported-by: Stefan Bader <stefan.bader@canonical.com> BugLink: https://bugs.launchpad.net/bugs/1860231 Fixes: `ff36ab3458` ("dm: remove request-based logic from make_request_fn wrapper") Depends-on: `c12c9a3c38` ("dm: various cleanups to md->queue initialization code") Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-01-27 14:52:36 -05:00
Mikulas Patocka	dcd195071f	dm writecache: improve performance of large linear writes on SSDs When dm-writecache is used with an SSD as a cache device, it would submit a separate bio for each written block. The I/Os would be merged by the disk scheduler, but this merging degrades performance. Improve dm-writecache performance by submitting larger bios; this is possible as long as there is consecutive free space on the cache device. Benchmark (arm64 with 64k page size, using /dev/ram0 as a cache device): `fio --bs=512k --iodepth=32 --size=400M --direct=1 --filename=/dev/mapper/cache --rw=randwrite --numjobs=1 --name=test`; throughput by block size, old MiB/s -> new MiB/s: 512: 181 -> 700; 1k: 347 -> 1256; 2k: 644 -> 2020; 4k: 1183 -> 2759; 8k: 1852 -> 3333; 16k: 2469 -> 3509; 32k: 2974 -> 3670; 64k: 3404 -> 3810. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-01-16 13:34:17 -05:00
Anatol Pomazau	be240ff5e4	dm mpath: Add timeout mechanism for queue_if_no_path Add a configurable timeout mechanism to disable queue_if_no_path without assistance from userspace multipathd. This reimplements multipathd's no_path_retry mechanism in kernel space. This is motivated by the desire to prevent processes from hanging indefinitely waiting for IO in cases where multipathd might be unable to respond (after a failure or for whatever reason). Despite replicating userspace multipathd's policy configuration in kernel space, it is important to prevent IOs from hanging forever, waiting for userspace that may be incapable of behaving correctly. Use of the provided "queue_if_no_path_timeout_secs" dm-multipath module parameter is optional. This timeout mechanism is disabled by default (by being set to 0). Signed-off-by: Anatol Pomazau <anatol@google.com> Co-developed-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-01-14 20:23:14 -05:00
Mikulas Patocka	f06c03d1de	dm thin: change data device's flush_bio to be member of struct pool With commit fe64369163c5 ("dm thin: don't allow changing data device during thin-pool load") it is now possible to re-parent the data device's flush_bio from the pool_c to pool structure. Doing so offers improved lifetime guarantees for the flush_bio so that the call to dm_pool_register_pre_commit_callback can now be done safely from pool_ctr(). Depends-on: fe64369163c5 ("dm thin: don't allow changing data device during thin-pool load") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-01-14 20:23:13 -05:00
Mikulas Patocka	873937e75f	dm thin: don't allow changing data device during thin-pool reload The existing code allows changing the data device when the thin-pool target is reloaded. This capability is not required and only complicates device lifetime guarantees. This can cause crashes like the one reported here: https://bugzilla.redhat.com/show_bug.cgi?id=1788596 where the kernel tries to issue a flush bio located in a structure that was already freed. Take the first step to simplifying the thin-pool's data device lifetime by disallowing changing it. Like the thin-pool's metadata device, the data device is now set in pool_create() and it cannot be changed for a given thin-pool. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-01-14 20:22:51 -05:00
Mike Snitzer	a4a8d28658	dm thin: fix use-after-free in metadata_pre_commit_callback dm-thin uses struct pool to hold the state of the pool. There may be multiple pool_c's pointing to a given pool; each pool_c represents a loaded target. pool_c's may be created and destroyed arbitrarily and the pool contains a reference count of pool_c's pointing to it. Since commit `694cfe7f31` ("dm thin: Flush data device before committing metadata") a pointer to pool_c is passed to dm_pool_register_pre_commit_callback and this function stores it in pmd->pre_commit_context. If this pool_c is freed, but pool is not (because there is another pool_c referencing it), we end up in a situation where pmd->pre_commit_context structure points to freed pool_c. It causes a crash in metadata_pre_commit_callback. Fix this by moving the dm_pool_register_pre_commit_callback() from pool_ctr() to pool_preresume(). This way the in-core thin-pool metadata is only ever armed with callback data whose lifetime matches the active thin-pool target. It should be noted that this fix preserves the ability to load a thin-pool table that uses a different data block device (that contains the same data) -- though it is unclear if that capability is still useful and/or needed. Fixes: `694cfe7f31` ("dm thin: Flush data device before committing metadata") Cc: stable@vger.kernel.org Reported-by: Zdenek Kabelac <zkabelac@redhat.com> Reported-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-01-14 20:22:50 -05:00
Mike Snitzer	44d8ebf436	dm thin metadata: use pool locking at end of dm_pool_metadata_close Ensure that the pool is locked during calls to __commit_transaction and __destroy_persistent_data_objects. This is just for consistency with the locking elsewhere; in reality, dm_pool_metadata_close is called once the pool is being destroyed, so access to the pool shouldn't be contended. Also, use pmd_write_lock_in_core rather than __pmd_write_lock in dm_pool_commit_metadata and rename __pmd_write_lock to pmd_write_lock_in_core -- there was no need for the alias. In addition, verify that the pool is locked in __commit_transaction(). Fixes: `873f258bec` ("dm thin metadata: do not write metadata if no changes occurred") Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-01-14 20:22:49 -05:00
Mikulas Patocka	aa9509209c	dm writecache: fix incorrect flush sequence when doing SSD mode commit When committing state, the function writecache_flush does the following: 1. write metadata (writecache_commit_flushed) 2. flush disk cache (writecache_commit_flushed) 3. wait for data writes to complete (writecache_wait_for_ios) 4. increase superblock seq_count 5. write the superblock 6. flush disk cache It may happen that at step 3, when we wait for some write to finish, the disk may report the write as finished, but the write only hit the disk cache and it is not yet stored in persistent storage. At step 5 we write the superblock - it may happen that the superblock is written before the write that we waited for in step 3. If the machine crashes, it may result in incorrect data being returned after reboot. In order to fix the bug, we must swap steps 2 and 3 in the above sequence, so that we first wait for writes to complete and then flush the disk cache. Fixes: `48debafe4f` ("dm: add writecache target") Cc: stable@vger.kernel.org # 4.18+ Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-01-14 20:22:48 -05:00
Milan Broz	4ea9471fbd	dm crypt: fix benbi IV constructor crash if used in authenticated mode If the benbi IV is used in an AEAD construction, for example `cryptsetup luksFormat <device> --cipher twofish-xts-benbi --key-size 512 --integrity=hmac-sha256`, the constructor uses the wrong skcipher function and crashes: BUG: kernel NULL pointer dereference, address: 00000014 ... EIP: crypt_iv_benbi_ctr+0x15/0x70 [dm_crypt] Call Trace: ? crypt_subkey_size+0x20/0x20 [dm_crypt] crypt_ctr+0x567/0xfc0 [dm_crypt] dm_table_add_target+0x15f/0x340 [dm_mod] Fix this by properly using crypt_aead_blocksize() in this case. Fixes: `ef43aa3806` ("dm crypt: add cryptographic data integrity protection (authenticated encryption)") Cc: stable@vger.kernel.org # v4.12+ Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=941051 Reported-by: Jerad Simpson <jbsimpson@gmail.com> Signed-off-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-01-14 20:22:47 -05:00
Milan Broz	bbb1658461	dm crypt: Implement Elephant diffuser for Bitlocker compatibility Add experimental support for BitLocker encryption with CBC mode and an additional Elephant diffuser. The mode was used in older Windows systems and it is provided mainly for compatibility reasons. The userspace support to activate these devices is being added to the cryptsetup utility. Read-write activation of such a device is very simple, for example: `echo <password> | cryptsetup bitlkOpen bitlk_image.img test`. The Elephant diffuser uses two rotations in opposite directions for data (Diffuser A and B) and also an XOR operation with the Sector key over the sector data; the Sector key is derived from additional key data. The original public documentation is available here: http://download.microsoft.com/download/0/2/3/0238acaf-d3bf-4a6d-b3d6-0a0be4bbb36e/bitlockercipher200608.pdf The dm-crypt implementation is embedded in the "elephant" IV (similar to the tcw IV construction). Because we cannot modify the original bio data for write (before encryption), an additional internal flag to pre-process data is added. Signed-off-by: Milan Broz <gmazyland@gmail.com> Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-01-14 20:22:46 -05:00
Joe Thornber	4feaef830d	dm space map common: fix to ensure new block isn't already in use The space-maps track the reference counts for disk blocks allocated by both the thin-provisioning and cache targets. There are variants for tracking metadata blocks and data blocks. Transactionality is implemented by never touching blocks from the previous transaction, so we can roll back in the event of a crash. When allocating a new block we need to ensure the block is free (has reference count of 0) in both the current and previous transaction. Prior to this fix we were doing this by searching for a free block in the previous transaction, and relying on a 'begin' counter to track where the last allocation in the current transaction was. This 'begin' field was not being updated in all code paths (e.g., an increment of a data block reference count due to breaking sharing of a neighbour block in the same btree leaf). This fix keeps the 'begin' field, but now it's just a hint to speed up the search. Instead, the current transaction is searched for a free block, and then the old transaction is double checked to ensure it's free. Much simpler. This fixes reports of sm_disk_new_block()'s BUG_ON() triggering when DM thin-provisioning's snapshots are heavily used. Reported-by: Eric Wheeler <dm-devel@lists.ewheeler.net> Cc: stable@vger.kernel.org Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-01-14 20:15:53 -05:00
xianrong.zhou	0a531c5a39	dm verity: don't prefetch hash blocks for already-verified data Try to skip prefetching hash blocks that won't be needed due to the "check_at_most_once" option being enabled and the corresponding data blocks already having been verified. Since prefetching operates on a range of data blocks, do this by just trimming the two ends of the range. This doesn't skip every unneeded hash block, since data blocks in the middle of the range could also be unneeded, and hash blocks are still prefetched in large clusters as controlled by dm_verity_prefetch_cluster. But it can still help a lot. In a test on Android Q launching 91 apps every 15s repeated 21 times, prefetching was only done for 447177/4776629 = 9.36% of data blocks. Tested-by: ruxian.feng <ruxian.feng@transsion.com> Co-developed-by: yuanjiong.gao <yuanjiong.gao@transsion.com> Signed-off-by: yuanjiong.gao <yuanjiong.gao@transsion.com> Signed-off-by: xianrong.zhou <xianrong.zhou@transsion.com> [EB: simplified the 'while' loops and improved the commit message] Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-01-07 12:07:33 -05:00
Mikulas Patocka	9402e95901	dm crypt: fix GFP flags passed to skcipher_request_alloc() GFP_KERNEL is not supposed to be or'd with GFP_NOFS (the result is equivalent to GFP_KERNEL). Also, we use GFP_NOIO instead of GFP_NOFS because we don't want any I/O being submitted in the direct reclaim path. Fixes: `39d13a1ac4` ("dm crypt: reuse eboiv skcipher for IV generation") Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-01-07 12:07:32 -05:00
Jeffle Xu	4306904053	dm thin metadata: Fix trivial math error in on-disk format documentation Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-01-07 12:07:31 -05:00
zhengbin	63ee92d1c2	dm thin metadata: use true/false for bool variable Fixes coccicheck warning: drivers/md/dm-thin-metadata.c:814:3-14: WARNING: Assignment of 0/1 to bool variable drivers/md/dm-thin-metadata.c:1109:1-12: WARNING: Assignment of 0/1 to bool variable drivers/md/dm-thin-metadata.c:1621:1-12: WARNING: Assignment of 0/1 to bool variable drivers/md/dm-thin-metadata.c:1652:1-12: WARNING: Assignment of 0/1 to bool variable drivers/md/dm-thin-metadata.c:1706:1-12: WARNING: Assignment of 0/1 to bool variable Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-01-07 12:07:24 -05:00
zhengbin	1d1dda8ca8	dm snapshot: use true/false for bool variable Fixes coccicheck warning: drivers/md/dm-snap.c:1064:3-18: WARNING: Assignment of 0/1 to bool variable drivers/md/dm-snap.c:1152:1-16: WARNING: Assignment of 0/1 to bool variable drivers/md/dm-snap.c:1317:1-16: WARNING: Assignment of 0/1 to bool variable Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-01-07 12:07:17 -05:00
zhengbin	67b92d979b	dm bio prison v2: use true/false for bool variable Fixes coccicheck warning: drivers/md/dm-bio-prison-v2.c:327:2-22: WARNING: Assignment of 0/1 to bool variable Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-01-07 12:07:08 -05:00
zhengbin	4ecc508190	dm mpath: use true/false for bool variable Fixes coccicheck warning: drivers/md/dm-mpath.c:1447:2-13: WARNING: Assignment of 0/1 to bool variable Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-01-07 12:06:56 -05:00
Dmitry Fomichev	b399629503	dm zoned: support zone sizes smaller than 128MiB dm-zoned is observed to log failed kernel assertions and not work correctly when operating against a device with a zone size smaller than 128MiB (e.g. 32768 bits per 4K block). The reason is that the bitmap size per zone is calculated as zero with such a small zone size. Fix this problem and also make the code related to zone bitmap management be able to handle per zone bitmaps smaller than a single block. A dm-zoned-tools patch is required to properly format dm-zoned devices with zone sizes smaller than 128MiB. Fixes: `3b1a94c88b` ("dm zoned: drive-managed zoned block device target") Cc: stable@vger.kernel.org Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-01-07 11:43:38 -05:00
Heinz Mauelshagen	43f3952a51	dm raid: table line rebuild status fixes raid_status() wasn't emitting rebuild flags on the table line properly because the rdev number was not yet set properly; solve this by indexing the raid component devices array directly. Also fix the wrong argument count on the emitted table line, caused by one too many rebuild/write_mostly arguments, and consider any journal_(dev|mode) pairs. Link: https://bugzilla.redhat.com/1782045 Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-01-07 11:43:37 -05:00
Bryan Gurney	88e7cafdca	dm dust: change ret to r in dust_map_write In the dust_map_write() function, change the return code variable "ret" to "r", to match the convention of the other device-mapper targets. Signed-off-by: Bryan Gurney <bgurney@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2020-01-07 11:43:36 -05:00
Linus Torvalds	f1fcd7786e	Merge tag 'for-linus-20191212' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: - stable fix for the bi_size overflow. Not a corruption issue, but a case where we could merge but disallowed (Andreas) - NVMe pull request via Keith, with various fixes. - MD pull request from Song. 
- Merge window regression fix for the rq passthrough stats (Logan) - Remove unused blkcg_drain_queue() function (Guoqing) * tag 'for-linus-20191212' of git://git.kernel.dk/linux-block: blk-cgroup: remove blkcg_drain_queue block: fix NULL pointer dereference in account statistics with IDE md: make sure desc_nr less than MD_SB_DISKS md: raid1: check rdev before reference in raid1_sync_request func raid5: need to set STRIPE_HANDLE for batch head block: fix "check bi_size overflow before merge" nvme/pci: Fix read queue count nvme/pci Limit write queue sizes to possible cpus nvme/pci: Fix write and poll queue types nvme/pci: Remove last_cq_head nvme: Namepace identification descriptor list is optional nvme-fc: fix double-free scenarios on hw queues nvme: else following return is not needed nvme: add error message on mismatching controller ids nvme_fc: add module to ops template to allow module references nvmet-loop: Avoid preallocating big SGL for data nvme-fc: Avoid preallocating big SGL for data nvme-rdma: Avoid preallocating big SGL for data	2019-12-13 14:27:19 -08:00
Linus Torvalds	15da849c91	Merge tag 'for-5.5/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - Fix DM multipath by restoring full path selector functionality for bio-based configurations that don't have a SCSI device handler. - Fix dm-btree removal to ensure non-root btree nodes have at least (max_entries / 3) entries. This resolves the userspace thin_check utility's report of "too few entries in btree_node". - Fix both the DM thin-provisioning and dm-clone targets to properly flush the data device prior to metadata commit. This resolves the potential for inconsistency across a power loss event when the data device has a volatile writeback cache. - Small documentation fixes to dm-clone and dm-integrity. 
* tag 'for-5.5/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: docs: dm-integrity: remove reference to ARC4 dm thin: Flush data device before committing metadata dm thin metadata: Add support for a pre-commit callback dm clone: Flush destination device before committing metadata dm clone metadata: Use a two phase commit dm clone metadata: Track exact changes per transaction dm btree: increase rebalance threshold in __rebalance2() dm: add dm-clone to the documentation index dm mpath: remove harmful bio-based optimization	2019-12-13 14:13:15 -08:00
Yufen Yu	3b7436cc94	md: make sure desc_nr less than MD_SB_DISKS For super_90_load, we need to make sure 'desc_nr' is less than MD_SB_DISKS to avoid an invalid memory access of 'sb->disks'. Fixes: `228fc7d76d` ("md: avoid invalid memory access for array sb->dev_roles") Signed-off-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Song Liu <songliubraving@fb.com>	2019-12-11 10:38:08 -08:00
Zhiqiang Liu	028288df63	md: raid1: check rdev before reference in raid1_sync_request func In the raid1_sync_request function, rdev should be checked before it is referenced. Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com> Signed-off-by: Song Liu <songliubraving@fb.com>	2019-12-11 10:36:13 -08:00
Guoqing Jiang	a7ede3d168	raid5: need to set STRIPE_HANDLE for batch head With commit `6ce220dd2f` ("raid5: don't set STRIPE_HANDLE to stripe which is in batch list"), we don't want to set the STRIPE_HANDLE flag for an sh which is already in a batch list. However, the stripe which is the head of the batch list should set this flag; otherwise a panic could happen inside init_stripe at BUG_ON(sh->batch_head). As Xiao observed, this is reproducible with raid5 on top of nvdimm devices. Thanks to Xiao for the effort to verify the change. Fixes: `6ce220dd2f` ("raid5: don't set STRIPE_HANDLE to stripe which is in batch list") Reported-by: Xiao Ni <xni@redhat.com> Tested-by: Xiao Ni <xni@redhat.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Song Liu <songliubraving@fb.com>	2019-12-11 10:12:09 -08:00
Pankaj Bharadiya	c593642c8b	treewide: Use sizeof_field() macro Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except at places where these are defined. Later patches will remove the unused definition of FIELD_SIZEOF(). This patch was generated using the following script: EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h"; git grep -l -e "\bFIELD_SIZEOF\b" | while read file; do if [[ "$file" =~ $EXCLUDE_FILES ]]; then continue; fi; sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file; done Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com> Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com Co-developed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: David Miller <davem@davemloft.net> # for net	2019-12-09 10:36:44 -08:00
Nikos Tsironis	694cfe7f31	dm thin: Flush data device before committing metadata The thin provisioning target maintains per thin device mappings that map virtual blocks to data blocks in the data device. When we write to a shared block, in case of internal snapshots, or provision a new block, in case of external snapshots, we copy the shared block to a new data block (COW), update the mapping for the relevant virtual block and then issue the write to the new data block. Suppose the data device has a volatile write-back cache and the following sequence of events occurs: 1. We write to a shared block. 2. A new data block is allocated. 3. We copy the shared block to the new data block using kcopyd (COW). 4. We insert the new mapping for the virtual block in the btree for that thin device. 5. The commit timeout expires and we commit the metadata, which now includes the new mapping from step (4). 6. The system crashes and the data device's cache has not been flushed, meaning that the COWed data are lost. The next time we read that virtual block of the thin device we read it from the data block allocated in step (2), since the metadata have been successfully committed. The data are lost due to the crash, so we read garbage instead of the old, shared data. This has the following implications: 1. In case of writes to shared blocks, with size smaller than the pool's block size (which means we first copy the whole block and then issue the smaller write), we corrupt data that the user never touched. 2. In case of writes to shared blocks, with size equal to the device's logical block size, we fail to provide atomic sector writes. When the system recovers, the user will read garbage from that sector instead of the old data or the new data. 3. Even for writes to shared blocks, with size equal to the pool's block size (overwrites), after the system recovers, the written sectors will contain garbage instead of a random mix of sectors containing either old data or new data, thus we fail again to provide atomic sector writes. 4. Even when the user flushes the thin device, because we first commit the metadata and then pass down the flush, the same risk for corruption exists (if the system crashes after the metadata have been committed but before the flush is passed down to the data device). The only case which is unaffected is that of writes with size equal to the pool's block size and with the FUA flag set. But, because FUA writes trigger metadata commits, this case can trigger the corruption indirectly. Moreover, apart from internal and external snapshots, the same issue exists for newly provisioned blocks, when block zeroing is enabled. After the system recovers, the provisioned blocks might contain garbage instead of zeroes. To solve this and avoid the potential data corruption, we flush the pool's data device before committing its metadata. This ensures that the data blocks of any newly inserted mappings are properly written to non-volatile storage and won't be lost in case of a crash. Cc: stable@vger.kernel.org Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com> Acked-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2019-12-06 11:46:16 -05:00
Nikos Tsironis	ecda7c0280	dm thin metadata: Add support for a pre-commit callback Add support for one pre-commit callback which is run right before the metadata are committed. This allows the thin provisioning target to run a callback before the metadata are committed and is required by the next commit. Cc: stable@vger.kernel.org Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com> Acked-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2019-12-05 17:05:24 -05:00
Nikos Tsironis	8b3fd1f53a	dm clone: Flush destination device before committing metadata dm-clone maintains an on-disk bitmap which records which regions are valid in the destination device, i.e., which regions have already been hydrated, or have been written to directly, via user I/O. Setting a bit in the on-disk bitmap means the corresponding region is valid in the destination device and we redirect all I/O regarding it to the destination device. Suppose the destination device has a volatile write-back cache and the following sequence of events occurs: 1. A region gets hydrated, either through the background hydration or because it was written to directly, via user I/O. 2. The commit timeout expires and we commit the metadata, marking that region as valid in the destination device. 3. The system crashes and the destination device's cache has not been flushed, meaning the region's data are lost. The next time we read that region we read it from the destination device, since the metadata have been successfully committed, but the data are lost due to the crash, so we read garbage instead of the old data. This has several implications: 1. In case of background hydration or of writes with size smaller than the region size (which means we first copy the whole region and then issue the smaller write), we corrupt data that the user never touched. 2. In case of writes with size equal to the device's logical block size, we fail to provide atomic sector writes. When the system recovers the user will read garbage from the sector instead of the old data or the new data. 3. In case of writes without the FUA flag set, after the system recovers, the written sectors will contain garbage instead of a random mix of sectors containing either old data or new data, thus we fail again to provide atomic sector writes. 4. Even when the user flushes the dm-clone device, because we first commit the metadata and then pass down the flush, the same risk for corruption exists (if the system crashes after the metadata have been committed but before the flush is passed down). The only case which is unaffected is that of writes with size equal to the region size and with the FUA flag set. But, because FUA writes trigger metadata commits, this case can trigger the corruption indirectly. To solve this and avoid the potential data corruption we flush the destination device before committing the metadata. This ensures that any freshly hydrated regions, for which we commit the metadata, are properly written to non-volatile storage and won't be lost in case of a crash. Fixes: `7431b7835f` ("dm: add clone target") Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2019-12-05 17:05:23 -05:00
Nikos Tsironis	8fdbfe8d16	dm clone metadata: Use a two phase commit Split the metadata commit in two parts: 1. dm_clone_metadata_pre_commit(): Prepare the current transaction for committing. After this is called, all subsequent metadata updates, done through either dm_clone_set_region_hydrated() or dm_clone_cond_set_range(), will be part of the next transaction. 2. dm_clone_metadata_commit(): Actually commit the current transaction to disk and start a new transaction. This is required by the following commit. It allows dm-clone to flush the destination device after step (1) to ensure that all freshly hydrated regions, for which we are updating the metadata, are properly written to non-volatile storage and won't be lost in case of a crash. Fixes: `7431b7835f` ("dm: add clone target") Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2019-12-05 15:27:54 -05:00
Nikos Tsironis	e6a505f3f9	dm clone metadata: Track exact changes per transaction Extend struct dirty_map with a second bitmap which tracks the exact regions that were hydrated during the current metadata transaction. Moreover, fix __flush_dmap() to only commit the metadata of the regions that were hydrated during the current transaction. This is required by the following commits to fix a data corruption bug. Fixes: `7431b7835f` ("dm: add clone target") Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2019-12-05 15:27:53 -05:00
Hou Tao	474e559567	dm btree: increase rebalance threshold in __rebalance2() We got the following warnings from thin_check during thin-pool setup: $ thin_check /dev/vdb examining superblock examining devices tree missing devices: [1, 84] too few entries in btree_node: 41, expected at least 42 (block 138, max_entries = 126) examining mapping tree The problem is that the number of entries in one node of the details_info tree is less than (max_entries / 3). It can be easily reproduced with the following procedure: $ create a thin pool $ assume the max entries of the details_info tree is 126 $ create 127 thin devices (e.g. 1~127) so that the root node becomes full and then splits $ remove the first 43 (e.g. 1~43) thin devices to make the children rebalance repeatedly $ stop the thin pool $ thin_check The root cause is that the B-tree removal procedure in __rebalance2() doesn't guarantee the invariant: the minimal number of entries in a non-root node should be >= (max_entries / 3). Simply fix the problem by increasing the rebalance threshold to make sure the number of entries in each child will be greater than or equal to (max_entries / 3 + 1), so no matter which child is used for removal, the number will still be valid. Cc: stable@vger.kernel.org Signed-off-by: Hou Tao <houtao1@huawei.com> Acked-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2019-12-05 15:27:52 -05:00
Christoph Hellwig	ae58954d87	block: don't handle bio based drivers in blk_revalidate_disk_zones bio based drivers only need to update q->nr_zones. Do that manually instead of overloading blk_revalidate_disk_zones to keep that function simpler for the next round of changes that will rely even more on the request based functionality. Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-12-03 08:51:25 -07:00
Christoph Hellwig	9b38bb4b1e	block: simplify blkdev_nr_zones Simplify the arguments to blkdev_nr_zones by passing a gendisk instead of the block_device and capacity. This also removes the need for __blkdev_nr_zones as all callers are outside the fast path and can deal with the additional branch. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-12-03 08:51:24 -07:00
Mike Snitzer	dbaf971c9c	dm mpath: remove harmful bio-based optimization Removes the branching for the edge case where no SCSI device handler exists. The __map_bio_fast() method was far too limited, by only selecting a new pathgroup or path IFF there was a path failure; fix this by eliminating it in favor of __map_bio(). __map_bio()'s extra SCSI device handler specific MPATHF_PG_INIT_REQUIRED test is not in the fast path anyway. This change restores full path selector functionality for bio-based configurations that don't have a SCSI device handler. But it should be noted that the path selectors do have an impact on performance for certain networks that are extremely fast (and don't require frequent switching). Fixes: `8d47e65948` ("dm mpath: remove unnecessary NVMe branching in favor of scsi_dh checks") Cc: stable@vger.kernel.org Reported-by: Drew Hastings <dhastings@crucialwebhost.com> Suggested-by: Martin Wilck <mwilck@suse.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2019-11-26 10:22:46 -05:00
Linus Torvalds	eeee2827ae	Merge tag 'for-5.5/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - Fix DM core to disallow stacking request-based DM on partitions. - Fix DM raid target to properly resync raidset even if bitmap needed additional pages. - Fix DM crypt performance regression due to use of WQ_HIGHPRI for the IO and crypt workqueues. - Fix DM integrity metadata layout that was aligned on 128K boundary rather than the intended 4K boundary (removes 124K of wasted space for each metadata block). - Improve the DM thin, cache and clone targets to use spin_lock_irq rather than spin_lock_irqsave where possible. - Fix DM thin single thread performance that was lost due to needless workqueue wakeups. - Fix DM zoned target performance that was lost due to excessive backing device checks. - Add ability to trigger write failure with the DM dust test target. - Fix whitespace indentation in drivers/md/Kconfig. - Various small fixes and cleanups (e.g. use struct_size, fix uninitialized variable, variable renames, etc). * tag 'for-5.5/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (22 commits) Revert "dm crypt: use WQ_HIGHPRI for the IO and crypt workqueues" dm: Fix Kconfig indentation dm thin: wakeup worker only when deferred bios exist dm integrity: fix excessive alignment of metadata runs dm raid: Remove unnecessary negation of a shift in raid10_format_to_md_layout dm zoned: reduce overhead of backing device checks dm dust: add limited write failure mode dm dust: change ret to r in dust_map_read and dust_map dm dust: change result vars to r dm cache: replace spin_lock_irqsave with spin_lock_irq dm bio prison: replace spin_lock_irqsave with spin_lock_irq dm thin: replace spin_lock_irqsave with spin_lock_irq dm clone: add bucket_lock_irq/bucket_unlock_irq helpers dm clone: replace spin_lock_irqsave with spin_lock_irq dm writecache: handle REQ_FUA dm writecache: fix uninitialized variable warning dm stripe: use struct_size() in kmalloc() dm raid: streamline rs_get_progress() and its raid_status() caller side dm raid: simplify rs_setup_recovery call chain dm raid: to ensure resynchronization, perform raid set grow in preresume ...	2019-11-25 11:53:26 -08:00
Linus Torvalds	464a47f45d	Merge tag 'for-5.5/zoned-20191122' of git://git.kernel.dk/linux-block Pull zoned block device update from Jens Axboe: "Enhancements and improvements to the zoned device support" * tag 'for-5.5/zoned-20191122' of git://git.kernel.dk/linux-block: scsi: sd_zbc: Remove set but not used variable 'buflen' block: rework zone reporting scsi: sd_zbc: Cleanup sd_zbc_alloc_report_buffer() null_blk: Add zone_nr_conv to features null_blk: clean up report zones null_blk: clean up the block device operations block: Remove partition support for zoned block devices block: Simplify report zones execution block: cleanup the !zoned case in blk_revalidate_disk_zones block: Enhance blk_revalidate_disk_zones()	2019-11-25 11:22:37 -08:00
Linus Torvalds	2d53943090	Merge tag 'for-5.5/drivers-20191121' of git://git.kernel.dk/linux-block Pull block driver updates from Jens Axboe: "Here are the main block driver updates for 5.5. Nothing major in here, mostly just fixes. This contains: - a set of bcache changes via Coly - MD changes from Song - loop unmap write-zeroes fix (Darrick) - spelling fixes (Geert) - zoned additions cleanups to null_blk/dm (Ajay) - allow null_blk online submit queue changes (Bart) - NVMe changes via Keith, nothing major here either" * tag 'for-5.5/drivers-20191121' of git://git.kernel.dk/linux-block: (56 commits) Revert "bcache: fix fifo index swapping condition in journal_pin_cmp()" drivers/md/raid5-ppl.c: use the new spelling of RWH_WRITE_LIFE_NOT_SET drivers/md/raid5.c: use the new spelling of RWH_WRITE_LIFE_NOT_SET bcache: don't export symbols bcache: remove the extra cflags for request.o bcache: at least try to shrink 1 node in bch_mca_scan() bcache: add idle_max_writeback_rate sysfs interface bcache: add code comments in bch_btree_leaf_dirty() bcache: fix deadlock in bcache_allocator bcache: add code comment bch_keylist_pop() and bch_keylist_pop_front() bcache: deleted code comments for dead code in bch_data_insert_keys() bcache: add more accurate error messages in read_super() bcache: fix static checker warning in bcache_device_free() bcache: fix a lost wake-up problem caused by mca_cannibalize_lock bcache: fix fifo index swapping condition in journal_pin_cmp() md/raid10: prevent access of uninitialized resync_pages offset md: avoid invalid memory access for array sb->dev_roles md/raid1: avoid soft lockup under high load null_blk: add zone open, close, and finish support dm: add zone open, close and finish support ...	2019-11-25 11:15:41 -08:00
Mike Snitzer	f612b2132d	Revert "dm crypt: use WQ_HIGHPRI for the IO and crypt workqueues" This reverts commit `a1b89132dc`. Revert required hand-patching due to subsequent changes that were applied since commit `a1b89132dc`. Requires: `ed0302e830` ("dm crypt: make workqueue names device-specific") Cc: stable@vger.kernel.org Bug: https://bugzilla.kernel.org/show_bug.cgi?id=199857 Reported-by: Vito Caputo <vcaputo@pengaru.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2019-11-20 17:27:39 -05:00
Krzysztof Kozlowski	443633225e	dm: Fix Kconfig indentation Adjust indentation from spaces to tab (+optional two spaces) as in coding style with command like: $ sed -e 's/^ /\t/' -i */Kconfig Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2019-11-20 10:35:31 -05:00
Jens Axboe	00b89892c8	Revert "bcache: fix fifo index swapping condition in journal_pin_cmp()" Coly says: "Guoju Fang talked to me today, he told me this change was unnecessary and I was over-thought. Then I realize fifo_idx() uses a mask to handle the array index overflow condition, so the index swap in journal_pin_cmp() won't happen. And yes, Guoju and Kent are correct. Since you already applied this patch, can you please to remove this patch from your for-next branch? This single patch does not break thing, but it is unecessary at this moment." This reverts commit `c0e0954e90`. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-11-18 08:35:47 -07:00
Jeffle Xu	d256d79627	dm thin: wakeup worker only when deferred bios exist Single thread fio test (read, bs=4k, ioengine=libaio, iodepth=128, numjobs=1) over dm-thin device has poor performance versus bare nvme device. Further investigation with perf indicates that queue_work_on() consumes over 20% CPU time when doing IO over dm-thin device. The call stack is as follows. - 40.57% thin_map + 22.07% queue_work_on + 9.95% dm_thin_find_block + 2.80% cell_defer_no_holder 1.91% inc_all_io_entry.isra.33.part.34 + 1.78% bio_detain.isra.35 In cell_defer_no_holder(), wakeup_worker() is always called, no matter whether the tc->deferred_bio_list list is empty or not. In single thread IO model, this list is most likely empty. So skip waking up worker thread if tc->deferred_bio_list list is empty. Single thread IO performance improves from 448 MiB/s to 646 MiB/s (+44%) once the needless wakeup_worker() calls are properly skipped. Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2019-11-18 10:03:12 -05:00
Mikulas Patocka	d537858ac8	dm integrity: fix excessive alignment of metadata runs Metadata runs are supposed to be aligned on 4k boundary (so that they work efficiently with disks with 4k sectors). However, there was a programming bug that makes them aligned on 128k boundary instead. The unused space is wasted. Fix this bug by providing a proper 4k alignment. In order to keep existing volumes working, we introduce a new flag SB_FLAG_FIXED_PADDING - when the flag is clear, we calculate the padding the old way. In order to make sure that the old version cannot mount the volume created by the new version, we increase superblock version to 4. Also in order to not break with old integritysetup, we fix alignment only if the parameter "fix_padding" is present when formatting the device. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2019-11-15 14:49:16 -05:00
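The invariant discussed in `474e559567` (dm btree: increase rebalance threshold in __rebalance2()) can be checked with a small userspace model. This is a hypothetical sketch, not the dm-persistent-data code: it assumes a non-root node must hold at least `max_entries / 3` entries, that two siblings whose combined count is below the threshold are merged into one node, that otherwise their entries are split evenly, and that one entry is then removed from either child. The helper name `threshold_is_safe()` is illustrative.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the __rebalance2() invariant (not kernel code).
 * Returns true if, for every legal pair of siblings, rebalancing with
 * `threshold` followed by one removal from either child leaves every
 * non-root node with at least max_entries / 3 entries.
 */
static bool threshold_is_safe(int max_entries, int threshold)
{
	int min = max_entries / 3;	/* minimum entries in a non-root node */

	/* Both siblings are valid nodes, so together they hold 2*min .. 2*max. */
	for (int total = 2 * min; total <= 2 * max_entries; total++) {
		if (total < threshold) {
			/* Merge into one node: it must fit and survive one removal. */
			if (total > max_entries || total - 1 < min)
				return false;
		} else {
			/* Redistribute evenly: the smaller child gets total / 2. */
			if (total / 2 - 1 < min)
				return false;
		}
	}
	return true;
}
```

With `max_entries = 126`, a threshold of `2 * (max_entries / 3)` = 84 is unsafe: 84 entries split evenly leave 42 per child, and one removal drops a child to 41, the same count `thin_check` reports in the commit. A threshold of `2 * (max_entries / 3 + 1)` = 86 passes. Treating 84 as the "before" threshold is an assumption made for this model; the commit text only states the new lower bound of `max_entries / 3 + 1` per child.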
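The crash sequence described in `694cfe7f31` (dm thin: Flush data device before committing metadata) can be replayed against a toy device model. Everything here is hypothetical: `struct dev` stands in for a data device with a volatile write-back cache, a crash discards every unflushed write, and the metadata commit is assumed to be atomic and durable. The only thing it illustrates is the ordering: flush the data device, then commit the mapping.

```c
#include <stdbool.h>

#define NBLOCKS 4
#define GARBAGE (-1)	/* stands in for whatever the media held before */

/* Hypothetical data device with a volatile write-back cache. */
struct dev {
	int durable[NBLOCKS];	/* contents that survive a crash */
	int cache[NBLOCKS];	/* contents written but maybe not yet flushed */
	bool dirty[NBLOCKS];	/* cache[i] is newer than durable[i] */
};

static void dev_init(struct dev *d)
{
	for (int i = 0; i < NBLOCKS; i++) {
		d->durable[i] = d->cache[i] = GARBAGE;
		d->dirty[i] = false;
	}
}

static void dev_write(struct dev *d, int blk, int val)
{
	d->cache[blk] = val;
	d->dirty[blk] = true;
}

static int dev_read(const struct dev *d, int blk)
{
	return d->dirty[blk] ? d->cache[blk] : d->durable[blk];
}

/* Models a cache flush: everything written so far becomes durable. */
static void dev_flush(struct dev *d)
{
	for (int i = 0; i < NBLOCKS; i++) {
		if (d->dirty[i])
			d->durable[i] = d->cache[i];
		d->dirty[i] = false;
	}
}

/* Models a crash: every write that was not flushed is lost. */
static void dev_crash(struct dev *d)
{
	for (int i = 0; i < NBLOCKS; i++) {
		d->cache[i] = d->durable[i];
		d->dirty[i] = false;
	}
}

/* COW a shared block, commit the new mapping, crash, read it back. */
static int cow_commit_crash(bool flush_before_commit)
{
	struct dev data;
	int committed_map = 0;	/* virtual block 0 -> shared data block 0 */
	int new_map;

	dev_init(&data);
	data.durable[0] = data.cache[0] = 42;	/* old, shared data */

	/* Steps 2-4: allocate block 1, copy the shared block, remap. */
	dev_write(&data, 1, dev_read(&data, 0));
	new_map = 1;

	if (flush_before_commit)
		dev_flush(&data);	/* the fix: COWed data durable first */
	committed_map = new_map;	/* step 5: metadata commit */

	dev_crash(&data);		/* step 6 */
	return dev_read(&data, committed_map);
}
```

`cow_commit_crash(false)` returns `GARBAGE` because the committed mapping points at block 1, whose copy never left the volatile cache; `cow_commit_crash(true)` returns 42, the old shared data. The same ordering argument applies to dm-clone in `8b3fd1f53a`, with freshly hydrated regions in place of COWed blocks.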
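The split commit introduced in `8fdbfe8d16` (dm clone metadata: Use a two phase commit) can be sketched with a region bitmap. This is a hypothetical model, not the dm-clone code: the `md_*` names are illustrative stand-ins for `dm_clone_metadata_pre_commit()` and `dm_clone_metadata_commit()`, and the bitmap is limited to 64 regions for brevity. It shows why updates made after the pre-commit step belong to the next transaction, which leaves room to flush the destination device between the two phases.

```c
#include <stdint.h>

/*
 * Hypothetical two-phase commit over a region bitmap (bit i set means
 * region i is hydrated). Region numbers must be < 64 in this sketch.
 */
struct md {
	uint64_t running;	/* updates in the current transaction */
	uint64_t pending;	/* updates frozen by pre-commit, not yet on disk */
	uint64_t on_disk;	/* committed bitmap */
};

static void md_set_region_hydrated(struct md *md, unsigned int region)
{
	md->running |= UINT64_C(1) << region;
}

/* Phase 1: freeze the current transaction; later updates go to the next. */
static void md_pre_commit(struct md *md)
{
	md->pending |= md->running;
	md->running = 0;
}

/* Phase 2: persist the frozen updates and start a new transaction. */
static void md_commit(struct md *md)
{
	md->on_disk |= md->pending;
	md->pending = 0;
}
```

For example: hydrate region 1, call `md_pre_commit()`, flush the destination device, hydrate region 2 while that flush is in flight, then call `md_commit()`. Only bit 1 (`0x2`) reaches `on_disk`; region 2 is committed by the next pre-commit/commit cycle, after its own flush.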
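The cost of the alignment bug fixed in `d537858ac8` (dm integrity: fix excessive alignment of metadata runs) comes down to round-up arithmetic. The sketch below is a generic power-of-two alignment helper, not the dm-integrity layout code; the 4 KiB metadata block used in the example is an assumption chosen to match the merge summary's figure of 124K wasted per metadata block.

```c
#include <stdint.h>

/* Round x up to the next multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/* Bytes of padding wasted when `size` bytes are aligned to `align`. */
static uint64_t padding(uint64_t size, uint64_t align)
{
	return align_up(size, align) - size;
}
```

`padding(4096, 4096)` is 0, while `padding(4096, 128 * 1024)` is 126976 bytes, i.e. 124 KiB of unused space per block.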

6055 Commits