mirror of
https://github.com/torvalds/linux.git
synced 2024-11-26 06:02:05 +00:00
849d18e27b
7779 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Song Liu
|
849d18e27b |
md: Remove deprecated CONFIG_MD_LINEAR
md-linear has been marked as deprecated for 2.5 years. Remove it. Cc: Christoph Hellwig <hch@lst.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Neil Brown <neilb@suse.de> Cc: Guoqing Jiang <guoqing.jiang@linux.dev> Cc: Mateusz Grzonka <mateusz.grzonka@intel.com> Cc: Jes Sorensen <jes@trained-monkey.org> Signed-off-by: Song Liu <song@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20231214222107.2016042-2-song@kernel.org |
||
Li Nan
|
ca294b34aa |
md/raid1: support read error check
After commit
|
||
Li Nan
|
1979dbbe32 |
md: factor out a helper exceed_read_errors() to check read_errors
Move check_decay_read_errors() to raid1-10.c and factor out a helper exceed_read_errors() to check if read_errors exceeds the limit, so that raid1 can also use it. There are no functional changes. Signed-off-by: Li Nan <linan122@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231215023852.3478228-2-linan666@huaweicloud.com |
||
Alex Lyakas
|
dc1cc22ed5 |
md: Whenassemble the array, consult the superblock of the freshest device
Upon assembling the array, both kernel and mdadm allow the devices to have event counter difference of 1, and still consider them as up-to-date. However, a device whose event count is behind by 1, may in fact not be up-to-date, and array resync with such a device may cause data corruption. To avoid this, consult the superblock of the freshest device about the status of a device, whose event counter is behind by 1. Signed-off-by: Alex Lyakas <alex.lyakas@zadara.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/1702470271-16073-1-git-send-email-alex.lyakas@zadara.com |
||
Gou Hao
|
af140f806a |
md/raid1: remove unnecessary null checking
If %__GFP_DIRECT_RECLAIM is set then bio_alloc_bioset will always be able to allocate a bio. See comment of bio_alloc_bioset. Signed-off-by: Gou Hao <gouhao@uniontech.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231214151458.28970-1-gouhao@uniontech.com |
||
Yu Kuai
|
fa2bbff7b0 |
md: synchronize flush io with array reconfiguration
Currently rcu is used to protect iterating rdev from submit_flushes():
submit_flushes remove_and_add_spares
synchronize_rcu
pers->hot_remove_disk()
rcu_read_lock()
rdev_for_each_rcu
if (rdev->raid_disk >= 0)
rdev->radi_disk = -1;
atomic_inc(&rdev->nr_pending)
rcu_read_unlock()
bi = bio_alloc_bioset()
bi->bi_end_io = md_end_flush
bi->private = rdev
submit_bio
// issue io for removed rdev
Fix this problem by grabbing 'acive_io' before iterating rdev, make sure
that remove_and_add_spares() won't concurrent with submit_flushes().
Fixes:
|
||
Yu Kuai
|
7ecab28c3b |
md/md-multipath: remove rcu protection to access rdev from conf
Because it's safe to accees rdev from conf: - If any spinlock is held, because synchronize_rcu() from md_kick_rdev_from_array() will prevent 'rdev' to be freed until spinlock is released; - If there is normal IO inflight, because mddev_suspend() will prevent rdev to be added or removed from array; And these will cover all the scenarios in md-multipath. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231125081604.3939938-6-yukuai1@huaweicloud.com |
||
Yu Kuai
|
ad8606702f |
md/raid5: remove rcu protection to access rdev from conf
Because it's safe to accees rdev from conf: - If any spinlock is held, because synchronize_rcu() from md_kick_rdev_from_array() will prevent 'rdev' to be freed until spinlock is released; - If 'reconfig_lock' is held, because rdev can't be added or removed from array; - If there is normal IO inflight, because mddev_suspend() will prevent rdev to be added or removed from array; - If there is sync IO inflight, because 'MD_RECOVERY_RUNNING' is checked in remove_and_add_spares(). And these will cover all the scenarios in raid456. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231125081604.3939938-5-yukuai1@huaweicloud.com |
||
Yu Kuai
|
2d32777d60 |
md/raid1: remove rcu protection to access rdev from conf
Because it's safe to accees rdev from conf: - If any spinlock is held, because synchronize_rcu() from md_kick_rdev_from_array() will prevent 'rdev' to be freed until spinlock is released; - If 'reconfig_lock' is held, because rdev can't be added or removed from array; - If there is normal IO inflight, because mddev_suspend() will prevent rdev to be added or removed from array; - If there is sync IO inflight, because 'MD_RECOVERY_RUNNING' is checked in remove_and_add_spares(). And these will cover all the scenarios in raid1. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231125081604.3939938-4-yukuai1@huaweicloud.com |
||
Yu Kuai
|
a448af25be |
md/raid10: remove rcu protection to access rdev from conf
Because it's safe to accees rdev from conf: - If any spinlock is held, because synchronize_rcu() from md_kick_rdev_from_array() will prevent 'rdev' to be freed until spinlock is released; - If 'reconfig_lock' is held, because rdev can't be added or removed from array; - If there is normal IO inflight, because mddev_suspend() will prevent rdev to be added or removed from array; - If there is sync IO inflight, because 'MD_RECOVERY_RUNNING' is checked in remove_and_add_spares(). And these will cover all the scenarios in raid10. This patch also cleanup the code to handle the case that replacement replace rdev while IO is still inflight. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231125081604.3939938-3-yukuai1@huaweicloud.com |
||
Yu Kuai
|
c891f1fd90 |
md: remove flag RemoveSynchronized
rcu is not used correctly here, because synchronize_rcu() is called before replacing old value, for example: remove_and_add_spares // other path synchronize_rcu // called before replacing old value set_bit(RemoveSynchronized) rcu_read_lock() rdev = conf->mirros[].rdev pers->hot_remove_disk conf->mirros[].rdev = NULL; if (!test_bit(RemoveSynchronized)) synchronize_rcu /* * won't be called, and won't wait * for concurrent readers to be done. */ // access rdev after remove_and_add_spares() rcu_read_unlock() Fortunately, there is a separate rcu protection to prevent such rdev to be freed: md_kick_rdev_from_array //other path rcu_read_lock() rdev = conf->mirros[].rdev list_del_rcu(&rdev->same_set) rcu_read_unlock() /* * rdev can be removed from conf, but * rdev won't be freed. */ synchronize_rcu() free rdev Hence remove this useless flag and prepare to remove rcu protection to access rdev from 'conf'. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231125081604.3939938-2-yukuai1@huaweicloud.com |
||
Junxiao Bi
|
bed9e27baf |
Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d"
This reverts commit |
||
Junxiao Bi
|
d6e035aad6 |
md: bypass block throttle for superblock update
commit |
||
Linus Torvalds
|
bc893f744e |
block-6.7-2023-11-23
-----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmVfrJIQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpqKID/oCNxih4ErRdCYJOZA+tR0YkuWJBKmF9KSe w94GnsxpCdPFMo+Lr2J3WoHsxH/xTFiT5BcLMTKb5Rju7zSIvf40xZxmizwhnU3p aA1WhWjo2Qu20p35wbKy23frSOZIZAj3+PXUxOG9hso6fY96JVpDToBQ4I9k5OG1 1KHXorlktQgDWk1XryI0z9F73Q3Ja3Bt2G4aXRAGwnTyvvIHJd+jm8ksjaEKKkmQ YXv1+ZJXhHkaeSfcwYcOFDffx+O1dooWnHjcKbgyXu0UQq0kxxIwBkF5C9ltZVVT TW2WusplldV4EU/Z3ck7E03Kbmk81iTN9yyh7wAr0TmkEEOdx5Mv/PtqUHme0A4Z PJEBR7g6AVnERttGk4H2MOiPBsMOKBFT2UIVSqUTMsij2a5uaKjGvBXXDELdf4g9 45QKuP1QflEImtKm6HtELA9gWuAF1rGOrH7PMLLPhmQck643cgVIcw6Tz3eyaJMe cBuvTsejkbQVeLSiX1mdvZk0gxuowCP/A8+CXyODdhl+H/mrDnMgR6HCV07VOy6F lVXeXjQaqwd9fdQ2kxfjbstNkjE3Z/NRDGJ+y4oNFNj6YmLoVHivBAXDHhHIVpXb u/F+IUm7I2M8G3DfqS0VOABauqzbDxe7c8j10OzJeDskd06t1prt6IY/qV4uhImM G6XNMzeHjw== =EShg -----END PGP SIGNATURE----- Merge tag 'block-6.7-2023-11-23' of git://git.kernel.dk/linux Pull block fixes from Jens Axboe: "A bit bigger than usual at this time, but nothing really earth shattering: - NVMe pull request via Keith: - TCP TLS fixes (Hannes) - Authentifaction fixes (Mark, Hannes) - Properly terminate target names (Christoph) - MD pull request via Song, fixing a raid5 corruption issue - Disentanglement of the dependency mess in nvme introduced with the tls additions. Now it should actually build on all configs (Arnd) - Series of bcache fixes (Coly) - Removal of a dead helper (Damien) - s390 dasd fix (Muhammad, Jan) - lockdep blk-cgroup fixes (Ming)" * tag 'block-6.7-2023-11-23' of git://git.kernel.dk/linux: (33 commits) nvme: tcp: fix compile-time checks for TLS mode nvme: target: fix Kconfig select statements nvme: target: fix nvme_keyring_id() references nvme: move nvme_stop_keep_alive() back to original position nbd: pass nbd_sock to nbd_read_reply() instead of index s390/dasd: protect device queue against concurrent access s390/dasd: resolve spelling mistake block/null_blk: Fix double blk_mq_start_request() warning nvmet-tcp: always initialize tls_handshake_tmo_work nvmet: nul-terminate the NQNs passed in the connect command nvme: blank out authentication fabrics options if not configured nvme: catch errors from nvme_configure_metadata() nvme-tcp: only evaluate 'tls' option if TLS is selected nvme-auth: set explanation code for failure2 msgs nvme-auth: unlock mutex in one place only block: Remove blk_set_runtime_active() nbd: fix null-ptr-dereference while accessing 'nbd->config' nbd: factor out a helper to get nbd_config without holding 'config_lock' nbd: fold nbd config initialization into nbd_alloc_config() bcache: avoid NULL checking to c->root in run_cache_set() ... |
||
Jens Axboe
|
8a554c6234 |
Merge tag 'md-fixes-20231120' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-6.7
Pull MD fix from Song. * tag 'md-fixes-20231120' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: md: fix bi_status reporting in md_end_clone_io |
||
Coly Li
|
3eba5e0b24 |
bcache: avoid NULL checking to c->root in run_cache_set()
In run_cache_set() after c->root returned from bch_btree_node_get(), it is checked by IS_ERR_OR_NULL(). Indeed it is unncessary to check NULL because bch_btree_node_get() will not return NULL pointer to caller. This patch replaces IS_ERR_OR_NULL() by IS_ERR() for the above reason. Signed-off-by: Coly Li <colyli@suse.de> Link: https://lore.kernel.org/r/20231120052503.6122-11-colyli@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
Coly Li
|
31f5b956a1 |
bcache: add code comments for bch_btree_node_get() and __bch_btree_node_alloc()
This patch adds code comments to bch_btree_node_get() and __bch_btree_node_alloc() that NULL pointer will not be returned and it is unnecessary to check NULL pointer by the callers of these routines. Signed-off-by: Coly Li <colyli@suse.de> Link: https://lore.kernel.org/r/20231120052503.6122-10-colyli@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
Coly Li
|
f72f4312d4 |
bcache: replace a mistaken IS_ERR() by IS_ERR_OR_NULL() in btree_gc_coalesce()
Commit |
||
Mingzhe Zou
|
2faac25d79 |
bcache: fixup multi-threaded bch_sectors_dirty_init() wake-up race
We get a kernel crash about "unable to handle kernel paging request":
```dmesg
[368033.032005] BUG: unable to handle kernel paging request at ffffffffad9ae4b5
[368033.032007] PGD fc3a0d067 P4D fc3a0d067 PUD fc3a0e063 PMD 8000000fc38000e1
[368033.032012] Oops: 0003 [#1] SMP PTI
[368033.032015] CPU: 23 PID: 55090 Comm: bch_dirtcnt[0] Kdump: loaded Tainted: G OE --------- - - 4.18.0-147.5.1.es8_24.x86_64 #1
[368033.032017] Hardware name: Tsinghua Tongfang THTF Chaoqiang Server/072T6D, BIOS 2.4.3 01/17/2017
[368033.032027] RIP: 0010:native_queued_spin_lock_slowpath+0x183/0x1d0
[368033.032029] Code: 8b 02 48 85 c0 74 f6 48 89 c1 eb d0 c1 e9 12 83 e0
03 83 e9 01 48 c1 e0 05 48 63 c9 48 05 c0 3d 02 00 48 03 04 cd 60 68 93
ad <48> 89 10 8b 42 08 85 c0 75 09 f3 90 8b 42 08 85 c0 74 f7 48 8b 02
[368033.032031] RSP: 0018:ffffbb48852abe00 EFLAGS: 00010082
[368033.032032] RAX: ffffffffad9ae4b5 RBX: 0000000000000246 RCX: 0000000000003bf3
[368033.032033] RDX: ffff97b0ff8e3dc0 RSI: 0000000000600000 RDI: ffffbb4884743c68
[368033.032034] RBP: 0000000000000001 R08: 0000000000000000 R09: 000007ffffffffff
[368033.032035] R10: ffffbb486bb01000 R11: 0000000000000001 R12: ffffffffc068da70
[368033.032036] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
[368033.032038] FS: 0000000000000000(0000) GS:ffff97b0ff8c0000(0000) knlGS:0000000000000000
[368033.032039] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[368033.032040] CR2: ffffffffad9ae4b5 CR3: 0000000fc3a0a002 CR4: 00000000003626e0
[368033.032042] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[368033.032043] bcache: bch_cached_dev_attach() Caching rbd479 as bcache462 on set 8cff3c36-4a76-4242-afaa-7630206bc70b
[368033.032045] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[368033.032046] Call Trace:
[368033.032054] _raw_spin_lock_irqsave+0x32/0x40
[368033.032061] __wake_up_common_lock+0x63/0xc0
[368033.032073] ? bch_ptr_invalid+0x10/0x10 [bcache]
[368033.033502] bch_dirty_init_thread+0x14c/0x160 [bcache]
[368033.033511] ? read_dirty_submit+0x60/0x60 [bcache]
[368033.033516] kthread+0x112/0x130
[368033.033520] ? kthread_flush_work_fn+0x10/0x10
[368033.034505] ret_from_fork+0x35/0x40
```
The crash occurred when call wake_up(&state->wait), and then we want
to look at the value in the state. However, bch_sectors_dirty_init()
is not found in the stack of any task. Since state is allocated on
the stack, we guess that bch_sectors_dirty_init() has exited, causing
bch_dirty_init_thread() to be unable to handle kernel paging request.
In order to verify this idea, we added some printing information during
wake_up(&state->wait). We find that "wake up" is printed twice, however
we only expect the last thread to wake up once.
```dmesg
[ 994.641004] alcache: bch_dirty_init_thread() wake up
[ 994.641018] alcache: bch_dirty_init_thread() wake up
[ 994.641523] alcache: bch_sectors_dirty_init() init exit
```
There is a race. If bch_sectors_dirty_init() exits after the first wake
up, the second wake up will trigger this bug("unable to handle kernel
paging request").
Proceed as follows:
bch_sectors_dirty_init
kthread_run ==============> bch_dirty_init_thread(bch_dirtcnt[0])
... ...
atomic_inc(&state.started) ...
... ...
atomic_read(&state.enough) ...
... atomic_set(&state->enough, 1)
kthread_run ======================================================> bch_dirty_init_thread(bch_dirtcnt[1])
... atomic_dec_and_test(&state->started) ...
atomic_inc(&state.started) ... ...
... wake_up(&state->wait) ...
atomic_read(&state.enough) atomic_dec_and_test(&state->started)
... ...
wait_event(state.wait, atomic_read(&state.started) == 0) ...
return ...
wake_up(&state->wait)
We believe it is very common to wake up twice if there is no dirty, but
crash is an extremely low probability event. It's hard for us to reproduce
this issue. We attached and detached continuously for a week, with a total
of more than one million attaches and only one crash.
Putting atomic_inc(&state.started) before kthread_run() can avoid waking
up twice.
Fixes:
|
||
Mingzhe Zou
|
e34820f984 |
bcache: fixup lock c->root error
We had a problem with io hung because it was waiting for c->root to
release the lock.
crash> cache_set.root -l cache_set.list ffffa03fde4c0050
root = 0xffff802ef454c800
crash> btree -o 0xffff802ef454c800 | grep rw_semaphore
[ffff802ef454c858] struct rw_semaphore lock;
crash> struct rw_semaphore ffff802ef454c858
struct rw_semaphore {
count = {
counter = -4294967297
},
wait_list = {
next = 0xffff00006786fc28,
prev = 0xffff00005d0efac8
},
wait_lock = {
raw_lock = {
{
val = {
counter = 0
},
{
locked = 0 '\000',
pending = 0 '\000'
},
{
locked_pending = 0,
tail = 0
}
}
}
},
osq = {
tail = {
counter = 0
}
},
owner = 0xffffa03fdc586603
}
The "counter = -4294967297" means that lock count is -1 and a write lock
is being attempted. Then, we found that there is a btree with a counter
of 1 in btree_cache_freeable.
crash> cache_set -l cache_set.list ffffa03fde4c0050 -o|grep btree_cache
[ffffa03fde4c1140] struct list_head btree_cache;
[ffffa03fde4c1150] struct list_head btree_cache_freeable;
[ffffa03fde4c1160] struct list_head btree_cache_freed;
[ffffa03fde4c1170] unsigned int btree_cache_used;
[ffffa03fde4c1178] wait_queue_head_t btree_cache_wait;
[ffffa03fde4c1190] struct task_struct *btree_cache_alloc_lock;
crash> list -H ffffa03fde4c1140|wc -l
973
crash> list -H ffffa03fde4c1150|wc -l
1123
crash> cache_set.btree_cache_used -l cache_set.list ffffa03fde4c0050
btree_cache_used = 2097
crash> list -s btree -l btree.list -H ffffa03fde4c1140|grep -E -A2 "^ lock = {" > btree_cache.txt
crash> list -s btree -l btree.list -H ffffa03fde4c1150|grep -E -A2 "^ lock = {" > btree_cache_freeable.txt
[root@node-3 127.0.0.1-2023-08-04-16:40:28]# pwd
/var/crash/127.0.0.1-2023-08-04-16:40:28
[root@node-3 127.0.0.1-2023-08-04-16:40:28]# cat btree_cache.txt|grep counter|grep -v "counter = 0"
[root@node-3 127.0.0.1-2023-08-04-16:40:28]# cat btree_cache_freeable.txt|grep counter|grep -v "counter = 0"
counter = 1
We found that this is a bug in bch_sectors_dirty_init() when locking c->root:
(1). Thread X has locked c->root(A) write.
(2). Thread Y failed to lock c->root(A), waiting for the lock(c->root A).
(3). Thread X bch_btree_set_root() changes c->root from A to B.
(4). Thread X releases the lock(c->root A).
(5). Thread Y successfully locks c->root(A).
(6). Thread Y releases the lock(c->root B).
down_write locked ---(1)----------------------┐
| |
| down_read waiting ---(2)----┐ |
| | ┌-------------┐ ┌-------------┐
bch_btree_set_root ===(3)========>> | c->root A | | c->root B |
| | └-------------┘ └-------------┘
up_write ---(4)---------------------┘ | |
| | |
down_read locked ---(5)-----------┘ |
| |
up_read ---(6)-----------------------------┘
Since c->root may change, the correct steps to lock c->root should be
the same as bch_root_usage(), compare after locking.
static unsigned int bch_root_usage(struct cache_set *c)
{
unsigned int bytes = 0;
struct bkey *k;
struct btree *b;
struct btree_iter iter;
goto lock_root;
do {
rw_unlock(false, b);
lock_root:
b = c->root;
rw_lock(false, b, b->level);
} while (b != c->root);
for_each_key_filter(&b->keys, k, &iter, bch_ptr_bad)
bytes += bkey_bytes(k);
rw_unlock(false, b);
return (bytes * 100) / btree_bytes(c);
}
Fixes:
|
||
Mingzhe Zou
|
7cc47e64d3 |
bcache: fixup init dirty data errors
We found that after long run, the dirty_data of the bcache device
will have errors. This error cannot be eliminated unless re-register.
We also found that reattach after detach, this error can accumulate.
In bch_sectors_dirty_init(), all inode <= d->id keys will be recounted
again. This is wrong, we only need to count the keys of the current
device.
Fixes:
|
||
Rand Deeb
|
2c7f497ac2 |
bcache: prevent potential division by zero error
In SHOW(), the variable 'n' is of type 'size_t.' While there is a conditional check to verify that 'n' is not equal to zero before executing the 'do_div' macro, concerns arise regarding potential division by zero error in 64-bit environments. The concern arises when 'n' is 64 bits in size, greater than zero, and the lower 32 bits of it are zeros. In such cases, the conditional check passes because 'n' is non-zero, but the 'do_div' macro casts 'n' to 'uint32_t,' effectively truncating it to its lower 32 bits. Consequently, the 'n' value becomes zero. To fix this potential division by zero error and ensure precise division handling, this commit replaces the 'do_div' macro with div64_u64(). div64_u64() is designed to work with 64-bit operands, guaranteeing that division is performed correctly. This change enhances the robustness of the code, ensuring that division operations yield accurate results in all scenarios, eliminating the possibility of division by zero, and improving compatibility across different 64-bit environments. Found by Linux Verification Center (linuxtesting.org) with SVACE. Signed-off-by: Rand Deeb <rand.sec96@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Coly Li <colyli@suse.de> Link: https://lore.kernel.org/r/20231120052503.6122-5-colyli@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
Colin Ian King
|
be93825f0e |
bcache: remove redundant assignment to variable cur_idx
Variable cur_idx is being initialized with a value that is never read, it is being re-assigned later in a while-loop. Remove the redundant assignment. Cleans up clang scan build warning: drivers/md/bcache/writeback.c:916:2: warning: Value stored to 'cur_idx' is never read [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Coly Li <colyli@suse.de> Signed-off-by: Coly Li <colyli@suse.de> Link: https://lore.kernel.org/r/20231120052503.6122-4-colyli@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
Coly Li
|
777967e7e9 |
bcache: check return value from btree_node_alloc_replacement()
In btree_gc_rewrite_node(), pointer 'n' is not checked after it returns from btree_gc_rewrite_node(). There is potential possibility that 'n' is a non NULL ERR_PTR(), referencing such error code is not permitted in following code. Therefore a return value checking is necessary after 'n' is back from btree_node_alloc_replacement(). Signed-off-by: Coly Li <colyli@suse.de> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20231120052503.6122-3-colyli@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
Coly Li
|
baf8fb7e0e |
bcache: avoid oversize memory allocation by small stripe_size
Arraies bcache->stripe_sectors_dirty and bcache->full_dirty_stripes are used for dirty data writeback, their sizes are decided by backing device capacity and stripe size. Larger backing device capacity or smaller stripe size make these two arraies occupies more dynamic memory space. Currently bcache->stripe_size is directly inherited from queue->limits.io_opt of underlying storage device. For normal hard drives, its limits.io_opt is 0, and bcache sets the corresponding stripe_size to 1TB (1<<31 sectors), it works fine 10+ years. But for devices do declare value for queue->limits.io_opt, small stripe_size (comparing to 1TB) becomes an issue for oversize memory allocations of bcache->stripe_sectors_dirty and bcache->full_dirty_stripes, while the capacity of hard drives gets much larger in recent decade. For example a raid5 array assembled by three 20TB hardrives, the raid device capacity is 40TB with typical 512KB limits.io_opt. After the math calculation in bcache code, these two arraies will occupy 400MB dynamic memory. Even worse Andrea Tomassetti reports that a 4KB limits.io_opt is declared on a new 2TB hard drive, then these two arraies request 2GB and 512MB dynamic memory from kzalloc(). The result is that bcache device always fails to initialize on his system. To avoid the oversize memory allocation, bcache->stripe_size should not directly inherited by queue->limits.io_opt from the underlying device. This patch defines BCH_MIN_STRIPE_SZ (4MB) as minimal bcache stripe size and set bcache device's stripe size against the declared limits.io_opt value from the underlying storage device, - If the declared limits.io_opt > BCH_MIN_STRIPE_SZ, bcache device will set its stripe size directly by this limits.io_opt value. - If the declared limits.io_opt < BCH_MIN_STRIPE_SZ, bcache device will set its stripe size by a value multiplying limits.io_opt and euqal or large than BCH_MIN_STRIPE_SZ. Then the minimal stripe size of a bcache device will always be >= 4MB. For a 40TB raid5 device with 512KB limits.io_opt, memory occupied by bcache->stripe_sectors_dirty and bcache->full_dirty_stripes will be 50MB in total. For a 2TB hard drive with 4KB limits.io_opt, memory occupied by these two arraies will be 2.5MB in total. Such mount of memory allocated for bcache->stripe_sectors_dirty and bcache->full_dirty_stripes is reasonable for most of storage devices. Reported-by: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com> Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Eric Wheeler <bcache@lists.ewheeler.net> Link: https://lore.kernel.org/r/20231120052503.6122-2-colyli@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk> |
||
Song Liu
|
45b478951b |
md: fix bi_status reporting in md_end_clone_io
md_end_clone_io() may overwrite error status in orig_bio->bi_status with
BLK_STS_OK. This could happen when orig_bio has BIO_CHAIN (split by
md_submit_bio => bio_split_to_limits, for example). As a result, upper
layer may miss error reported from md (or the device) and consider the
failed IO was successful.
Fix this by only update orig_bio->bi_status when current bio reports
error and orig_bio is BLK_STS_OK. This is the same behavior as
__bio_chain_endio().
Fixes:
|
||
Mikulas Patocka
|
13648e04a9 |
dm-crypt: start allocating with MAX_ORDER
Commit
|
||
Mikulas Patocka
|
28f07f2ab4 |
dm-verity: don't use blocking calls from tasklets
The commit |
||
Mikulas Patocka
|
2a695062a5 |
dm-bufio: fix no-sleep mode
dm-bufio has a no-sleep mode. When activated (with the DM_BUFIO_CLIENT_NO_SLEEP flag), the bufio client is read-only and we could call dm_bufio_get from tasklets. This is used by dm-verity. Unfortunately, commit |
||
Mikulas Patocka
|
ccadc8a21e |
dm-delay: avoid duplicate logic
This is small refactoring of dm-delay - we avoid duplicate logic in flush_delayed_bios and flush_delayed_bios_fast and join these two functions into one. We also add cond_resched() to flush_delayed_bios because the list may have unbounded number of entries. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> |
||
Mikulas Patocka
|
38cfff5681 |
dm-delay: fix bugs introduced by kthread mode
This commit fixes the following bugs introduced by commit |
||
Mikulas Patocka
|
6fc45b6ed9 |
dm-delay: fix a race between delay_presuspend and delay_bio
In delay_presuspend, we set the atomic variable may_delay and then stop the timer and flush pending bios. The intention here is to prevent the delay target from re-arming the timer again. However, this test is racy. Suppose that one thread goes to delay_bio, sees that dc->may_delay is one and proceeds; now, another thread executes delay_presuspend, it sets dc->may_delay to zero, deletes the timer and flushes pending bios. Then, the first thread continues and adds the bio to delayed->list despite the fact that dc->may_delay is false. Fix this bug by changing may_delay's type from atomic_t to bool and only access it while holding the delayed_bios_lock mutex. Note that we don't have to grab the mutex in delay_resume because there are no bios in flight at this point. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@kernel.org> |
||
Linus Torvalds
|
ecae0bd517 |
Many singleton patches against the MM code. The patch series which are
included in this merge do the following: - Kemeng Shi has contributed some compation maintenance work in the series "Fixes and cleanups to compaction". - Joel Fernandes has a patchset ("Optimize mremap during mutual alignment within PMD") which fixes an obscure issue with mremap()'s pagetable handling during a subsequent exec(), based upon an implementation which Linus suggested. - More DAMON/DAMOS maintenance and feature work from SeongJae Park i the following patch series: mm/damon: misc fixups for documents, comments and its tracepoint mm/damon: add a tracepoint for damos apply target regions mm/damon: provide pseudo-moving sum based access rate mm/damon: implement DAMOS apply intervals mm/damon/core-test: Fix memory leaks in core-test mm/damon/sysfs-schemes: Do DAMOS tried regions update for only one apply interval - In the series "Do not try to access unaccepted memory" Adrian Hunter provides some fixups for the recently-added "unaccepted memory' feature. To increase the feature's checking coverage. "Plug a few gaps where RAM is exposed without checking if it is unaccepted memory". - In the series "cleanups for lockless slab shrink" Qi Zheng has done some maintenance work which is preparation for the lockless slab shrinking code. - Qi Zheng has redone the earlier (and reverted) attempt to make slab shrinking lockless in the series "use refcount+RCU method to implement lockless slab shrink". - David Hildenbrand contributes some maintenance work for the rmap code in the series "Anon rmap cleanups". - Kefeng Wang does more folio conversions and some maintenance work in the migration code. Series "mm: migrate: more folio conversion and unification". - Matthew Wilcox has fixed an issue in the buffer_head code which was causing long stalls under some heavy memory/IO loads. Some cleanups were added on the way. Series "Add and use bdev_getblk()". - In the series "Use nth_page() in place of direct struct page manipulation" Zi Yan has fixed a potential issue with the direct manipulation of hugetlb page frames. - In the series "mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO" has improved our handling of gigantic pages in the hugetlb vmmemmep optimizaton code. This provides significant boot time improvements when significant amounts of gigantic pages are in use. - Matthew Wilcox has sent the series "Small hugetlb cleanups" - code rationalization and folio conversions in the hugetlb code. - Yin Fengwei has improved mlock()'s handling of large folios in the series "support large folio for mlock" - In the series "Expose swapcache stat for memcg v1" Liu Shixin has added statistics for memcg v1 users which are available (and useful) under memcg v2. - Florent Revest has enhanced the MDWE (Memory-Deny-Write-Executable) prctl so that userspace may direct the kernel to not automatically propagate the denial to child processes. The series is named "MDWE without inheritance". - Kefeng Wang has provided the series "mm: convert numa balancing functions to use a folio" which does what it says. - In the series "mm/ksm: add fork-exec support for prctl" Stefan Roesch makes is possible for a process to propagate KSM treatment across exec(). - Huang Ying has enhanced memory tiering's calculation of memory distances. This is used to permit the dax/kmem driver to use "high bandwidth memory" in addition to Optane Data Center Persistent Memory Modules (DCPMM). The series is named "memory tiering: calculate abstract distance based on ACPI HMAT" - In the series "Smart scanning mode for KSM" Stefan Roesch has optimized KSM by teaching it to retain and use some historical information from previous scans. - Yosry Ahmed has fixed some inconsistencies in memcg statistics in the series "mm: memcg: fix tracking of pending stats updates values". - In the series "Implement IOCTL to get and optionally clear info about PTEs" Peter Xu has added an ioctl to /proc/<pid>/pagemap which permits us to atomically read-then-clear page softdirty state. This is mainly used by CRIU. - Hugh Dickins contributed the series "shmem,tmpfs: general maintenance" - a bunch of relatively minor maintenance tweaks to this code. - Matthew Wilcox has increased the use of the VMA lock over file-backed page faults in the series "Handle more faults under the VMA lock". Some rationalizations of the fault path became possible as a result. - In the series "mm/rmap: convert page_move_anon_rmap() to folio_move_anon_rmap()" David Hildenbrand has implemented some cleanups and folio conversions. - In the series "various improvements to the GUP interface" Lorenzo Stoakes has simplified and improved the GUP interface with an eye to providing groundwork for future improvements. - Andrey Konovalov has sent along the series "kasan: assorted fixes and improvements" which does those things. - Some page allocator maintenance work from Kemeng Shi in the series "Two minor cleanups to break_down_buddy_pages". - In thes series "New selftest for mm" Breno Leitao has developed another MM self test which tickles a race we had between madvise() and page faults. - In the series "Add folio_end_read" Matthew Wilcox provides cleanups and an optimization to the core pagecache code. - Nhat Pham has added memcg accounting for hugetlb memory in the series "hugetlb memcg accounting". - Cleanups and rationalizations to the pagemap code from Lorenzo Stoakes, in the series "Abstract vma_merge() and split_vma()". - Audra Mitchell has fixed issues in the procfs page_owner code's new timestamping feature which was causing some misbehaviours. In the series "Fix page_owner's use of free timestamps". - Lorenzo Stoakes has fixed the handling of new mappings of sealed files in the series "permit write-sealed memfd read-only shared mappings". - Mike Kravetz has optimized the hugetlb vmemmap optimization in the series "Batch hugetlb vmemmap modification operations". - Some buffer_head folio conversions and cleanups from Matthew Wilcox in the series "Finish the create_empty_buffers() transition". - As a page allocator performance optimization Huang Ying has added automatic tuning to the allocator's per-cpu-pages feature, in the series "mm: PCP high auto-tuning". - Roman Gushchin has contributed the patchset "mm: improve performance of accounted kernel memory allocations" which improves their performance by ~30% as measured by a micro-benchmark. - folio conversions from Kefeng Wang in the series "mm: convert page cpupid functions to folios". - Some kmemleak fixups in Liu Shixin's series "Some bugfix about kmemleak". - Qi Zheng has improved our handling of memoryless nodes by keeping them off the allocation fallback list. This is done in the series "handle memoryless nodes more appropriately". - khugepaged conversions from Vishal Moola in the series "Some khugepaged folio conversions". -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZULEMwAKCRDdBJ7gKXxA jhQHAQCYpD3g849x69DmHnHWHm/EHQLvQmRMDeYZI+nx/sCJOwEAw4AKg0Oemv9y FgeUPAD1oasg6CP+INZvCj34waNxwAc= =E+Y4 -----END PGP SIGNATURE----- Merge tag 'mm-stable-2023-11-01-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Many singleton patches against the MM code. The patch series which are included in this merge do the following: - Kemeng Shi has contributed some compation maintenance work in the series 'Fixes and cleanups to compaction' - Joel Fernandes has a patchset ('Optimize mremap during mutual alignment within PMD') which fixes an obscure issue with mremap()'s pagetable handling during a subsequent exec(), based upon an implementation which Linus suggested - More DAMON/DAMOS maintenance and feature work from SeongJae Park i the following patch series: mm/damon: misc fixups for documents, comments and its tracepoint mm/damon: add a tracepoint for damos apply target regions mm/damon: provide pseudo-moving sum based access rate mm/damon: implement DAMOS apply intervals mm/damon/core-test: Fix memory leaks in core-test mm/damon/sysfs-schemes: Do DAMOS tried regions update for only one apply interval - In the series 'Do not try to access unaccepted memory' Adrian Hunter provides some fixups for the recently-added 'unaccepted memory' feature. To increase the feature's checking coverage. 'Plug a few gaps where RAM is exposed without checking if it is unaccepted memory' - In the series 'cleanups for lockless slab shrink' Qi Zheng has done some maintenance work which is preparation for the lockless slab shrinking code - Qi Zheng has redone the earlier (and reverted) attempt to make slab shrinking lockless in the series 'use refcount+RCU method to implement lockless slab shrink' - David Hildenbrand contributes some maintenance work for the rmap code in the series 'Anon rmap cleanups' - Kefeng Wang does more folio conversions and some maintenance work in the migration code. Series 'mm: migrate: more folio conversion and unification' - Matthew Wilcox has fixed an issue in the buffer_head code which was causing long stalls under some heavy memory/IO loads. Some cleanups were added on the way. Series 'Add and use bdev_getblk()' - In the series 'Use nth_page() in place of direct struct page manipulation' Zi Yan has fixed a potential issue with the direct manipulation of hugetlb page frames - In the series 'mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO' has improved our handling of gigantic pages in the hugetlb vmmemmep optimizaton code. This provides significant boot time improvements when significant amounts of gigantic pages are in use - Matthew Wilcox has sent the series 'Small hugetlb cleanups' - code rationalization and folio conversions in the hugetlb code - Yin Fengwei has improved mlock()'s handling of large folios in the series 'support large folio for mlock' - In the series 'Expose swapcache stat for memcg v1' Liu Shixin has added statistics for memcg v1 users which are available (and useful) under memcg v2 - Florent Revest has enhanced the MDWE (Memory-Deny-Write-Executable) prctl so that userspace may direct the kernel to not automatically propagate the denial to child processes. The series is named 'MDWE without inheritance' - Kefeng Wang has provided the series 'mm: convert numa balancing functions to use a folio' which does what it says - In the series 'mm/ksm: add fork-exec support for prctl' Stefan Roesch makes is possible for a process to propagate KSM treatment across exec() - Huang Ying has enhanced memory tiering's calculation of memory distances. This is used to permit the dax/kmem driver to use 'high bandwidth memory' in addition to Optane Data Center Persistent Memory Modules (DCPMM). The series is named 'memory tiering: calculate abstract distance based on ACPI HMAT' - In the series 'Smart scanning mode for KSM' Stefan Roesch has optimized KSM by teaching it to retain and use some historical information from previous scans - Yosry Ahmed has fixed some inconsistencies in memcg statistics in the series 'mm: memcg: fix tracking of pending stats updates values' - In the series 'Implement IOCTL to get and optionally clear info about PTEs' Peter Xu has added an ioctl to /proc/<pid>/pagemap which permits us to atomically read-then-clear page softdirty state. This is mainly used by CRIU - Hugh Dickins contributed the series 'shmem,tmpfs: general maintenance', a bunch of relatively minor maintenance tweaks to this code - Matthew Wilcox has increased the use of the VMA lock over file-backed page faults in the series 'Handle more faults under the VMA lock'. Some rationalizations of the fault path became possible as a result - In the series 'mm/rmap: convert page_move_anon_rmap() to folio_move_anon_rmap()' David Hildenbrand has implemented some cleanups and folio conversions - In the series 'various improvements to the GUP interface' Lorenzo Stoakes has simplified and improved the GUP interface with an eye to providing groundwork for future improvements - Andrey Konovalov has sent along the series 'kasan: assorted fixes and improvements' which does those things - Some page allocator maintenance work from Kemeng Shi in the series 'Two minor cleanups to break_down_buddy_pages' - In thes series 'New selftest for mm' Breno Leitao has developed another MM self test which tickles a race we had between madvise() and page faults - In the series 'Add folio_end_read' Matthew Wilcox provides cleanups and an optimization to the core pagecache code - Nhat Pham has added memcg accounting for hugetlb memory in the series 'hugetlb memcg accounting' - Cleanups and rationalizations to the pagemap code from Lorenzo Stoakes, in the series 'Abstract vma_merge() and split_vma()' - Audra Mitchell has fixed issues in the procfs page_owner code's new timestamping feature which was causing some misbehaviours. In the series 'Fix page_owner's use of free timestamps' - Lorenzo Stoakes has fixed the handling of new mappings of sealed files in the series 'permit write-sealed memfd read-only shared mappings' - Mike Kravetz has optimized the hugetlb vmemmap optimization in the series 'Batch hugetlb vmemmap modification operations' - Some buffer_head folio conversions and cleanups from Matthew Wilcox in the series 'Finish the create_empty_buffers() transition' - As a page allocator performance optimization Huang Ying has added automatic tuning to the allocator's per-cpu-pages feature, in the series 'mm: PCP high auto-tuning' - Roman Gushchin has contributed the patchset 'mm: improve performance of accounted kernel memory allocations' which improves their performance by ~30% as measured by a micro-benchmark - folio conversions from Kefeng Wang in the series 'mm: convert page cpupid functions to folios' - Some kmemleak fixups in Liu Shixin's series 'Some bugfix about kmemleak' - Qi Zheng has improved our handling of memoryless nodes by keeping them off the allocation fallback list. This is done in the series 'handle memoryless nodes more appropriately' - khugepaged conversions from Vishal Moola in the series 'Some khugepaged folio conversions'" [ bcachefs conflicts with the dynamically allocated shrinkers have been resolved as per Stephen Rothwell in https://lore.kernel.org/all/20230913093553.4290421e@canb.auug.org.au/ with help from Qi Zheng. The clone3 test filtering conflict was half-arsed by yours truly ] * tag 'mm-stable-2023-11-01-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (406 commits) mm/damon/sysfs: update monitoring target regions for online input commit mm/damon/sysfs: remove requested targets when online-commit inputs selftests: add a sanity check for zswap Documentation: maple_tree: fix word spelling error mm/vmalloc: fix the unchecked dereference warning in vread_iter() zswap: export compression failure stats Documentation: ubsan: drop "the" from article title mempolicy: migration attempt to match interleave nodes mempolicy: mmap_lock is not needed while migrating folios mempolicy: alloc_pages_mpol() for NUMA policy without vma mm: add page_rmappable_folio() wrapper mempolicy: remove confusing MPOL_MF_LAZY dead code mempolicy: mpol_shared_policy_init() without pseudo-vma mempolicy trivia: use pgoff_t in shared mempolicy tree mempolicy trivia: slightly more consistent naming mempolicy trivia: delete those ancient pr_debug()s mempolicy: fix migrate_pages(2) syscall return nr_failed kernfs: drop shared NUMA mempolicy hooks hugetlbfs: drop shared NUMA mempolicy pretence mm/damon/sysfs-test: add a unit test for damon_sysfs_set_targets() ... |
||
Linus Torvalds
|
426ee5196d |
sysctl-6.7-rc1
To help make the move of sysctls out of kernel/sysctl.c not incur a size penalty sysctl has been changed to allow us to not require the sentinel, the final empty element on the sysctl array. Joel Granados has been doing all this work. On the v6.6 kernel we got the major infrastructure changes required to support this. For v6.7-rc1 we have all arch/ and drivers/ modified to remove the sentinel. Both arch and driver changes have been on linux-next for a bit less than a month. It is worth re-iterating the value: - this helps reduce the overall build time size of the kernel and run time memory consumed by the kernel by about ~64 bytes per array - the extra 64-byte penalty is no longer inncurred now when we move sysctls out from kernel/sysctl.c to their own files For v6.8-rc1 expect removal of all the sentinels and also then the unneeded check for procname == NULL. The last 2 patches are fixes recently merged by Krister Johansen which allow us again to use softlockup_panic early on boot. This used to work but the alias work broke it. This is useful for folks who want to detect softlockups super early rather than wait and spend money on cloud solutions with nothing but an eventual hung kernel. Although this hadn't gone through linux-next it's also a stable fix, so we might as well roll through the fixes now. -----BEGIN PGP SIGNATURE----- iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmVCqKsSHG1jZ3JvZkBr ZXJuZWwub3JnAAoJEM4jHQowkoinEgYQAIpkqRL85DBwems19Uk9A27lkctwZ6Fc HdslQCObQTsbuKVimZFP4IL2beUfUE0cfLZCXlzp+4nRDOf6vyhyf3w19jPQtI0Q YdqwTk9y6G5VjDsb35QK0+UBloY/kZ1H3/LW4uCwjXTuksUGmWW2Qvey35696Scv hDMLADqKQmdpYxLUaNi9QyYbEAjYtOai2ezg3+i7hTG168t1k/Ab2BxIFrPVsCR2 FAiq05L4ugWjNskdsWBjck05JZsx9SK/qcAxpIPoUm4nGiFNHApXE0E0hs3vsnmn WIHIbxCQw8ZlUDlmw4S+0YH3NFFzFbWfmW8k2b0f2qZTJm/rU4KiJfcJVknkAUVF raFox6XDW0AUQ9L/NOUJ9ip5rup57GcFrMYocdJ3PPAvvmHKOb1D1O741p75RRcc 9j7zwfIRrzjPUqzhsQS/GFjdJu3lJNmEBK1AcgrVry6WoItrAzJHKPPDC7TwaNmD eXpjxMl1sYzzHqtVh4hn+xkUYphj/6gTGMV8zdo+/FopFswgeJW9G8kHtlEWKDPk MRIKwACmfetP6f3ngHunBg+BOipbjCANL7JI0nOhVOQoaULxCCPx+IPJ6GfSyiuH AbcjH8DGI7fJbUkBFoF0dsRFZ2gH8ds1PYMbWUJ6x3FtuCuv5iIuvQYoaWU6itm7 6f0KvCogg0fU =Qf50 -----END PGP SIGNATURE----- Merge tag 'sysctl-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull sysctl updates from Luis Chamberlain: "To help make the move of sysctls out of kernel/sysctl.c not incur a size penalty sysctl has been changed to allow us to not require the sentinel, the final empty element on the sysctl array. Joel Granados has been doing all this work. On the v6.6 kernel we got the major infrastructure changes required to support this. For v6.7-rc1 we have all arch/ and drivers/ modified to remove the sentinel. Both arch and driver changes have been on linux-next for a bit less than a month. It is worth re-iterating the value: - this helps reduce the overall build time size of the kernel and run time memory consumed by the kernel by about ~64 bytes per array - the extra 64-byte penalty is no longer inncurred now when we move sysctls out from kernel/sysctl.c to their own files For v6.8-rc1 expect removal of all the sentinels and also then the unneeded check for procname == NULL. The last two patches are fixes recently merged by Krister Johansen which allow us again to use softlockup_panic early on boot. This used to work but the alias work broke it. This is useful for folks who want to detect softlockups super early rather than wait and spend money on cloud solutions with nothing but an eventual hung kernel. Although this hadn't gone through linux-next it's also a stable fix, so we might as well roll through the fixes now" * tag 'sysctl-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: (23 commits) watchdog: move softlockup_panic back to early_param proc: sysctl: prevent aliased sysctls from getting passed to init intel drm: Remove now superfluous sentinel element from ctl_table array Drivers: hv: Remove now superfluous sentinel element from ctl_table array raid: Remove now superfluous sentinel element from ctl_table array fw loader: Remove the now superfluous sentinel element from ctl_table array sgi-xp: Remove the now superfluous sentinel element from ctl_table array vrf: Remove the now superfluous sentinel element from ctl_table array char-misc: Remove the now superfluous sentinel element from ctl_table array infiniband: Remove the now superfluous sentinel element from ctl_table array macintosh: Remove the now superfluous sentinel element from ctl_table array parport: Remove the now superfluous sentinel element from ctl_table array scsi: Remove now superfluous sentinel element from ctl_table array tty: Remove now superfluous sentinel element from ctl_table array xen: Remove now superfluous sentinel element from ctl_table array hpet: Remove now superfluous sentinel element from ctl_table array c-sky: Remove now superfluous sentinel element from ctl_talbe array powerpc: Remove now superfluous sentinel element from ctl_table arrays riscv: Remove now superfluous sentinel element from ctl_table array x86/vdso: Remove now superfluous sentinel element from ctl_table array ... |
||
Linus Torvalds
|
0364249d20 |
- Update DM core to directly call the map function for both the linear
and stripe targets; which are provided by DM core. - Various updates to use new safer string functions. - Update DM core to respect REQ_NOWAIT flag in normal bios so that memory allocations are always attempted with GFP_NOWAIT. - Add Mikulas Patocka to MAINTAINERS as a DM maintainer! - Improve DM delay target's handling of short delays (< 50ms) by using a kthread to check expiration of IOs rather than timers and a wq. - Update the DM error target so that it works with zoned storage. This helps xfstests to provide proper IO error handling coverage when testing a filesystem with native zoned storage support. - Update both DM crypt and integrity targets to improve performance by using crypto_shash_digest() rather than init+update+final sequence. - Fix DM crypt target by backfilling missing memory allocation accounting for compound pages. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAmVCeG8ACgkQxSPxCi2d A1oXAAgAnT0mb1psEwDhKiJG26bUeJDJHIaNTPPw4UpCvKEWU7lTxzJUmthnwLZI D+JnkrenAfGppdsFkHh1ogTBz8r60aeBTY31SUoCUVS/clwHxRstLCkSE2655an3 WNgrGItta5BRtcgTMxU7bmqVOD4Xyw704L7BK5CM4zKBQuXJtkegylALhdXvYSjs mvHWoPczFW6Rv79CkF0uTkon7Ji041BvKVjbp+fYQf9Pul5d6S1v4Jrvlo2EqblU 6OM75OosNgziQhdw7faVOgFCVkQfLkEeYsJ47dBOjpC3+sbyKENbC4r+ZmkCZGyI uJERswpmAkW7pp+Ab6sqG582DXvPvA== =PPwQ -----END PGP SIGNATURE----- Merge tag 'for-6.7/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - Update DM core to directly call the map function for both the linear and stripe targets; which are provided by DM core - Various updates to use new safer string functions - Update DM core to respect REQ_NOWAIT flag in normal bios so that memory allocations are always attempted with GFP_NOWAIT - Add Mikulas Patocka to MAINTAINERS as a DM maintainer! - Improve DM delay target's handling of short delays (< 50ms) by using a kthread to check expiration of IOs rather than timers and a wq - Update the DM error target so that it works with zoned storage. This helps xfstests to provide proper IO error handling coverage when testing a filesystem with native zoned storage support - Update both DM crypt and integrity targets to improve performance by using crypto_shash_digest() rather than init+update+final sequence - Fix DM crypt target by backfilling missing memory allocation accounting for compound pages * tag 'for-6.7/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm crypt: account large pages in cc->n_allocated_pages dm integrity: use crypto_shash_digest() in sb_mac() dm crypt: use crypto_shash_digest() in crypt_iv_tcw_whitening() dm error: Add support for zoned block devices dm delay: for short delays, use kthread instead of timers and wq MAINTAINERS: add Mikulas Patocka as a DM maintainer dm: respect REQ_NOWAIT flag in normal bios issued to DM dm: enhance alloc_multiple_bios() to be more versatile dm: make __send_duplicate_bios return unsigned int dm log userspace: replace deprecated strncpy with strscpy dm ioctl: replace deprecated strncpy with strscpy_pad dm crypt: replace open-coded kmemdup_nul dm cache metadata: replace deprecated strncpy with strscpy dm: shortcut the calls to linear_map and stripe_map |
||
Linus Torvalds
|
90d624af2e |
for-6.7/block-2023-10-30
-----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmU/vjMQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpqVcEADaNf6X7LVKKrdQ4sA38dBZYGM3kNz0SCYV vkjQAs0Fyylbu6EhYOLO/R+UCtpytLlnbr4NmFDbhaEG4OJcwoDLDxpMQ7Gda58v 4RBXAiIlhZX3g99/ebvtNtVEvQa9gF4h8k2n/gKsG+PoS+cbkKAI0Na2duI1d/pL B5nQ31VAHhsyjUv1nIPLrQS6lsL7ZTFvH8L6FLcEVM03poy8PE2H6kN7WoyXwtfo LN3KK0Nu7B0Wx2nDx0ffisxcDhbChGs7G2c9ndPTvxg6/4HW+2XSeNUwTxXYpyi2 ZCD+AHCzMB/w6GNNWFw4xfau5RrZ4c4HdBnmyR6+fPb1u6nGzjgquzFyLyLu5MkA n/NvOHP1Cbd3QIXG1TnBi2kDPkQ5FOIAjFSe9IZAGT4dUkZ63wBoDil1jCgMLuCR C+AFPLhiIg3cFvu9+fdZ6BkCuZYESd3YboBtRKeMionEexrPTKt4QWqIoVJgd/Y7 nwvR8jkIBpVgQZT8ocYqhSycLCYV2lGqEBSq4rlRiEb/W1G9Awmg8UTGuUYFSC1G vGPCwhGi+SBsbo84aPCfSdUkKDlruNWP0GwIFxo0hsiTOoHP+7UWeenJ2Jw5lNPt p0Y72TEDDaSMlE4cJx6IWdWM/B+OWzCyRyl3uVcy7bToEsVhIbBSSth7+sh2n7Cy WgH1lrtMzg== =sace -----END PGP SIGNATURE----- Merge tag 'for-6.7/block-2023-10-30' of git://git.kernel.dk/linux Pull block updates from Jens Axboe: - Improvements to the queue_rqs() support, and adding null_blk support for that as well (Chengming) - Series improving badblocks support (Coly) - Key store support for sed-opal (Greg) - IBM partition string handling improvements (Jan) - Make number of ublk devices supported configurable (Mike) - Cancelation improvements for ublk (Ming) - MD pull requests via Song: - Handle timeout in md-cluster, by Denis Plotnikov - Cleanup pers->prepare_suspend, by Yu Kuai - Rewrite mddev_suspend(), by Yu Kuai - Simplify md_seq_ops, by Yu Kuai - Reduce unnecessary locking array_state_store(), by Mariusz Tkaczyk - Make rdev add/remove independent from daemon thread, by Yu Kuai - Refactor code around quiesce() and mddev_suspend(), by Yu Kuai - NVMe pull request via Keith: - nvme-auth updates (Mark) - nvme-tcp tls (Hannes) - nvme-fc annotaions (Kees) - Misc cleanups and improvements (Jiapeng, Joel) * tag 'for-6.7/block-2023-10-30' of git://git.kernel.dk/linux: (95 commits) block: ublk_drv: Remove unused function md: cleanup pers->prepare_suspend() nvme-auth: allow mixing of secret and hash lengths nvme-auth: use transformed key size to create resp nvme-auth: alloc nvme_dhchap_key as single buffer nvmet-tcp: use 'spin_lock_bh' for state_lock() powerpc/pseries: PLPKS SED Opal keystore support block: sed-opal: keystore access for SED Opal keys block:sed-opal: SED Opal keystore ublk: simplify aborting request ublk: replace monitor with cancelable uring_cmd ublk: quiesce request queue when aborting queue ublk: rename mm_lock as lock ublk: move ublk_cancel_dev() out of ub->mutex ublk: make sure io cmd handled in submitter task context ublk: don't get ublk device reference in ublk_abort_queue() ublk: Make ublks_max configurable ublk: Limit dev_id/ub_number values md-cluster: check for timeout while a new disk adding nvme: rework NVME_AUTH Kconfig selection ... |
||
Mikulas Patocka
|
9793c269da |
dm crypt: account large pages in cc->n_allocated_pages
The commit |
||
Eric Biggers
|
070bb43ab0 |
dm integrity: use crypto_shash_digest() in sb_mac()
Simplify sb_mac() by using crypto_shash_digest() instead of an init+update+final sequence. This should also improve performance. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> |
||
Eric Biggers
|
6d0ee3b680 |
dm crypt: use crypto_shash_digest() in crypt_iv_tcw_whitening()
Simplify crypt_iv_tcw_whitening() by using crypto_shash_digest() instead of an init+update+final sequence. This should also improve performance. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> |
||
Damien Le Moal
|
a951104333 |
dm error: Add support for zoned block devices
dm-error is used in several test cases in the xfstests test suite to check the handling of IO errors in file systems. However, with several file systems getting native support for zoned block devices (e.g. btrfs and f2fs), dm-error's lack of zoned block device support creates problems as the file system attempts executing zone commands (e.g. a zone append operation) against a dm-error non-zoned block device, which causes various issues in the block layer (e.g. WARN_ON triggers). This commit adds supports for zoned block devices to dm-error, allowing a DM device table containing an error target to be exposed as a zoned block device (if all targets have a compatible zoned model support and mapping). This is done as follows: 1) Allow passing 2 arguments to an error target, similar to dm-linear: a backing device and a start sector. These arguments are optional and dm-error retains its characteristics if the arguments are not specified. 2) Implement the iterate_devices method so that dm-core can normally check the zone support and restrictions (e.g. zone alignment of the targets). When the backing device arguments are not specified, the iterate_devices method never calls the fn() argument. When no backing device is specified, as before, we assume that the DM device is not zoned. When the backing device arguments are specified, the zoned model of the DM device will depend on the backing device type: - If the backing device is zoned and its model and mapping is compatible with other targets of the device, the resulting device will be zoned, with the dm-error mapped portion always returning errors (similar to the default non-zoned case). - If the backing device is not zoned, then the DM device will not be either. This zone support for dm-error requires the definition of a functional report_zones operation so that dm_revalidate_zones() can operate correctly and resources for emulating zone append operations initialized. This is necessary for cases where dm-error is used to partially map a device and have an overall correct handling of zone append. This means that dm-error does not fail report zones operations. Two changes that are not obvious are included to avoid issues: 1) dm_table_supports_zoned_model() is changed to directly check if the backing device of a wildcard target (= dm-error target) is zoned. Otherwise, we wouldn't be able to catch the invalid setup of dm-error without a backing device (non zoned case) being combined with zoned targets. 2) dm_table_supports_dax() is modified to return false if the wildcard target is found. Otherwise, when dm-error is set without a backing device, we end up with a NULL pointer dereference in set_dax_synchronous (dax_dev is NULL). This is consistent with the current behavior because dm_table_supports_dax() always returned false for targets that do not define the iterate_devices method. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Tested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@kernel.org> |
||
Christian Loehle
|
70bbeb29fa |
dm delay: for short delays, use kthread instead of timers and wq
DM delay's current design of using timers and wq to realize the delays is insufficient for delays below ~50ms. This commit enhances the design to use a kthread to flush the expired delays, trading some CPU time (in some cases) for better delay accuracy and delays closer to what the user requested for smaller delays. The new design is chosen as long as all the delays are below 50ms. Since bios can't be completed in interrupt context using a kthread is probably the most reasonable way to approach this. Testing with echo "0 2097152 zero" | dmsetup create dm-zeros for i in $(seq 0 20); do echo "0 2097152 delay /dev/mapper/dm-zeros 0 $i" | dmsetup create dm-delay-${i}ms; done Some performance numbers for comparison, on beaglebone black (single core) CONFIG_HZ_1000=y: fio --name=1msread --rw=randread --bs=4k --runtime=60 --time_based \ --filename=/dev/mapper/dm-delay-1ms Theoretical maximum: 1000 IOPS Previous: 250 IOPS Kthread: 500 IOPS fio --name=10msread --rw=randread --bs=4k --runtime=60 --time_based \ --filename=/dev/mapper/dm-delay-10ms Theoretical maximum: 100 IOPS Previous: 45 IOPS Kthread: 50 IOPS fio --name=1mswrite --rw=randwrite --direct=1 --bs=4k --runtime=60 \ --time_based --filename=/dev/mapper/dm-delay-1ms Theoretical maximum: 1000 IOPS Previous: 498 IOPS Kthread: 1000 IOPS fio --name=10mswrite --rw=randwrite --direct=1 --bs=4k --runtime=60 \ --time_based --filename=/dev/mapper/dm-delay-10ms Theoretical maximum: 100 IOPS Previous: 90 IOPS Kthread: 100 IOPS (This one is just to prove the new design isn't impacting throughput, not really about delays): fio --name=10mswriteasync --rw=randwrite --direct=1 --bs=4k \ --runtime=60 --time_based --filename=/dev/mapper/dm-delay-10ms \ --numjobs=32 --iodepth=64 --ioengine=libaio --group_reporting Previous: 13.3k IOPS Kthread: 13.3k IOPS Signed-off-by: Christian Loehle <christian.loehle@arm.com> [Harshit: kthread_create error handling fix in delay_ctr] Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> |
||
Linus Torvalds
|
befaa609f4 |
hardening updates for v6.7-rc1
- Add LKDTM test for stuck CPUs (Mark Rutland) - Improve LKDTM selftest behavior under UBSan (Ricardo Cañuelo) - Refactor more 1-element arrays into flexible arrays (Gustavo A. R. Silva) - Analyze and replace strlcpy and strncpy uses (Justin Stitt, Azeem Shaikh) - Convert group_info.usage to refcount_t (Elena Reshetova) - Add __counted_by annotations (Kees Cook, Gustavo A. R. Silva) - Add Kconfig fragment for basic hardening options (Kees Cook, Lukas Bulwahn) - Fix randstruct GCC plugin performance mode to stay in groups (Kees Cook) - Fix strtomem() compile-time check for small sources (Kees Cook) -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmU/3cUWHGtlZXNjb29r QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJsEoEACBGPSiOmfSWdH3TOnIG270PD24 jGjg8KFv7RC/JTOdYmpLl0okdlGT9LvjN/ToSSDEw3PIayxoXUdhkbYy0MYtiV3m yz2ozDTzJuplQX/W2fPE+nXSzIwHao2zjPPFjHnT7lt8IIjhgjiOtLfZ2gGUkW99 Mdu2aWh3u0r4tC8OS23++yN5ibRc5l72efsjDWjZ0aPXnxE1bjmLMiIPiizpndIf beasPuDBs98sJVYouemCwnsPXuXOPz3Q1Cpo/fTd+TMTJCLSemCQZCTuOBU0acI/ ZjLCgCaJU1yIYKBMtrIN4G9kITZniXX3/Nm4o6NQMVlcCqMeNaHuflomqWoqWfhE UPbRo2eghZOaMNiCKLLvZDIqPrh1IcsiEl6Ef3W4hICc42GTK96IuGisIvDXwQ4N /SzTOupJuN42noh3z1M3XuZy5RoXJ99IYDNY5CTKf9IdqvA0bbGkU3nb1gZH/xw9 BjTqKzR/7K1kTXuSgagDZ1Wceej9pZxhX7E3IHYsP8ZOvKug3EeL4yybVwQ3HRfq Qnzcp/qPB9cOkLSQXveRTFTsj2mX28Gixct/iDuc1jIYwGQlY1gI6dcUcqby6ptM BrQti7eR2NH2+T3aE2UVCIWsZVhx7NaSF+z8JxfAuu56jicc4xJVsi8zrNveWX5M m2VXyBl3121BVtKi4w== =0iVF -----END PGP SIGNATURE----- Merge tag 'hardening-v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening updates from Kees Cook: "One of the more voluminous set of changes is for adding the new __counted_by annotation[1] to gain run-time bounds checking of dynamically sized arrays with UBSan. - Add LKDTM test for stuck CPUs (Mark Rutland) - Improve LKDTM selftest behavior under UBSan (Ricardo Cañuelo) - Refactor more 1-element arrays into flexible arrays (Gustavo A. R. Silva) - Analyze and replace strlcpy and strncpy uses (Justin Stitt, Azeem Shaikh) - Convert group_info.usage to refcount_t (Elena Reshetova) - Add __counted_by annotations (Kees Cook, Gustavo A. R. Silva) - Add Kconfig fragment for basic hardening options (Kees Cook, Lukas Bulwahn) - Fix randstruct GCC plugin performance mode to stay in groups (Kees Cook) - Fix strtomem() compile-time check for small sources (Kees Cook)" * tag 'hardening-v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (56 commits) hwmon: (acpi_power_meter) replace open-coded kmemdup_nul reset: Annotate struct reset_control_array with __counted_by kexec: Annotate struct crash_mem with __counted_by virtio_console: Annotate struct port_buffer with __counted_by ima: Add __counted_by for struct modsig and use struct_size() MAINTAINERS: Include stackleak paths in hardening entry string: Adjust strtomem() logic to allow for smaller sources hardening: x86: drop reference to removed config AMD_IOMMU_V2 randstruct: Fix gcc-plugin performance mode to stay in group mailbox: zynqmp: Annotate struct zynqmp_ipi_pdata with __counted_by drivers: thermal: tsens: Annotate struct tsens_priv with __counted_by irqchip/imx-intmux: Annotate struct intmux_data with __counted_by KVM: Annotate struct kvm_irq_routing_table with __counted_by virt: acrn: Annotate struct vm_memory_region_batch with __counted_by hwmon: Annotate struct gsc_hwmon_platform_data with __counted_by sparc: Annotate struct cpuinfo_tree with __counted_by isdn: kcapi: replace deprecated strncpy with strscpy_pad isdn: replace deprecated strncpy with strscpy NFS/flexfiles: Annotate struct nfs4_ff_layout_segment with __counted_by nfs41: Annotate struct nfs4_file_layout_dsaddr with __counted_by ... |
||
Linus Torvalds
|
9e87705289 |
Initial bcachefs pull request for 6.7-rc1
Here's the bcachefs filesystem pull request. One new patch since last week: the exportfs constants ended up conflicting with other filesystems that are also getting added to the global enum, so switched to new constants picked by Amir. I'll also be sending another pull request later on in the cycle bringing things up to date my master branch that people are currently running; that will be restricted to fs/bcachefs/, naturally. Testing - fstests as well as the bcachefs specific tests in ktest: https://evilpiepirate.org/~testdashboard/ci?branch=bcachefs-for-upstream It's also been soaking in linux-next, which resulted in a whole bunch of smatch complaints and fixes and a patch or two from Kees. The only new non fs/bcachefs/ patch is the objtool patch that adds bcachefs functions to the list of noreturns. The patch that exports osq_lock() has been dropped for now, per Ingo. Prereq patch list: |
||
Jan Kara
|
b3856da790
|
bcache: Fixup error handling in register_cache()
Coverity has noticed that the printing of error message in register_cache() uses already freed bdev_handle to get to bdev. In fact the problem has been there even before commit "bcache: Convert to bdev_open_by_path()" just a bit more subtle one - cache object itself could have been freed by the time we looked at ca->bdev and we don't hold any reference to bdev either so even that could in principle go away (due to device unplug or similar). Fix all these problems by printing the error message before closing the bdev. Fixes: dc893f51d24a ("bcache: Convert to bdev_open_by_path()") Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231004093757.11560-1-jack@suse.cz Asked-by: Coly Li <colyli@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
Jan Kara
|
9f0f5a30d3
|
md: Convert to bdev_open_by_dev()
Convert md to use bdev_open_by_dev() and pass the handle around. We also don't need the 'Holder' flag anymore so remove it. CC: linux-raid@vger.kernel.org CC: Song Liu <song@kernel.org> Acked-by: Song Liu <song@kernel.org> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-11-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
Jan Kara
|
c2fce61fb2
|
dm: Convert to bdev_open_by_dev()
Convert device mapper to use bdev_open_by_dev() and pass the handle around. CC: Alasdair Kergon <agk@redhat.com> CC: Mike Snitzer <snitzer@kernel.org> CC: dm-devel@redhat.com Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-10-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
Jan Kara
|
631b001fd6
|
bcache: Convert to bdev_open_by_path()
Convert bcache to use bdev_open_by_path() and pass the handle around. CC: linux-bcache@vger.kernel.org CC: Coly Li <colyli@suse.de> CC: Kent Overstreet <kent.overstreet@gmail.com> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: Coly Li <colyli@suse.de> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-9-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
Mike Snitzer
|
6f25dd1c57 |
dm: respect REQ_NOWAIT flag in normal bios issued to DM
Update DM core's normal IO submission to allocate required memory
using GFP_NOWAIT if REQ_NOWAIT is set.
Tested with simple test provided in commit
|
||
Mike Snitzer
|
4a2fe29608 |
dm: enhance alloc_multiple_bios() to be more versatile
alloc_multiple_bios() has the useful ability to try allocating bios with GFP_NOWAIT but will fallback to using GFP_NOIO. The callers service both empty flush bios and abnormal bios (e.g. discard). alloc_multiple_bios() enhancements offered in this commit: - don't require table_devices_lock if num_bios = 1 - allow caller to pass GFP_NOWAIT to do usual GFP_NOWAIT with GFP_NOIO fallback - allow caller to pass GFP_NOIO to _only_ allocate using GFP_NOIO Flush bios with data may be issued to DM with REQ_NOWAIT, as such it makes sense to attempt servicing them with GFP_NOWAIT allocations. But abnormal IO should never be issued using REQ_NOWAIT (if that changes in the future that's fine, but no sense supporting it now). While at it, rename __send_changing_extent_only() to __send_abnormal_io(). [Thanks to both Ming and Mikulas for help with translating known possible IO scenarios to requirements.] Suggested-by: Ming Lei <ming.lei@redhat.com> Suggested-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> |
||
Mikulas Patocka
|
34dbaa88ca |
dm: make __send_duplicate_bios return unsigned int
All the callers cast the value returned by __send_duplicate_bios to unsigned int type, so we can return unsigned int as well. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> |