Commit Graph

6549 Commits

Author SHA1 Message Date
Linus Torvalds
ed41fd071c block-5.11-2021-01-10
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl/7KA0QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpn6WEACeUa97qyzm7G8/E5ejBL6lXSTRXNc8qa+h
 YCdrDltkqs6OHAuEyUCwGw3zPmb7fp4M5RLZ/Dp9EtMwld45HfoN6mpRe0+i4U96
 iAkHMNUo6ytp3wXX1XKgZ0FhcSOSwkQK8CMzmLPn+pxkDYzQPFg38AUISPpoDA/L
 YNh4tEiHHd5oprHIzludE00m2i1oYNrBcmUe27sKxR0mak0kEJtxr4cXLrqBtN3k
 9C31A0gstCINSHmQPAcRvFerDxDM0WPYQ7K6UEXfkCfbyf6i+1eG/qLUwUCdm9MD
 Rjot6dXzQ2LzqJbaAZndjJRDRZx2xpC2TNlNaBjYzSOC6AXSY0MKiZBCnH/i/OoZ
 f0Bq/k7LVeMbyu02cgIis4DPLabfG+XQUOniu4HQTrzK8+neApAlCwINc73cvQOb
 hBS+LfUVqP6K6g3oVGSvqG01wj2HK69SWMNKTr9GZ3GIqrcWYtA/JnqFfTE7/KwC
 H7rkPL8i3+NBXmjjz6hm8hx3MrnekKJpsdCBicm9OOYqJRbkGVjoUYeDFz5MElfp
 k71u2WDQ81aiqfWajsJkZaUFxZgUrRzuWeyBZiQQP9kJEMzUUiDSg4K+0WJhk5bO
 Y0EX0sdCz8k9IBKfi2+FcF5dYj3RDolALmBDrrcfchTW0h7vxMpn4rr/ueN7gViz
 rW/Gj9pRsA==
 =CClj
 -----END PGP SIGNATURE-----

Merge tag 'block-5.11-2021-01-10' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - Missing CRC32 selections (Arnd)

 - Fix for a merge window regression with bdev inode init (Christoph)

 - bcache fixes

 - rnbd fixes

 - NVMe pull request from Christoph:
    - fix a race in the nvme-tcp send code (Sagi Grimberg)
    - fix a list corruption in an nvme-rdma error path (Israel Rukshin)
    - avoid a possible double fetch in nvme-pci (Lalithambika Krishnakumar)
    - add the susystem NQN quirk for a Samsung driver (Gopal Tiwari)
    - fix two compiler warnings in nvme-fcloop (James Smart)
    - don't call sleeping functions from irq context in nvme-fc (James Smart)
    - remove an unused argument (Max Gurtovoy)
    - remove unused exports (Minwoo Im)

 - Use-after-free fix for partition iteration (Ming)

 - Missing blk-mq debugfs flag annotation (John)

 - Bdev freeze regression fix (Satya)

 - blk-iocost NULL pointer deref fix (Tejun)

* tag 'block-5.11-2021-01-10' of git://git.kernel.dk/linux-block: (26 commits)
  bcache: set bcache device into read-only mode for BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET
  bcache: introduce BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE for large bucket
  bcache: check unsupported feature sets for bcache register
  bcache: fix typo from SUUP to SUPP in features.h
  bcache: set pdev_set_uuid before scond loop iteration
  blk-mq-debugfs: Add decode for BLK_MQ_F_TAG_HCTX_SHARED
  block/rnbd-clt: avoid module unload race with close confirmation
  block/rnbd: Adding name to the Contributors List
  block/rnbd-clt: Fix sg table use after free
  block/rnbd-srv: Fix use after free in rnbd_srv_sess_dev_force_close
  block/rnbd: Select SG_POOL for RNBD_CLIENT
  block: pre-initialize struct block_device in bdev_alloc_inode
  fs: Fix freeze_bdev()/thaw_bdev() accounting of bd_fsfreeze_sb
  nvme: remove the unused status argument from nvme_trace_bio_complete
  nvmet-rdma: Fix list_del corruption on queue establishment failure
  nvme: unexport functions with no external caller
  nvme: avoid possible double fetch in handling CQE
  nvme-tcp: Fix possible race of io_work and direct send
  nvme-pci: mark Samsung PM1725a as IGNORE_DEV_SUBNQN
  nvme-fcloop: Fix sscanf type and list_first_entry_or_null warnings
  ...
2021-01-10 12:53:08 -08:00
Coly Li
5342fd4255 bcache: set bcache device into read-only mode for BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET
If BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET is set in incompat feature
set, it means the cache device is created with obsoleted layout with
obso_bucket_site_hi. Now bcache does not support this feature bit, a new
BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE incompat feature bit is added
for a better layout to support large bucket size.

For the legacy compatibility purpose, if a cache device created with
obsoleted BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET feature bit, all bcache
devices attached to this cache set should be set to read-only. Then the
dirty data can be written back to backing device before re-create the
cache device with BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE feature bit
by the latest bcache-tools.

This patch checks BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET feature bit
when running a cache set and attach a bcache device to the cache set. If
this bit is set,
- When run a cache set, print an error kernel message to indicate all
  following attached bcache device will be read-only.
- When attach a bcache device, print an error kernel message to indicate
  the attached bcache device will be read-only, and ask users to update
  to latest bcache-tools.

Such change is only for cache device whose bucket size >= 32MB, this is
for the zoned SSD and almost nobody uses such large bucket size at this
moment. If you don't explicit set a large bucket size for a zoned SSD,
such change is totally transparent to your bcache device.

Fixes: ffa4703275 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket")
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-09 09:21:03 -07:00
Coly Li
b16671e8f4 bcache: introduce BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE for large bucket
When large bucket feature was added, BCH_FEATURE_INCOMPAT_LARGE_BUCKET
was introduced into the incompat feature set. It used bucket_size_hi
(which was added at the tail of struct cache_sb_disk) to extend current
16bit bucket size to 32bit with existing bucket_size in struct
cache_sb_disk.

This is not a good idea, there are two obvious problems,
- Bucket size is always value power of 2, if store log2(bucket size) in
  existing bucket_size of struct cache_sb_disk, it is unnecessary to add
  bucket_size_hi.
- Macro csum_set() assumes d[SB_JOURNAL_BUCKETS] is the last member in
  struct cache_sb_disk, bucket_size_hi was added after d[] which makes
  csum_set calculate an unexpected super block checksum.

To fix the above problems, this patch introduces a new incompat feature
bit BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE, when this bit is set, it
means bucket_size in struct cache_sb_disk stores the order of power-of-2
bucket size value. When user specifies a bucket size larger than 32768
sectors, BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE will be set to
incompat feature set, and bucket_size stores log2(bucket size) more
than store the real bucket size value.

The obsoleted BCH_FEATURE_INCOMPAT_LARGE_BUCKET won't be used anymore,
it is renamed to BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET and still only
recognized by kernel driver for legacy compatible purpose. The previous
bucket_size_hi is renmaed to obso_bucket_size_hi in struct cache_sb_disk
and not used in bcache-tools anymore.

For cache device created with BCH_FEATURE_INCOMPAT_LARGE_BUCKET feature,
bcache-tools and kernel driver still recognize the feature string and
display it as "obso_large_bucket".

With this change, the unnecessary extra space extend of bcache on-disk
super block can be avoided, and csum_set() may generate expected check
sum as well.

Fixes: ffa4703275 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket")
Signed-off-by: Coly Li <colyli@suse.de>
Cc: stable@vger.kernel.org # 5.9+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-09 09:21:03 -07:00
Coly Li
1dfc0686c2 bcache: check unsupported feature sets for bcache register
This patch adds the check for features which is incompatible for
current supported feature sets.

Now if the bcache device created by bcache-tools has features that
current kernel doesn't support, read_super() will fail with error
messoage. E.g. if an unsupported incompatible feature detected,
bcache register will fail with dmesg "bcache: register_bcache() error :
Unsupported incompatible feature found".

Fixes: d721a43ff6 ("bcache: increase super block version for cache device and backing device")
Fixes: ffa4703275 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket")
Signed-off-by: Coly Li <colyli@suse.de>
Cc: stable@vger.kernel.org # 5.9+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-09 09:21:03 -07:00
Coly Li
f7b4943dea bcache: fix typo from SUUP to SUPP in features.h
This patch fixes the following typos,
from BCH_FEATURE_COMPAT_SUUP to BCH_FEATURE_COMPAT_SUPP
from BCH_FEATURE_INCOMPAT_SUUP to BCH_FEATURE_INCOMPAT_SUPP
from BCH_FEATURE_INCOMPAT_SUUP to BCH_FEATURE_RO_COMPAT_SUPP

Fixes: d721a43ff6 ("bcache: increase super block version for cache device and backing device")
Fixes: ffa4703275 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket")
Signed-off-by: Coly Li <colyli@suse.de>
Cc: stable@vger.kernel.org # 5.9+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-09 09:21:03 -07:00
Yi Li
e80927079f bcache: set pdev_set_uuid before scond loop iteration
There is no need to reassign pdev_set_uuid in the second loop iteration,
so move it to the place before second loop.

Signed-off-by: Yi Li <yili@winhong.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-09 09:21:03 -07:00
Linus Torvalds
dea8dcf2a9 Revert WQ_SYSFS change that broke reencryption (and all other
functionality that requires reloading a dm-crypt DM table).
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAl/qS/4THHNuaXR6ZXJA
 cmVkaGF0LmNvbQAKCRDFI/EKLZ0DWiD0CACEywiVceLgV4dTnlH5h5YJjxybjo9g
 LyVe+4X9bN2LauJNpNmlWgRQZTovC6wIk9kyY7p69ZZqXe9Y4sXxoynRoUu/2DiG
 5+MnOleTIOafUHJTJOGhDs+vDPgNnYh3xmoVNZqAQpcexPJg/E0wuhcgmO9lXFew
 ldcNOXV51AdANfjLKFlyQckqBG028ktdV2xt7u/B1FAKcmUbb0rfG7LDHlGf1ggj
 KAVlZzMry7wMhEGPS3IQtmA0mO5DSMn1Kp4iM0Wd6cVrJ+jZafv0uFHFUHiH7DYy
 yxj7AurM/0wxS06cHvCN82OoOl9AzDmNAKy94I/PiWAqyP3I6iiep5rE
 =K4dv
 -----END PGP SIGNATURE-----

Merge tag 'for-5.11/dm-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fix from Mike Snitzer:
 "Revert WQ_SYSFS change that broke reencryption (and all other
  functionality that requires reloading a dm-crypt DM table)"

* tag 'for-5.11/dm-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  Revert "dm crypt: export sysfs of kcryptd workqueue"
2020-12-28 13:32:16 -08:00
Mike Snitzer
48b0777cd9 Revert "dm crypt: export sysfs of kcryptd workqueue"
This reverts commit a2b8b2d975.

WQ_SYSFS breaks the ability to reload a DM table due to sysfs kobject
collision (due to active and inactive table). Given lack of
demonstrated need for exposing this workqueue via sysfs: revert
exposing it.

Reported-by: Ignat Korchagin <ignat@cloudflare.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-12-28 16:13:52 -05:00
Linus Torvalds
771e7e4161 block-5.11-2020-12-23
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl/kHXgQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpkhFEADYuiBZbYEonaV4/nqOhMZ6lXj99rEqVZui
 AOMm7W8nopb97pWy0sJxZHPPMnjglubkTZbX/2TH08ndppGBQAa9HsgDIITO5Ap3
 vjnew6oPsrMtxMUMqR8w8nMb4z5tfqpvUYRPd+qpvYSYLEeSh0UNAEdQ/MOGbA1t
 nl8UPdNW/s1MbeyLJhU+NgGM0aZahED8KuJeVLOY2im6dySO+CoeoB0/mWrdc0PZ
 SDcGBEjhmFspDVAkW4Wo8bMw7Cr72es0esvJSJyx0mlo0jSCR7ZYParDtPwuAl9H
 thKo+brLibz9M+wRtW+7w37oUADTTRL+KV2xJ/J+DeAVjI7/hxgZKewh6hEORmrF
 DwjbS8cnEVp70rfF7+Z9FV/+2GJXm1uI/swBi69Y42CQ33FNLZmikBW6Q3pimrDj
 ZeQSCLRpGlo01xtJpsm8L71KGSfYd+njWaIFnESPhxJ+sTxd8kvFRsdjHlc4KNBG
 UnMzvn6pE3WNhzJJoIqdM2uXa4XDyKkMfO7VbpUgbyij103jpbs7ruWXq3DDU5/t
 Gdwa821zlq5azDYAw7PNb7FhKrsHhiKhOuxyKqJUbOUkOHBNwxJmxXcaMnR3fOSi
 B+AlqYhC6A9DhkL0HG0QLcdFwYawpOsDxbFUbemt/UXn74lAjRnp8+rNMSn5tjbn
 OCnk4Tpaww==
 =Q42d
 -----END PGP SIGNATURE-----

Merge tag 'block-5.11-2020-12-23' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "A few stragglers in here, but mostly just straight fixes. In
  particular:

   - Set of rnbd fixes for issues around changes for the merge window
     (Gioh, Jack, Md Haris Iqbal)

   - iocost tracepoint addition (Baolin)

   - Copyright/maintainers update (Christoph)

   - Remove old blk-mq fast path CPU warning (Daniel)

   - loop max_part fix (Josh)

   - Remote IPI threaded IRQ fix (Sebastian)

   - dasd stable fixes (Stefan)

   - bcache merge window fixup and style fixup (Yi, Zheng)"

* tag 'block-5.11-2020-12-23' of git://git.kernel.dk/linux-block:
  md/bcache: convert comma to semicolon
  bcache:remove a superfluous check in register_bcache
  block: update some copyrights
  block: remove a pointless self-reference in block_dev.c
  MAINTAINERS: add fs/block_dev.c to the block section
  blk-mq: Don't complete on a remote CPU in force threaded mode
  s390/dasd: fix list corruption of lcu list
  s390/dasd: fix list corruption of pavgroup group list
  s390/dasd: prevent inconsistent LCU device data
  s390/dasd: fix hanging device offline processing
  blk-iocost: Add iocg idle state tracepoint
  nbd: Respect max_part for all partition scans
  block/rnbd-clt: Does not request pdu to rtrs-clt
  block/rnbd-clt: Dynamically allocate sglist for rnbd_iu
  block/rnbd: Set write-back cache and fua same to the target device
  block/rnbd: Fix typos
  block/rnbd-srv: Protect dev session sysfs removal
  block/rnbd-clt: Fix possible memleak
  block/rnbd-clt: Get rid of warning regarding size argument in strlcpy
  blk-mq: Remove 'running from the wrong CPU' warning
2020-12-24 12:28:35 -08:00
Zheng Yongjun
46926127d7 md/bcache: convert comma to semicolon
Replace a comma between expression statements by a semicolon.

Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: Coly Li <colyli@sue.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-23 09:25:15 -07:00
Yi Li
117ae250cf bcache:remove a superfluous check in register_bcache
There have no reassign the bdev after check It is IS_ERR.
the double check !IS_ERR(bdev) is superfluous.

After commit 4e7b5671c6 ("block: remove i_bdev"),
"Switch the block device lookup interfaces to directly work with a dev_t
so that struct block_device references are only acquired by the
blkdev_get variants (and the blk-cgroup special case).  This means that
we now don't need an extra reference in the inode and can generally
simplify handling of struct block_device to keep the lookups contained
in the core block layer code."

so after lookup_bdev call, there no need to do bdput.

remove a superfluous check the bdev & don't call bdput after lookup_bdev.

Fixes: 4e7b5671c6a8("block: remove i_bdev")
Signed-off-by: Yi Li <yili@winhong.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-23 09:25:15 -07:00
Linus Torvalds
d8355e740f - Add DM verity support for signature verification with 2nd keyring.
- Fix DM verity to skip verity work if IO completes with error while
   system is shutting down.
 
 - Add new DM multipath "IO affinity" path selector that maps IO
   destined to a given path to a specific CPU based on user provided
   mapping.
 
 - Rename DM multipath path selector source files to have "dm-ps"
   prefix.
 
 - Add REQ_NOWAIT support to some other simple DM targets that don't
   block in more elaborate ways waiting for IO.
 
 - Export DM crypt's kcryptd workqueue via sysfs (WQ_SYSFS).
 
 - Fix error return code in DM's target_message() if empty message is
   received.
 
 - A handful of other small cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAl/iDJYTHHNuaXR6ZXJA
 cmVkaGF0LmNvbQAKCRDFI/EKLZ0DWsI2CACUk9PCCtOOHH1s//VeLjF86VUIsuxf
 hNMfTgWT+arlHUIzl2Bp4c4Dq/T+hXTYlf+f5zGRAbBAd82eCqji5LTVvy/qaYt3
 4gWSZnv/3VYHCx4MvV9UjtXQcSnS/ttDwyV0ZD6/NtYllQQobaCbyrE4p2tA1GIk
 ql8ESRgXKK5Sio197Tm45UEkrhG0oUrEZ3riBaXeM9yOldLp1mLLVPGJeLcJDvpA
 N7TDcM0owq/CMbmu5BkNv0xw7q/Vc9VQLGva8a15StxGjk1HY5/6KQWssCEkTkO7
 HnIprATtWz2r0qgTcI+fOXndUmlqgQr1jhvYfJeuv45KbjoI5qubZvYr
 =vHNj
 -----END PGP SIGNATURE-----

Merge tag 'for-5.11/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - Add DM verity support for signature verification with 2nd keyring

 - Fix DM verity to skip verity work if IO completes with error while
   system is shutting down

 - Add new DM multipath "IO affinity" path selector that maps IO
   destined to a given path to a specific CPU based on user provided
   mapping

 - Rename DM multipath path selector source files to have "dm-ps" prefix

 - Add REQ_NOWAIT support to some other simple DM targets that don't
   block in more elaborate ways waiting for IO

 - Export DM crypt's kcryptd workqueue via sysfs (WQ_SYSFS)

 - Fix error return code in DM's target_message() if empty message is
   received

 - A handful of other small cleanups

* tag 'for-5.11/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm cache: simplify the return expression of load_mapping()
  dm ebs: avoid double unlikely() notation when using IS_ERR()
  dm verity: skip verity work if I/O error when system is shutting down
  dm crypt: export sysfs of kcryptd workqueue
  dm ioctl: fix error return code in target_message
  dm crypt: Constify static crypt_iv_operations
  dm: add support for REQ_NOWAIT to various targets
  dm: rename multipath path selector source files to have "dm-ps" prefix
  dm mpath: add IO affinity path selector
  dm verity: Add support for signature verification with 2nd keyring
  dm: remove unnecessary current->bio_list check when submitting split bio
2020-12-22 13:27:21 -08:00
Zheng Yongjun
b77709237e dm cache: simplify the return expression of load_mapping()
Simplify the return expression.

Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-12-22 09:54:48 -05:00
Antonio Quartulli
52252adede dm ebs: avoid double unlikely() notation when using IS_ERR()
The definition of IS_ERR() already applies the unlikely() notation
when checking the error status of the passed pointer. For this
reason there is no need to have the same notation outside of
IS_ERR() itself.

Clean up code by removing redundant notation.

Signed-off-by: Antonio Quartulli <a@unstable.cc>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-12-21 13:59:37 -05:00
Hyeongseok Kim
252bd12563 dm verity: skip verity work if I/O error when system is shutting down
If emergency system shutdown is called, like by thermal shutdown,
a dm device could be alive when the block device couldn't process
I/O requests anymore. In this state, the handling of I/O errors
by new dm I/O requests or by those already in-flight can lead to
a verity corruption state, which is a misjudgment.

So, skip verity work in response to I/O error when system is shutting
down.

Signed-off-by: Hyeongseok Kim <hyeongseok@gmail.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-12-21 13:57:25 -05:00
Linus Torvalds
69f637c335 for-5.11/drivers-2020-12-14
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl/XgdYQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpjTBD/4me2TNvGOogbcL0b1leAotndJ7spI/IcFM
 NUMNy3pOGuRBcRjwle85xq44puAjlNkZE2LLatem5sT7ZvS+8lPNnOIoTYgfaCjt
 PhKx2sKlLumVm3BwymYAPcPtke4fikGG15Mwu5nX1oOehmyGrjObGAr3Lo6gexCT
 tQoCOczVqaTsV+iTXrLlmgEgs07J9Tm93uh2cNR8Jgroxb8ivuWeUq4YgbV4kWk+
 Y8XvOyVE/yba0vQf5/hHtWuVoC6RdELnqZ6NCkcP/EicdBecwk1GMJAej1S3zPS1
 0BT7GSFTpm3YUHcygD6LRmRg4I/BmWDTDtMi84+jLat6VvSG1HwIm//qHiCJh3ku
 SlvFZENIWAv5LP92x2vlR5Lt7uE3GK2V/5Pxt2fekyzCth6mzu+hLH4CBPQ3xgyd
 E1JqIQ/ilbXstp+EYoivV5x8yltZQnKEZRopws0EOqj1LsmDPj9XT1wzE9RnB0o+
 PWu/DNhQFhhcmP7Z8uLgPiKIVpyGs+vjxiJLlTtGDFTCy6M5JbcgzGkEkSmnybxH
 7lSanjpLt1dWj85FBMc6fNtJkv2rBPfb4+j0d1kZ45Dzcr4umirGIh7wtCHcgc83
 brmXSt29hlKHseSHMMuNWK8haXcgAE7gq9tD8GZ/kzM7+vkmLLxHJa22Qhq5rp4w
 URPeaBaQJw==
 =ayp2
 -----END PGP SIGNATURE-----

Merge tag 'for-5.11/drivers-2020-12-14' of git://git.kernel.dk/linux-block

Pull block driver updates from Jens Axboe:
 "Nothing major in here:

   - NVMe pull request from Christoph:
        - nvmet passthrough improvements (Chaitanya Kulkarni)
        - fcloop error injection support (James Smart)
        - read-only support for zoned namespaces without Zone Append
          (Javier González)
        - improve some error message (Minwoo Im)
        - reject I/O to offline fabrics namespaces (Victor Gladkov)
        - PCI queue allocation cleanups (Niklas Schnelle)
        - remove an unused allocation in nvmet (Amit Engel)
        - a Kconfig spelling fix (Colin Ian King)
        - nvme_req_qid simplication (Baolin Wang)

   - MD pull request from Song:
        - Fix race condition in md_ioctl() (Dae R. Jeong)
        - Initialize read_slot properly for raid10 (Kevin Vigor)
        - Code cleanup (Pankaj Gupta)
        - md-cluster resync/reshape fix (Zhao Heming)

   - Move null_blk into its own directory (Damien Le Moal)

   - null_blk zone and discard improvements (Damien Le Moal)

   - bcache race fix (Dongsheng Yang)

   - Set of rnbd fixes/improvements (Gioh Kim, Guoqing Jiang, Jack Wang,
     Lutz Pogrell, Md Haris Iqbal)

   - lightnvm NULL pointer deref fix (tangzhenhao)

   - sr in_interrupt() removal (Sebastian Andrzej Siewior)

   - FC endpoint security support for s390/dasd (Jan Höppner, Sebastian
     Ott, Vineeth Vijayan). From the s390 arch guys, arch bits included
     as it made it easier for them to funnel the feature through the
     block driver tree.

   - Follow up fixes (Colin Ian King)"

* tag 'for-5.11/drivers-2020-12-14' of git://git.kernel.dk/linux-block: (64 commits)
  block: drop dead assignments in loop_init()
  sr: Remove in_interrupt() usage in sr_init_command().
  sr: Switch the sector size back to 2048 if sr_read_sector() changed it.
  cdrom: Reset sector_size back it is not 2048.
  drivers/lightnvm: fix a null-ptr-deref bug in pblk-core.c
  null_blk: Move driver into its own directory
  null_blk: Allow controlling max_hw_sectors limit
  null_blk: discard zones on reset
  null_blk: cleanup discard handling
  null_blk: Improve implicit zone close
  null_blk: improve zone locking
  block: Align max_hw_sectors to logical blocksize
  null_blk: Fail zone append to conventional zones
  null_blk: Fix zone size initialization
  bcache: fix race between setting bdev state to none and new write request direct to backing
  block/rnbd: fix a null pointer dereference on dev->blk_symlink_name
  block/rnbd-clt: Dynamically alloc buffer for pathname & blk_symlink_name
  block/rnbd: call kobject_put in the failure path
  Documentation/ABI/rnbd-srv: add document for force_close
  block/rnbd-srv: close a mapped device from server side.
  ...
2020-12-16 13:09:32 -08:00
Linus Torvalds
ac7ac4618c for-5.11/block-2020-12-14
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl/Xec8QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpoLbEACzXypgZWwMdfgRckA/Vt333rXHtbhUV+hK
 2XP+P81iRvr9Esi31UPbRp82vrgcDO0cpI1QmQojS5U5TIQP88BfXptfRZZu48eb
 wT5RDDNQ34HItqAh/yEuYsv9yUKcxeIrB99tBVvM+4UmQg9zTdIW3mg6PvCBdbhV
 N38jI0tCF/PJatjfRuphT/nXonQLPWBlVDmZk06KZQFOwQe9ep1vUi1+nbiRPuo3
 geFBpTh1Kp6Vl1B3n4RpECs6Y7I0RRuJdaH2sDizICla1/BW91F9fQwHimNnUxUq
 e1Q1kMuh6ftcQGkYlHSYcPhuv6CvorldTZCO5arPxWpcwvxriTSMRPWAgUr5pEiF
 fhiGhqeDu9e6vl9vS31wUD1B30hy+jFz9wyjRrDwJ3cPHH1JVBjTzvdX+cIh/1ku
 IbIwUMteUtvUrzqAv/DzbGhedp7xWtOFaVo8j0QFYh9zkjd6b8yDOF/yztwX2gjY
 Xt1cd+KpDSiN449ZRaoMI0sCJAxqzhMa6nsWlb0L7KuNyWKAbvKQBm9Rb47FLV9A
 Vx70KC+zkFoyw23capvIahmQazerriUJ5PGe0lVm6ROgmIFdCpXTPDjnrvq/6RZ/
 GEpD7gTW9atGJ7EuEE8686sAfKD5kneChWLX5EHXf0d0AG5Mr2lKsluiGp5LpPJg
 Q1Xqs6xwww==
 =zo4w
 -----END PGP SIGNATURE-----

Merge tag 'for-5.11/block-2020-12-14' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:
 "Another series of killing more code than what is being added, again
  thanks to Christoph's relentless cleanups and tech debt tackling.

  This contains:

   - blk-iocost improvements (Baolin Wang)

   - part0 iostat fix (Jeffle Xu)

   - Disable iopoll for split bios (Jeffle Xu)

   - block tracepoint cleanups (Christoph Hellwig)

   - Merging of struct block_device and hd_struct (Christoph Hellwig)

   - Rework/cleanup of how block device sizes are updated (Christoph
     Hellwig)

   - Simplification of gendisk lookup and removal of block device
     aliasing (Christoph Hellwig)

   - Block device ioctl cleanups (Christoph Hellwig)

   - Removal of bdget()/blkdev_get() as exported API (Christoph Hellwig)

   - Disk change rework, avoid ->revalidate_disk() (Christoph Hellwig)

   - sbitmap improvements (Pavel Begunkov)

   - Hybrid polling fix (Pavel Begunkov)

   - bvec iteration improvements (Pavel Begunkov)

   - Zone revalidation fixes (Damien Le Moal)

   - blk-throttle limit fix (Yu Kuai)

   - Various little fixes"

* tag 'for-5.11/block-2020-12-14' of git://git.kernel.dk/linux-block: (126 commits)
  blk-mq: fix msec comment from micro to milli seconds
  blk-mq: update arg in comment of blk_mq_map_queue
  blk-mq: add helper allocating tagset->tags
  Revert "block: Fix a lockdep complaint triggered by request queue flushing"
  nvme-loop: use blk_mq_hctx_set_fq_lock_class to set loop's lock class
  blk-mq: add new API of blk_mq_hctx_set_fq_lock_class
  block: disable iopoll for split bio
  block: Improve blk_revalidate_disk_zones() checks
  sbitmap: simplify wrap check
  sbitmap: replace CAS with atomic and
  sbitmap: remove swap_lock
  sbitmap: optimise sbitmap_deferred_clear()
  blk-mq: skip hybrid polling if iopoll doesn't spin
  blk-iocost: Factor out the base vrate change into a separate function
  blk-iocost: Factor out the active iocgs' state check into a separate function
  blk-iocost: Move the usage ratio calculation to the correct place
  blk-iocost: Remove unnecessary advance declaration
  blk-iocost: Fix some typos in comments
  blktrace: fix up a kerneldoc comment
  block: remove the request_queue to argument request based tracepoints
  ...
2020-12-16 12:57:51 -08:00
Mike Snitzer
0941e3b065 Revert "dm raid: fix discard limits for raid1 and raid10"
This reverts commit e0910c8e4f.

Reverting 6ffeb1c3f8 ("md: change mddev 'chunk_sectors' from int to
unsigned") exposes dm-raid.c compiler warnings detailed that commit's
header. Clearly this more conservative fix, of simply reverting
e0910c8e4f, would've been more prudent given how late we were in the
v5.10 release. Lessons have been learned.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-12-14 12:12:08 -05:00
Mike Snitzer
77a68698ff Revert "md: change mddev 'chunk_sectors' from int to unsigned"
This reverts commit 6ffeb1c3f8.

This change caused unexpected v5.10 raid6 mount failures, see:
https://lkml.org/lkml/2020/12/14/7

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-12-14 12:08:48 -05:00
Linus Torvalds
d2360a398f block-5.10-2020-12-12
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl/VMG8QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpsBdEACjbsECHIyQZOtTHW+vY2wkOj2ph+OkXtUU
 nmRlXn5itcrTEeHyenwDEc/W03WbU5m4C6BUz9ecRtXLda6K9W2d7YspSWND1JP/
 Gz3YtingA4xwAl7wOtHT6gi6JHb/seDObiKWAik6f7A3xvBW3o9lV1QRy3kaNdG1
 XgcYvBZBHDaP7Gdvvbg6FucSe3dNjSi+skVdvx8j+i27HF3LAEp3eoWG4vZBUePH
 RIpdDyCXRao+Z22V+EeY0X5cH568fxS408Z6xXewV/g2TDd4XUaXz3WM0O9FN0jU
 oTiQeXL+1y4EfU8QoboTWvHmpoKRqWMSZVD0GXbmx4ZwpiFHbyBub2SydU2Xm/Vl
 oNwHc1wktVU2HFrWU9dwrlWz4Egt4KZO6D9SzJ3VDt9HLKIciyZD16rbsVwSr/Bc
 BVQaFJjRYoR60xxNohIEC8C3rF3qRbkAy5HYSHFUyuOE6gm05RQIHLprY6gIjSei
 7gJnP1FL3iu5bQywp7ptqaRIlkeau6kqvboc8oYFDQvVAp35JqnxjdOsAByWUZR9
 lrN87bIh0n1Vqprs5arXWhw5Ixy7ZQsf0DsrFImZgU+VNTFmuB+1cKaa94EsQwo5
 gQH2j39QQSlGmkXphQVWDd0eGCAt3OrmjYe721ZSSLvJjM64+7pZFZpYPc//VpR5
 8Kcq/yDeOA==
 =PmVb
 -----END PGP SIGNATURE-----

Merge tag 'block-5.10-2020-12-12' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "This should be it for 5.10.

  Mike and Song looked into the warning case, and thankfully it appears
  the fix was pretty trivial - we can just change the md device chunk
  type to unsigned int to get rid of it. They cannot currently be < 0,
  and nobody is checking for that either.

  We're reverting the discard changes as the corruption reports came in
  very late, and there's just no time to attempt to deal with it at this
  point. Reverting the changes in question is the right call for 5.10"

* tag 'block-5.10-2020-12-12' of git://git.kernel.dk/linux-block:
  md: change mddev 'chunk_sectors' from int to unsigned
  Revert "md: add md_submit_discard_bio() for submitting discard bio"
  Revert "md/raid10: extend r10bio devs to raid disks"
  Revert "md/raid10: pull codes that wait for blocked dev into one function"
  Revert "md/raid10: improve raid10 discard request"
  Revert "md/raid10: improve discard request for far layout"
  Revert "dm raid: remove unnecessary discard limits for raid10"
2020-12-13 10:36:23 -08:00
Mike Snitzer
6ffeb1c3f8 md: change mddev 'chunk_sectors' from int to unsigned
Commit e2782f560c ("Revert "dm raid: remove unnecessary discard
limits for raid10"") exposed compiler warnings introduced by commit
e0910c8e4f ("dm raid: fix discard limits for raid1 and raid10"):

In file included from ./include/linux/kernel.h:14,
                 from ./include/asm-generic/bug.h:20,
                 from ./arch/x86/include/asm/bug.h:93,
                 from ./include/linux/bug.h:5,
                 from ./include/linux/mmdebug.h:5,
                 from ./include/linux/gfp.h:5,
                 from ./include/linux/slab.h:15,
                 from drivers/md/dm-raid.c:8:
drivers/md/dm-raid.c: In function ‘raid_io_hints’:
./include/linux/minmax.h:18:28: warning: comparison of distinct pointer types lacks a cast
  (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
                            ^~
./include/linux/minmax.h:32:4: note: in expansion of macro ‘__typecheck’
   (__typecheck(x, y) && __no_side_effects(x, y))
    ^~~~~~~~~~~
./include/linux/minmax.h:42:24: note: in expansion of macro ‘__safe_cmp’
  __builtin_choose_expr(__safe_cmp(x, y), \
                        ^~~~~~~~~~
./include/linux/minmax.h:51:19: note: in expansion of macro ‘__careful_cmp’
 #define min(x, y) __careful_cmp(x, y, <)
                   ^~~~~~~~~~~~~
./include/linux/minmax.h:84:39: note: in expansion of macro ‘min’
  __x == 0 ? __y : ((__y == 0) ? __x : min(__x, __y)); })
                                       ^~~
drivers/md/dm-raid.c:3739:33: note: in expansion of macro ‘min_not_zero’
   limits->max_discard_sectors = min_not_zero(rs->md.chunk_sectors,
                                 ^~~~~~~~~~~~

Fix this by changing the chunk_sectors member of 'struct mddev' from
int to 'unsigned int' to match the type used for the 'chunk_sectors'
member of 'struct queue_limits'.  Various MD code still uses 'int' but
none of it appears to ever make use of signed int; and storing
positive signed int in unsigned is perfectly safe.

Reported-by: Song Liu <songliubraving@fb.com>
Fixes: e2782f560c ("Revert "dm raid: remove unnecessary discard limits for raid10"")
Fixes: e0910c8e4f ("dm raid: fix discard limits for raid1 and raid10")
Cc: stable@vger,kernel.org # e0910c8e4f was marked for stable@
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-12 10:07:50 -07:00
Song Liu
57a0f3a81e Revert "md: add md_submit_discard_bio() for submitting discard bio"
This reverts commit 2628089b74.

Matthew Ruffell reported data corruption in raid10 due to the changes
in discard handling [1]. Revert these changes before we find a proper fix.

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907262/
Cc: Matthew Ruffell <matthew.ruffell@canonical.com>
Cc: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-12-09 20:46:01 -08:00
Song Liu
17c28c2a06 Revert "md/raid10: extend r10bio devs to raid disks"
This reverts commit 8650a88901.

Matthew Ruffell reported data corruption in raid10 due to the changes
in discard handling [1]. Revert these changes before we find a proper fix.

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907262/
Cc: Matthew Ruffell <matthew.ruffell@canonical.com>
Cc: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-12-09 20:46:01 -08:00
Song Liu
4e2c6567ef Revert "md/raid10: pull codes that wait for blocked dev into one function"
This reverts commit f046f5d0d7.

Matthew Ruffell reported data corruption in raid10 due to the changes
in discard handling [1]. Revert these changes before we find a proper fix.

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907262/
Cc: Matthew Ruffell <matthew.ruffell@canonical.com>
Cc: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-12-09 20:46:01 -08:00
Song Liu
d7cb6be0d0 Revert "md/raid10: improve raid10 discard request"
This reverts commit bcc90d2804.

Matthew Ruffell reported data corruption in raid10 due to the changes
in discard handling [1]. Revert these changes before we find a proper fix.

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907262/
Cc: Matthew Ruffell <matthew.ruffell@canonical.com>
Cc: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-12-09 20:46:00 -08:00
Song Liu
82fe9af77c Revert "md/raid10: improve discard request for far layout"
This reverts commit d3ee2d8415.

Matthew Ruffell reported data corruption in raid10 due to the changes
in discard handling [1]. Revert these changes before we find a proper fix.

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907262/
Cc: Matthew Ruffell <matthew.ruffell@canonical.com>
Cc: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-12-09 20:46:00 -08:00
Song Liu
e2782f560c Revert "dm raid: remove unnecessary discard limits for raid10"
This reverts commit f0e90b6c66.

Matthew Ruffell reported data corruption in raid10 due to the changes
in discard handling [1]. Revert these changes before we find a proper fix.

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907262/
Cc: Matthew Ruffell <matthew.ruffell@canonical.com>
Cc: Xiao Ni <xni@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-12-09 20:43:48 -08:00
Dongsheng Yang
df4ad53242 bcache: fix race between setting bdev state to none and new write request direct to backing
There is a race condition in detaching as below:
A. detaching			B. Write request
(1) writing back
(2) write back done, set bdev
    state to clean.
(3) cached_dev_put() and
    schedule_work(&dc->detach);
				(4) write data [0 - 4K] directly
				    into backing and ack to user.
(5) power-failure...

When we restart this bcache device, this bdev is clean but not detached,
and read [0 - 4K], we will get unexpected old data from cache device.

To fix this problem, set the bdev state to none when we writeback done
in detaching, and then if power-failure happened as above, the data in
cache will not be used in next bcache device starting, it's detached, we
will read the correct data from backing derectly.

Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07 13:25:16 -07:00
Jeffle Xu
a2b8b2d975 dm crypt: export sysfs of kcryptd workqueue
It should be helpful to export sysfs of "kcryptd" workqueue in some
cases, such as setting specific CPU affinity of the workqueue.

Besides, also tweak the name format a little. The slash inside a
directory name will be translate into exclamation mark, such as
/sys/devices/virtual/workqueue/'kcryptd!253:0'.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-12-04 18:04:36 -05:00
Qinglang Miao
4d7659bfbe dm ioctl: fix error return code in target_message
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.

Fixes: 2ca4c92f58 ("dm ioctl: prevent empty message")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-12-04 18:04:36 -05:00
Rikard Falkeborn
e8dc79d1bd dm crypt: Constify static crypt_iv_operations
The only usage of these structs is to assign their address to the
iv_gen_ops field in the crypt config struct, which is a pointer to
const. Make them const like the rest of the static crypt_iv_operations
structs. This allows the compiler to put them in read-only memory.

Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-12-04 18:04:35 -05:00
Jeffle Xu
410fe22007 dm: add support for REQ_NOWAIT to various targets
commit 021a24460d ("block: add QUEUE_FLAG_NOWAIT") added a new queue
flag QUEUE_FLAG_NOWAIT to advertise if the bdev supports handling of
REQ_NOWAIT or not. DM core supports stacking QUEUE_FLAG_NOWAIT since
commit 6abc49468e ("dm: add support for REQ_NOWAIT and enable it for
linear target"), in which only dm-linear enabled it.

Update others DM targets, which just do simple remapping, to enable
support for REQ_NOWAIT.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-12-04 18:04:35 -05:00
Mike Snitzer
298fb37298 dm: rename multipath path selector source files to have "dm-ps" prefix
Additional prefix helps clarify that these source files implement path
selectors.

Required updating Makefile to still build modules _without_ the
"dm-ps" prefix to preserve dm-multipath's ability to autoload path
selector modules. While at it, cleaned up some DM whitespace in
Makefile.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-12-04 18:04:35 -05:00
Mike Christie
e4d2e82b23 dm mpath: add IO affinity path selector
This patch adds a path selector that selects paths based on a CPU to
path mapping the user passes in and what CPU we are executing on. The
primary user for this PS is where the app is optimized to use specific
CPUs so other PSs undo the apps handy work, and the storage and it's
transport are not a bottlneck.

For these io-affinity PS setups a path's transport/interconnect
perf is not going to flucuate a lot and there is no major differences
between paths, so QL/HST smarts do not help and RR always messes up
what the app is trying to do.

On a system with 16 cores, where you have a job per CPU:

fio --filename=/dev/dm-0 --direct=1 --rw=randrw --bs=4k \
--ioengine=libaio --iodepth=128 --numjobs=16

and a dm-multipath device setup where each CPU is mapped to one path:

// When in mq mode I had to set dm_mq_nr_hw_queues=$NUM_PATHS.
// Bio mode also showed similar results.
0 16777216 multipath 0 0 1 1 io-affinity 0 16 1 8:16 1 8:32 2 8:64 4
8:48 8 8:80 10 8:96 20 8:112 40 8:128 80 8:144 100 8:160 200 8:176
400 8:192 800 8:208 1000 8:224 2000 8:240 4000 65:0 8000

we can see a IOPs increase of 25%.

The percent increase depends on the device and interconnect. For a
slower/medium speed path/device that can do around 180K IOPs a path
if you ran that fio command to it directly we saw a 25% increase like
above. Slower path'd devices that could do around 90K per path showed
maybe around a 2 - 5% increase. If you use something like null_blk or
scsi_debug which can multi-million IOPs and hack it up so each device
they export shows up as a path then you see 50%+ increases.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-12-04 18:04:35 -05:00
Mickaël Salaün
4da8f8c8a1 dm verity: Add support for signature verification with 2nd keyring
Add a new configuration DM_VERITY_VERIFY_ROOTHASH_SIG_SECONDARY_KEYRING
to enable dm-verity signatures to be verified against the secondary
trusted keyring.  Instead of relying on the builtin trusted keyring
(with hard-coded certificates), the second trusted keyring can include
certificate authorities from the builtin trusted keyring and child
certificates loaded at run time.  Using the secondary trusted keyring
enables to use dm-verity disks (e.g. loop devices) signed by keys which
did not exist at kernel build time, leveraging the certificate chain of
trust model.  In practice, this makes it possible to update certificates
without kernel update and reboot, aligning with module and kernel
(kexec) signature verification which already use the secondary trusted
keyring.

Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-12-04 18:04:35 -05:00
Jeffle Xu
985eabdcfe dm: remove unnecessary current->bio_list check when submitting split bio
The depth-first splitting is introduced in commit 18a25da843 ("dm:
ensure bio submission follows a depth-first tree walk"), which is used
to fix the potential deadlock in case of the misordering handling of
bios caused by bio_list. There're two paths submitting split bios,
dm_wq_work() from worker thread and submit_bio() from application. Back
upon that time, dm_wq_work() thread calls __split_and_process_bio()
directly and thus will not trigger this issue since bio_list doesn't
exist here. So this issue will only be triggered from application
calling submit_bio(), and the fix has to check if current->bio_list is
non-NULL to distinguish this case.

However since commit 0c2915b8c6 ("dm: fix missing imposition of
queue_limits from dm_wq_work() thread"), dm_wq_work() thread calls
submit_bio_noacct() and thus also uses bio_list. Since then all entries
into __split_and_process_bio() are under protection of bio_list, and
thus the checking of current->bio_list when determinning if the
depth-first principle should be used, seems kind of nonsense. After all
the checking always succeeds now.

Fixes: 0c2915b8c6 ("dm: fix missing imposition of queue_limits from dm_wq_work() thread")
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-12-04 18:04:35 -05:00
Mike Snitzer
bde3808bc8 dm: remove invalid sparse __acquires and __releases annotations
Fixes sparse warnings:
drivers/md/dm.c:508:12: warning: context imbalance in 'dm_prepare_ioctl' - wrong count at exit
drivers/md/dm.c:543:13: warning: context imbalance in 'dm_unprepare_ioctl' - wrong count at exit

Fixes: 971888c469 ("dm: hold DM table for duration of ioctl rather than use blkdev_get")
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-12-04 15:25:18 -05:00
Mike Snitzer
f05c4403db dm: fix double RCU unlock in dm_dax_zero_page_range() error path
Remove redundant dm_put_live_table() in dm_dax_zero_page_range() error
path to fix sparse warning:
drivers/md/dm.c:1208:9: warning: context imbalance in 'dm_dax_zero_page_range' - unexpected unlock

Fixes: cdf6cdcd3b ("dm,dax: Add dax zero_page_range operation")
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-12-04 15:19:27 -05:00
Mike Snitzer
3ee16db390 dm: fix IO splitting
Commit 882ec4e609 ("dm table: stack 'chunk_sectors' limit to account
for target-specific splitting") caused a couple regressions:
1) Using lcm_not_zero() when stacking chunk_sectors was a bug because
   chunk_sectors must reflect the most limited of all devices in the
   IO stack.
2) DM targets that set max_io_len but that do _not_ provide an
   .iterate_devices method no longer had there IO split properly.

And commit 5091cdec56 ("dm: change max_io_len() to use
blk_max_size_offset()") also caused a regression where DM no longer
supported varied (per target) IO splitting. The implication being the
potential for severely reduced performance for IO stacks that use a DM
target like dm-cache to hide performance limitations of a slower
device (e.g. one that requires 4K IO splitting).

Coming full circle: Fix all these issues by discontinuing stacking
chunk_sectors up using ti->max_io_len in dm_calculate_queue_limits(),
add optional chunk_sectors override argument to blk_max_size_offset()
and update DM's max_io_len() to pass ti->max_io_len to its
blk_max_size_offset() call.

Passing in an optional chunk_sectors override to blk_max_size_offset()
allows for code reuse of block's centralized calculation for max IO
size based on provided offset and split boundary.

Fixes: 882ec4e609 ("dm table: stack 'chunk_sectors' limit to account for target-specific splitting")
Fixes: 5091cdec56 ("dm: change max_io_len() to use blk_max_size_offset()")
Cc: stable@vger.kernel.org
Reported-by: John Dorminy <jdorminy@redhat.com>
Reported-by: Bruce Johnston <bjohnsto@redhat.com>
Reported-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: John Dorminy <jdorminy@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
2020-12-04 14:53:15 -05:00
Christoph Hellwig
a54895fa05 block: remove the request_queue to argument request based tracepoints
The request_queue can trivially be derived from the request.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-04 09:42:00 -07:00
Christoph Hellwig
1c02fca620 block: remove the request_queue argument to the block_bio_remap tracepoint
The request_queue can trivially be derived from the bio.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-04 09:42:00 -07:00
Christoph Hellwig
eb6f7f7cd3 block: remove the request_queue argument to the block_split tracepoint
The request_queue can trivially be derived from the bio.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-04 09:42:00 -07:00
Christoph Hellwig
977115c0f6 block: stop using bdget_disk for partition 0
We can just dereference the point in struct gendisk instead.  Also
remove the now unused export.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01 14:53:40 -07:00
Christoph Hellwig
8446fe9255 block: switch partition lookup to use struct block_device
Use struct block_device to lookup partitions on a disk.  This removes
all usage of struct hd_struct from the I/O path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Coly Li <colyli@suse.de>			[bcache]
Acked-by: Chao Yu <yuchao0@huawei.com>			[f2fs]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01 14:53:40 -07:00
Christoph Hellwig
cb8432d650 block: allocate struct hd_struct as part of struct bdev_inode
Allocate hd_struct together with struct block_device to pre-load
the lifetime rule changes in preparation of merging the two structures.

Note that part0 was previously embedded into struct gendisk, but is
a separate allocation now, and already points to the block_device instead
of the hd_struct.  The lifetime of struct gendisk is still controlled by
the struct device embedded in the part0 hd_struct.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01 14:53:40 -07:00
Christoph Hellwig
a782483cc1 block: remove the nr_sects field in struct hd_struct
Now that the hd_struct always has a block device attached to it, there is
no need for having two size field that just get out of sync.

Additionally the field in hd_struct did not use proper serialization,
possibly allowing for torn writes.  By only using the block_device field
this problem also gets fixed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Coly Li <colyli@suse.de>			[bcache]
Acked-by: Chao Yu <yuchao0@huawei.com>			[f2fs]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01 14:53:40 -07:00
Christoph Hellwig
4e7b5671c6 block: remove i_bdev
Switch the block device lookup interfaces to directly work with a dev_t
so that struct block_device references are only acquired by the
blkdev_get variants (and the blk-cgroup special case).  This means that
we now don't need an extra reference in the inode and can generally
simplify handling of struct block_device to keep the lookups contained
in the core block layer code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Coly Li <colyli@suse.de>		[bcache]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01 14:53:39 -07:00
Christoph Hellwig
8d65269fe8 block: add a bdev_kobj helper
Add a little helper to find the kobject for a struct block_device.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Coly Li <colyli@suse.de>		[bcache]
Acked-by: David Sterba <dsterba@suse.com>	[btrfs]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01 14:53:39 -07:00
Christoph Hellwig
b0519b5423 dm: remove the block_device reference in struct mapped_device
Get rid of the long-lasting struct block_device reference in
struct mapped_device.  The only remaining user is the freeze code,
where we can trivially look up the block device at freeze time
and release the reference at thaw time.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01 14:53:39 -07:00
Christoph Hellwig
47d951023a dm: simplify flush_bio initialization in __send_empty_flush
We don't really need the struct block_device to initialize a bio.  So
switch from using bio_set_dev to manually setting up bi_disk (bi_partno
will always be zero and has been cleared by bio_init already).

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01 14:53:39 -07:00