Commit Graph

60814 Commits

Author SHA1 Message Date
Qu Wenruo
1200b51f57 btrfs: remove the incorrect comment on RO fs when btrfs_run_delalloc_range() fails
At the context of btrfs_run_delalloc_range(), we haven't started/joined
a transaction, thus even something went wrong, we can't and won't abort
transaction, thus no way to make the fs RO.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:59 +02:00
Qu Wenruo
480b9b4d84 btrfs: extent-tree: Add trace events for space info numbers update
Add trace event for update_bytes_pinned() and update_bytes_may_use() to
detect underflow better.

The output would be something like (only showing data part):

  ## Buffered write start, 16K total ##
  2255.954 xfs_io/860 btrfs:update_bytes_may_use:(nil)U: type=DATA old=0 diff=4096
  2257.169 sudo/860 btrfs:update_bytes_may_use:(nil)U: type=DATA old=4096 diff=4096
  2257.346 sudo/860 btrfs:update_bytes_may_use:(nil)U: type=DATA old=8192 diff=4096
  2257.542 sudo/860 btrfs:update_bytes_may_use:(nil)U: type=DATA old=12288 diff=4096

  ## Delalloc start ##
  3727.853 kworker/u8:3-e/700 btrfs:update_bytes_may_use:(nil)U: type=DATA old=16384 diff=-16384

  ## Space cache update ##
  3733.132 sudo/862 btrfs:update_bytes_may_use:(nil)U: type=DATA old=0 diff=65536
  3733.169 sudo/862 btrfs:update_bytes_may_use:(nil)U: type=DATA old=65536 diff=-65536
  3739.868 sudo/862 btrfs:update_bytes_may_use:(nil)U: type=DATA old=0 diff=65536
  3739.891 sudo/862 btrfs:update_bytes_may_use:(nil)U: type=DATA old=65536 diff=-65536

These two trace events will allow bcc tool to probe btrfs_space_info
changes and detect underflow with more details (e.g. backtrace for each
update).

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:58 +02:00
Qu Wenruo
0185f364cb btrfs: extent-tree: Add lockdep assert when updating space info
Just add a safe net for btrfs_space_info member updating.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:58 +02:00
David Sterba
cff8267228 btrfs: read number of data stripes from map only once
There are several places that call nr_data_stripes, but this value does
not change.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:58 +02:00
David Sterba
72ad813157 btrfs: constify map parameter for nr_parity_stripes and nr_data_stripes
Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:58 +02:00
David Sterba
158da513b1 btrfs: refactor helper for bg flags to name conversion
The helper lacks the btrfs_ prefix and the parameter is the raw
blockgroup type, so none of the callers has to do the flags -> index
conversion.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:58 +02:00
David Sterba
e3ecdb3fde btrfs: factor out devs_max setting in __btrfs_alloc_chunk
Merge the repeated code before the if-else block.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:57 +02:00
David Sterba
8c3e3582a4 btrfs: use u8 for raid_array members
The raid_attr table is now 7 * 56 = 392 bytes long, consisting of just
small numbers so we don't have to use ints. New size is 7 * 32 = 224,
saving 3 cachelines.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:57 +02:00
David Sterba
946c9256c6 btrfs: factor out helper for counting data stripes
Factor the sequence of ifs to a helper, the 'data stripes' here means
the number of stripes without redundancy and parity.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:57 +02:00
David Sterba
44b28adafd btrfs: use raid_attr table for btrfs_bg_type_to_factor
The factor is the number of copies.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:57 +02:00
David Sterba
6079e12cdb btrfs: use raid_attr table to find profiles for integrity lowering
Replace open coded list of the profiles by selecting them from the
raid_attr table. The criteria are now more explicit, we need profiles
that have more than 1 copy of the data or can reconstruct the data with
a missing device.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:57 +02:00
David Sterba
081db89b13 btrfs: use raid_attr to get allowed profiles for balance conversion
Iterate over the table and gather all allowed profiles for a given
number of devices, instead of open coding.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:56 +02:00
David Sterba
fc9a2ac77c btrfs: use raid_attr in btrfs_chunk_max_errors
The number of tolerated failures is stored in the raid_attr table, use
it.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:56 +02:00
David Sterba
9fa02ac75b btrfs: use raid_attr table in get_profile_num_devs
The dev_max constraints are defined in the raid_attr table, use it
instead of open-coding it.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:56 +02:00
David Sterba
c8bf1b6703 btrfs: remove mapping tree structures indirection
fs_info::mapping_tree is the physical<->logical mapping tree and uses
the same underlying structure as extents, but is embedded to another
structure. There are no other members and this indirection is useless.
No functional change.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:56 +02:00
David Sterba
49cc180ca9 btrfs: raid56: allow the exact minimum number of devices for balance convert
The minimum number of devices for RAID5 is 2, though this is only a bit
expensive RAID1, and for RAID6 it's 3, which is a triple copy that works
only 3 devices.

mkfs.btrfs allows that and mounting such filesystem also works, so the
conversion via balance filters is inconsistent with the others and we
should not prevent it.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:56 +02:00
David Sterba
0ee5f8ae08 btrfs: fix minimum number of chunk errors for DUP
The list of profiles in btrfs_chunk_max_errors lists DUP as a profile
DUP able to tolerate 1 device missing. Though this profile is special
with 2 copies, it still needs the device, unlike the others.

Looking at the history of changes, thre's no clear reason why DUP is
there, functions were refactored and blocks of code merged to one
helper.

d20983b40e Btrfs: fix writing data into the seed filesystem
  - factor code to a helper

de11cc12df Btrfs: don't pre-allocate btrfs bio
  - unrelated change, DUP still in the list with max errors 1

a236aed14c Btrfs: Deal with failed writes in mirrored configurations
  - introduced the max errors, leaves DUP and RAID1 in the same group

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:55 +02:00
Liu Bo
be9b8dfa9c Btrfs: remove unused variables in __btrfs_unlink_inode
This code was first introduced in 5f39d397df ("Btrfs: Create
extent_buffer interface for large blocksizes") and the function was
named btrfs_unlink_trans. It later got renamed to __btrfs_unlink_inode
and finally commit 16cdcec736 ("btrfs: implement delayed inode items
operation") changed the way inodes are deleted and obviated the need for
those two members.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ replace changelog by Nikolay's version ]
Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:55 +02:00
Goldwyn Rodrigues
cebf05ca65 btrfs: Remove unused variable mode in btrfs_mount
This is a leftover from 312c89fbca ("btrfs: cleanup btrfs_mount()
using btrfs_mount_root()"), the mode was used for opening devices that's
not done here anymore.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:55 +02:00
Su Yue
8f63a84051 btrfs: switch order of unlocks of space_info and bg in do_trimming()
In function do_trimming(), block_group->lock should be unlocked first,
as the locks should be released in the reverse order. This does not
cause problems but should follow the best practices.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:55 +02:00
Qu Wenruo
4c094c33c9 btrfs: tree-checker: Check if the file extent end overflows
Under certain conditions, we could have strange file extent item in log
tree like:

  item 18 key (69599 108 397312) itemoff 15208 itemsize 53
	extent data disk bytenr 0 nr 0
	extent data offset 0 nr 18446744073709547520 ram 18446744073709547520

The num_bytes + ram_bytes overflow 64 bit type.

For num_bytes part, we can detect such overflow along with file offset
(key->offset), as file_offset + num_bytes should never go beyond u64.

For ram_bytes part, it's about the decompressed size of the extent, not
directly related to the size.
In theory it is OK to have a large value, and put extra limitation
on RAM bytes may cause unexpected false alerts.

So in tree-checker, we only check if the file offset and num bytes
overflow.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:55 +02:00
Nikolay Borisov
2ed95d2d59 btrfs: Remove redundant assignment of tgt_device->commit_total_bytes
This is already done in btrfs_init_dev_replace_tgtdev which is the first
phase of device replace, called before doing scrub. During that time
exclusive lock is held. Additionally btrfs_fs_device::commit_total_bytes
is always set based on the size of the underlying block device which
shouldn't change once set. This makes the 2nd assignment of the variable
in the finishing phase redundant.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:55 +02:00
Nikolay Borisov
f232ab04f6 btrfs: Explicitly reserve space for devreplace item
Part of device replace involves writing an item to the device root
containing information about pending replace operations. Currently space
for this item is not being explicitly reserved so this works thanks to
presence of global reserve. While not fatal it's not a good practice.
Let's be explicit about space requirement of device replace and reserve
space when starting the transaction.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:54 +02:00
Nikolay Borisov
fa19452a40 btrfs: Streamline replace sem unlock in btrfs_dev_replace_start
There are only 2 branches which goto leave label with need_unlock set
to true. Essentially need_unlock is used as a substitute for directly
calling up_write. Since the branches needing this are only 2 and their
context is not that big it's more clear to just call up_write where
required. No functional changes.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:54 +02:00
Nikolay Borisov
e1e0eb43ce btrfs: Ensure btrfs_init_dev_replace_tgtdev sees up to date values
btrfs_init_dev_replace_tgtdev reads certain values from the source
device (such as commit_total_bytes) which are updated during transaction
commit. Currently this function is called before committing any pending
transaction, leading to possibly reading outdated values.

Fix this by moving the function below the transaction commit, at this
point the EXCL_OP bit it set hence once transaction is complete the
total size of the device cannot be changed (it's usually changed by
resize/remove ops which are blocked).

Fixes: 9e271ae27e ("Btrfs: kernel operation should come after user input has been verified")
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:54 +02:00
Nikolay Borisov
419684b2c2 btrfs: dev-replace: Remove impossible WARN_ON
This WARN_ON can never trigger because src_device cannot be null.
btrfs_find_device_by_devspec always returns either an error or a valid
pointer to the device. Just remove it.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:54 +02:00
Nikolay Borisov
b0d9e1ea17 btrfs: Reduce critical section in btrfs_init_dev_replace_tgtdev
There is no point in holding btrfs_fs_devices::device_list_mutex
while initialising fields of the not-yet-published device. Instead,
hold the mutex only when the newly initialised device is being
published. I think holding device_list_mutex here is redundant
altogether, because at this point BTRFS_FS_EXCL_OP is set which
prevents device removal/addition/balance/resize to occur.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:54 +02:00
Nikolay Borisov
ddb9378469 btrfs: Don't opencode sync_blockdev in btrfs_init_dev_replace_tgtdev
Using sync_blockdev makes it plain obvious what's happening. No
functional changes.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:53 +02:00
David Sterba
5911c8fe05 btrfs: fiemap: preallocate ulists for btrfs_check_shared
btrfs_check_shared looks up parents of a given extent and uses ulists
for that. These are allocated and freed repeatedly. Preallocation in the
caller will avoid the overhead and also allow us to use the GFP_KERNEL
as it is happens before the extent locks are taken.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:53 +02:00
David Sterba
9b4e675a99 btrfs: detect fast implementation of crc32c on all architectures
Currently, there's only check for fast crc32c implementation on X86,
based on the CPU flags. This is used to decide if checksumming should be
offloaded to worker threads or can be calculated by the caller.

As there are more architectures that implement a faster version of
crc32c (ARM, SPARC, s390, MIPS, PowerPC), also there are specialized hw
cards.

The detection is based on driver name, all generic C implementations
contain 'generic', while the specialized versions do not. Alternatively
the priority could be used, but this is not currently provided by the
crypto API.

The flag is set per-filesystem at mount time and used for the offloading
decisions.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:53 +02:00
Qu Wenruo
78192442d3 btrfs: extent-tree: Refactor add_pinned_bytes() to add|sub_pinned_bytes()
Instead of using @sign to determine whether we're adding or subtracting.
Even it only has 3 callers, it's still (and in fact already caused
problem in the past) confusing to use.

Refactor add_pinned_bytes() to add_pinned_bytes() and sub_pinned_bytes()
to explicitly show what we're doing.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:34:53 +02:00
Christoph Hellwig
73d30d4874 xfs: remove XFS_TRANS_NOFS
Instead of a magic flag for xfs_trans_alloc, just ensure all callers
that can't relclaim through the file system use memalloc_nofs_save to
set the per-task nofs flag.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-30 09:05:17 -07:00
Christoph Hellwig
fe64e0d26b xfs: simplify xfs_ioend_can_merge
Compare the block layer status directly instead of converting it to
an errno first.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-30 09:05:17 -07:00
Christoph Hellwig
7dbae9fbde xfs: allow merging ioends over append boundaries
There is no real problem merging ioends that go beyond i_size into an
ioend that doesn't.  We just need to move the append transaction to the
base ioend.  Also use the opportunity to use a real error code instead
of the magic 1 to cancel the transactions, and write a comment
explaining the scheme.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-30 09:05:17 -07:00
Christoph Hellwig
0290d9c1e5 xfs: fix a comment typo in xfs_submit_ioend
The fail argument is long gone, update the comment.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-30 09:05:17 -07:00
Christoph Hellwig
1fdafce55c xfs: remove the unused xfs_count_page_state declaration
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-30 09:05:17 -07:00
Christoph Hellwig
b620743077 block: never take page references for ITER_BVEC
If we pass pages through an iov_iter we always already have a reference
in the caller.  Thus remove the ITER_BVEC_FLAG_NO_REF and don't take
reference to pages by default for bvec backed iov_iters.

Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-29 09:47:32 -06:00
Christoph Hellwig
d7c8aa85ed direct-io: use bio_release_pages in dio_bio_complete
Use bio_release_pages instead of duplicating it.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-29 09:47:31 -06:00
Christoph Hellwig
9fec4a2188 block_dev: use bio_release_pages in bio_unmap_user
Use bio_release_pages instead of duplicating it.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-29 09:47:31 -06:00
Christoph Hellwig
57dfe3ce10 block_dev: use bio_release_pages in blkdev_bio_end_io
Use bio_release_pages instead of duplicating it.

Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-29 09:47:31 -06:00
Christoph Hellwig
147a60538d iomap: use bio_release_pages in iomap_dio_bio_end_io
Use bio_release_pages instead of duplicating it.

Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-29 09:47:31 -06:00
Linus Torvalds
01305db842 Merge tag 'xarray-5.2-rc6' of git://git.infradead.org/users/willy/linux-dax
Pull XArray fixes from Matthew Wilcox:

 - Account XArray nodes for the page cache to the appropriate cgroup
   (Johannes Weiner)

 - Fix idr_get_next() when called under the RCU lock (Matthew Wilcox)

 - Add a test for xa_insert() (Matthew Wilcox)

* tag 'xarray-5.2-rc6' of git://git.infradead.org/users/willy/linux-dax:
  XArray tests: Add check_insert
  idr: Fix idr_get_next race with idr_remove
  mm: fix page cache convergence regression
2019-06-29 17:14:57 +08:00
Linus Torvalds
0839c53762 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "15 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  linux/kernel.h: fix overflow for DIV_ROUND_UP_ULL
  mm, swap: fix THP swap out
  fork,memcg: alloc_thread_stack_node needs to set tsk->stack
  MAINTAINERS: add CLANG/LLVM BUILD SUPPORT info
  mm/vmalloc.c: avoid bogus -Wmaybe-uninitialized warning
  mm/page_idle.c: fix oops because end_pfn is larger than max_pfn
  initramfs: fix populate_initrd_image() section mismatch
  mm/oom_kill.c: fix uninitialized oc->constraint
  mm: hugetlb: soft-offline: dissolve_free_huge_page() return zero on !PageHuge
  mm: soft-offline: return -EBUSY if set_hwpoison_free_buddy_page() fails
  signal: remove the wrong signal_pending() check in restore_user_sigmask()
  fs/binfmt_flat.c: make load_flat_shared_library() work
  mm/mempolicy.c: fix an incorrect rebind node in mpol_rebind_nodemask
  fs/proc/array.c: allow reporting eip/esp for all coredumping threads
  mm/dev_pfn: exclude MEMORY_DEVICE_PRIVATE while computing virtual address
2019-06-29 17:11:01 +08:00
Linus Torvalds
c949c30b26 Merge tag 'nfs-for-5.2-4' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull two more NFS client fixes from Anna Schumaker:
 "These are both stable fixes.

  One to calculate the correct client message length in the case of
  partial transmissions. And the other to set the proper TCP timeout for
  flexfiles"

* tag 'nfs-for-5.2-4' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFS/flexfiles: Use the correct TCP timeout for flexfiles I/O
  SUNRPC: Fix up calculation of client message length
2019-06-29 17:02:22 +08:00
Linus Torvalds
43251dbd6a Merge tag 'ceph-for-5.2-rc7' of git://github.com/ceph/ceph-client
Pull ceph fix from Ilya Dryomov:
 "A small fix for a potential -rc1 regression from Jeff"

* tag 'ceph-for-5.2-rc7' of git://github.com/ceph/ceph-client:
  ceph: fix ceph_mdsc_build_path to not stop on first component
2019-06-29 17:01:02 +08:00
Linus Torvalds
9dda12b6fa Merge tag 'for-linus-20190628' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "Just two small fixes.

  One from Paolo, fixing a silly mistake in BFQ. The other one is from
  me, ensuring that we have ->file cleared in the io_uring request a bit
  earlier. That avoids a use-before-free, if we encounter an error
  before ->file is assigned"

* tag 'for-linus-20190628' of git://git.kernel.dk/linux-block:
  block, bfq: fix operator in BFQQ_TOTALLY_SEEKY
  io_uring: ensure req->file is cleared on allocation
2019-06-29 16:58:35 +08:00
Oleg Nesterov
97abc889ee signal: remove the wrong signal_pending() check in restore_user_sigmask()
This is the minimal fix for stable, I'll send cleanups later.

Commit 854a6ed568 ("signal: Add restore_user_sigmask()") introduced
the visible change which breaks user-space: a signal temporary unblocked
by set_user_sigmask() can be delivered even if the caller returns
success or timeout.

Change restore_user_sigmask() to accept the additional "interrupted"
argument which should be used instead of signal_pending() check, and
update the callers.

Eric said:

: For clarity.  I don't think this is required by posix, or fundamentally to
: remove the races in select.  It is what linux has always done and we have
: applications who care so I agree this fix is needed.
:
: Further in any case where the semantic change that this patch rolls back
: (aka where allowing a signal to be delivered and the select like call to
: complete) would be advantage we can do as well if not better by using
: signalfd.
:
: Michael is there any chance we can get this guarantee of the linux
: implementation of pselect and friends clearly documented.  The guarantee
: that if the system call completes successfully we are guaranteed that no
: signal that is unblocked by using sigmask will be delivered?

Link: http://lkml.kernel.org/r/20190604134117.GA29963@redhat.com
Fixes: 854a6ed568 ("signal: Add restore_user_sigmask()")
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Eric Wong <e@80x24.org>
Tested-by: Eric Wong <e@80x24.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: <stable@vger.kernel.org>	[5.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-29 16:43:45 +08:00
Jann Horn
867bfa4a5f fs/binfmt_flat.c: make load_flat_shared_library() work
load_flat_shared_library() is broken: It only calls load_flat_file() if
prepare_binprm() returns zero, but prepare_binprm() returns the number of
bytes read - so this only happens if the file is empty.

Instead, call into load_flat_file() if the number of bytes read is
non-negative. (Even if the number of bytes is zero - in that case,
load_flat_file() will see nullbytes and return a nice -ENOEXEC.)

In addition, remove the code related to bprm creds and stop using
prepare_binprm() - this code is loading a library, not a main executable,
and it only actually uses the members "buf", "file" and "filename" of the
linux_binprm struct. Instead, call kernel_read() directly.

Link: http://lkml.kernel.org/r/20190524201817.16509-1-jannh@google.com
Fixes: 287980e49f ("remove lots of IS_ERR_VALUE abuses")
Signed-off-by: Jann Horn <jannh@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-29 16:43:45 +08:00
John Ogness
cb8f381f16 fs/proc/array.c: allow reporting eip/esp for all coredumping threads
0a1eb2d474 ("fs/proc: Stop reporting eip and esp in /proc/PID/stat")
stopped reporting eip/esp and fd7d56270b ("fs/proc: Report eip/esp in
/prod/PID/stat for coredumping") reintroduced the feature to fix a
regression with userspace core dump handlers (such as minicoredumper).

Because PF_DUMPCORE is only set for the primary thread, this didn't fix
the original problem for secondary threads.  Allow reporting the eip/esp
for all threads by checking for PF_EXITING as well.  This is set for all
the other threads when they are killed.  coredump_wait() waits for all the
tasks to become inactive before proceeding to invoke a core dumper.

Link: http://lkml.kernel.org/r/87y32p7i7a.fsf@linutronix.de
Link: http://lkml.kernel.org/r/20190522161614.628-1-jlu@pengutronix.de
Fixes: fd7d56270b ("fs/proc: Report eip/esp in /prod/PID/stat for coredumping")
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reported-by: Jan Luebbe <jlu@pengutronix.de>
Tested-by: Jan Luebbe <jlu@pengutronix.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-29 16:43:44 +08:00
Christoph Hellwig
89b171acb2 xfs: fix iclog allocation size
Properly allocate the space for the bio_vecs instead of just one byte
per bio_vec.

Fixes: 79b54d9bfc ("xfs: use bios directly to write log buffers")
Reported-by: syzbot+b75afdbe271a0d7ac4f6@syzkaller.appspotmail.com
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 21:02:45 -07:00