Fix the length of holes reported at the end of a file: the length is
relative to the beginning of the extent, not the seek position which is
rounded down to the filesystem block size.
This bug went unnoticed for some time, but is now caught by the
following assertion in iomap_iter_done():
WARN_ON_ONCE(iter->iomap.offset + iter->iomap.length <= iter->pos)
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
So far, for buffered writes, we were taking the inode glock in
gfs2_iomap_begin and dropping it in gfs2_iomap_end with the intention of
not holding the inode glock while iomap_write_actor faults in user
pages. It turns out that iomap_write_actor is called inside iomap_begin
... iomap_end, so the user pages were still faulted in while holding the
inode glock and the locking code in iomap_begin / iomap_end was
completely pointless.
Move the locking into gfs2_file_buffered_write instead. We'll take care
of the potential deadlocks due to faulting in user pages while holding a
glock in a subsequent patch.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
These aren't actually used by the only instance implementing the methods.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Split __gfs2_unstuff_inode off from gfs2_unstuff_dinode and clean up the
code a little. All remaining callers now pass NULL as the page argument
of gfs2_unstuff_dinode, so remove that argument.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Building the kernel with W=1 results in a number of kernel-doc warnings
like incorrect function names and parameter descriptions. Fix those,
mostly by adding missing parameter descriptions, removing left-over
descriptions, and demoting some less important kernel-doc comments into
regular comments.
Originally proposed by Lee Jones; improved and combined into a single
patch by Andreas.
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Instead of only supporting GFS2_METATYPE_DI and GFS2_METATYPE_IN blocks,
make the block type a parameter of gfs2_meta_indirect_buffer and rename
the function to gfs2_meta_buffer.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Convert gfs2_extent_map to iomap and split it into gfs2_get_extent and
gfs2_alloc_extent. Instead of hardcoding the extent size, pass it in
via the extlen parameter.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Rename the current gfs2_iomap_get and gfs2_iomap_alloc functions to __*.
Add a new gfs2_iomap_get helper that doesn't expose struct metapath.
Rename gfs2_iomap_get_alloc to gfs2_iomap_alloc. Use the new helpers
where they make sense.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
This patch takes advantage of the new glock holder sharing feature for
resource groups. We have already introduced local resource group
locking in a previous patch, so competing accesses of local processes
are already under control.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
When starting an iomap write, gfs2_quota_lock_check -> gfs2_quota_lock
-> gfs2_quota_hold is called from gfs2_iomap_begin. At the end of the
write, before unlocking the quotas, punch_hole -> gfs2_quota_hold can be
called again in gfs2_iomap_end, which is incorrect and leads to a failed
assertion. Instead, move the call to gfs2_quota_unlock before the call
to punch_hole to fix that.
Fixes: 64bc06bb32 ("gfs2: iomap buffered write support")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
GFS2 uses struct gfs2_rbm to represent a filesystem block number as a
bit position within a resource group. This representation is used in
the bitmap manipulation code to prevent excessive conversions between
block numbers and bit positions, but also in struct gfs2_blkreserv which
is part of struct gfs2_inode, to mark the start of a reservation. In
the inode, the bit position representation makes less sense: first, the
start position is used as a block number about as often as a bit
position; second, the bit position representation makes the code
unnecessarily complicated and difficult to read.
Therefore, change struct gfs2_blkreserv to represent the start of a
reservation as a block number instead of a bit position. (This requires
keeping track of the resource group in gfs2_blkreserv separately.) With
that change, various things can be slightly simplified, and struct
gfs2_rbm can be moved to rgrp.c.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
This reverts commit b2a846dbef.
That commit changed the behavior of function gfs2_block_map to return
-ENODATA in cases where a hole (IOMAP_HOLE) is encountered and create is
false. While that fixed the intended problem for jdata, it also broke
other callers of gfs2_block_map such as some jdata block reads. Before
the patch, an encountered hole would be skipped and the buffer seen as
unmapped by the caller. The patch changed the behavior to return
-ENODATA, which is interpreted as an error by the caller.
The -ENODATA return code should be restricted to the specific case where
jdata holes are encountered during ail1 writes. That will be done in a
later patch.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
When flushing out its ail1 list, gfs2_write_jdata_page calls function
__block_write_full_page passing in function gfs2_get_block_noalloc.
But there was a problem when a process wrote to a jdata file, then
truncated it or punched a hole, leaving references to the blocks within
the new hole in its ail list, which are to be written to the journal log.
In writing them to the journal, after calling gfs2_block_map, function
gfs2_get_block_noalloc determined that the (hole-punched) block was not
mapped, so it returned -EIO to generic_writepages, which passed it back
to gfs2_ail1_start_one. This, in turn, performed a withdraw, assuming
there was a real IO error writing to the journal.
This might be a valid error when writing metadata to the journal, but for
journaled data writes, it does not warrant a withdraw.
This patch adds a check to function gfs2_block_map that makes an exception
for journaled data writes that correspond to jdata holes: If the iomap
get function returns a block type of IOMAP_HOLE, it instead returns
-ENODATA which does not cause the withdraw. Other errors are returned as
before.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Function gfs2_block_map had a lot of redundancy between its create and
no_create paths. This patch simplifies the code to eliminate the redundancy.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Switch to using the iomap readpage and writepage helpers for all I/O in
the ordered and writeback modes, and thus eliminate using buffer_heads
for I/O in these cases. The journaled data mode is left untouched.
(Andreas Gruenbacher: In gfs2_unstuffer_page, switch from mark_buffer_dirty
to set_page_dirty instead of accidentally leaving the page / buffer clean.)
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
- Make sure transactions won't be started recursively in gfs2_block_zero_range.
(Bug introduced in 5.4 when switching to iomap_zero_range.)
- Fix a glock holder refcount leak introduced in the iopen glock locking
scheme rework merged in 5.8.
- A few other small improvements (debugging, stack usage, comment fixes).
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAl8xkmAUHGFncnVlbmJh
QHJlZGhhdC5jb20ACgkQ1b+f6wMTZToB0w/9FANsxraTsNxxZUQuP2uyPiB/TRxY
Vutp63Pe+eE0GNxgpfLWT+QgO99BQlGZ8WmDJ49ex0dZXZY4lv4L7JoGTV5VwAyh
CFqRcWQkAb7zVvOxAeKfyXT0CwhS7O0avvlubSpEHfQgPkGQa3q3vu18vg1jHfB3
OsFjZGCoyiJKmNFX005euhebnStabaGeP0+8uSz/5xHS+NQUbB3jPlgycUMvTVBe
Lhl052rVQzlULNUCkehKnaBxzN0/8K55iFOaOSd2kzZ5BMfRabvskZEyJFf4T75p
VJyElekk5mGV0k/FpGL03Me1oqMddDLpAFCGh9Hw51o02i8wZ3RderfQHfUmojku
5/lLEcEJV+oC7gJR7IsGRme71De9y+uLnKvod+ayBw+9us1ZbEm4zJY7hLLGyq03
piuFEAo0UGUmxGn8/s1RBT0lMYKjEDGIGjockaz/XzEQMSip27JZq4ATnA5CHaSy
8q4PFflxEaXWJtPEjiY7DW1xQhYW/3cEDfd7kjSBw/GUxk1ILXRMnzCOsjrYuWfH
ykH0gzVq7/uJln/otYa39RBdt/GQXJCzhvODeizF6B36seVX9oBBEc6TtohxREwt
aIpupz6iGQ6GXddg1Sq6p2w3xyW8e8bG4YislEyiCyxpdOjvcYdM6cVxF7UBgtgu
fwEbjgGO34pAO1E=
=+Nig
-----END PGP SIGNATURE-----
Merge tag 'gfs2-for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Andreas Gruenbacher:
- Make sure transactions won't be started recursively in
gfs2_block_zero_range (bug introduced in 5.4 when switching to
iomap_zero_range)
- Fix a glock holder refcount leak introduced in the iopen glock
locking scheme rework merged in 5.8.
- A few other small improvements (debugging, stack usage, comment
fixes).
* tag 'gfs2-for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: When gfs2_dirty_inode gets a glock error, dump the glock
gfs2: Never call gfs2_block_zero_range with an open transaction
gfs2: print details on transactions that aren't properly ended
gfs2: Fix inaccurate comment
fs: Fix typo in comment
gfs2: Fix refcount leak in gfs2_glock_poke
gfs2: Pass glock holder to gfs2_file_direct_{read,write}
gfs2: Add some flags missing from glock output
Before this patch, some functions started transactions then they called
gfs2_block_zero_range. However, gfs2_block_zero_range, like writes, can
start transactions, which results in a recursive transaction error.
For example:
do_shrink
trunc_start
gfs2_trans_begin <------------------------------------------------
gfs2_block_zero_range
iomap_zero_range(inode, from, length, NULL, &gfs2_iomap_ops);
iomap_apply ... iomap_zero_range_actor
iomap_begin
gfs2_iomap_begin
gfs2_iomap_begin_write
actor (iomap_zero_range_actor)
iomap_zero
iomap_write_begin
gfs2_iomap_page_prepare
gfs2_trans_begin <------------------------
This patch reorders the callers of gfs2_block_zero_range so that they
only start their transactions after the call. It also adds a BUG_ON to
ensure this doesn't happen again.
Fixes: 2257e468a6 ("gfs2: implement gfs2_block_zero_range using iomap_zero_range")
Cc: stable@vger.kernel.org # v5.5+
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Using uninitialized_var() is dangerous as it papers over real bugs[1]
(or can in the future), and suppresses unrelated compiler warnings
(e.g. "unused variable"). If the compiler thinks it is uninitialized,
either simply initialize the variable or make compiler changes.
In preparation for removing[2] the[3] macro[4], remove all remaining
needless uses with the following script:
git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
xargs perl -pi -e \
's/\buninitialized_var\(([^\)]+)\)/\1/g;
s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'
drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid
pathological white-space.
No outstanding warnings were found building allmodconfig with GCC 9.3.0
for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64,
alpha, and m68k.
[1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
[2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
[3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
[4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/
Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5
Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB
Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers
Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs
Signed-off-by: Kees Cook <keescook@chromium.org>
Make sure we don't walk past the end of the metadata in gfs2_walk_metadata: the
inode holds fewer pointers than indirect blocks.
Slightly clean up gfs2_iomap_get.
Fixes: a27a0c9b6a ("gfs2: gfs2_walk_metadata fix")
Cc: stable@vger.kernel.org # v5.3+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Keeping reservations and quotas separate helps reviewing the code.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Before this patch, multiple users called gfs2_qa_alloc which allocated
a qadata structure to the inode, if quotas are turned on. Later, in
file close or evict, the structure was deleted with gfs2_qa_delete.
But there can be several competing processes who need access to the
structure. There were races between file close (release) and the others.
Thus, a release could delete the structure out from under a process
that relied upon its existence. For example, chown.
This patch changes the management of the qadata structures to be
a get/put scheme. Function gfs2_qa_alloc has been changed to gfs2_qa_get
and if the structure is allocated, the count essentially starts out at
1. Function gfs2_qa_delete has been renamed to gfs2_qa_put, and the
last guy to decrement the count to 0 frees the memory.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Before this patch, multiple callers called gfs2_rsqa_alloc to force
the existence of a reservations structure and a quota data structure
if needed. However, now the reservations are handled separately, so
the quota data is only the quota data. So we eliminate the one in
favor of just calling gfs2_qa_alloc directly.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Replace open-coded versions of list_first_entry and list_last_entry with those
functions.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Bob's extensive filesystem withdrawal and recovery testing:
- Don't write log headers after file system withdraw
- clean up iopen glock mess in gfs2_create_inode
- Close timing window with GLF_INVALIDATE_IN_PROGRESS
- Abort gfs2_freeze if io error is seen
- Don't loop forever in gfs2_freeze if withdrawn
- fix infinite loop in gfs2_ail1_flush on io error
- Introduce function gfs2_withdrawn
- fix glock reference problem in gfs2_trans_remove_revoke
Filesystems with a block size smaller than the page size:
- Fix end-of-file handling in gfs2_page_mkwrite
- Improve mmap write vs. punch_hole consistency
Other:
- Remove active journal side effect from gfs2_write_log_header
- Multi-block allocations in gfs2_page_mkwrite
Minor cleanups and coding style fixes:
- Remove duplicate call from gfs2_create_inode
- make gfs2_log_shutdown static
- make gfs2_fs_parameters static
- Some whitespace cleanups
- removed unnecessary semicolon
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJd6VQ1AAoJENW/n+sDE2U6tUsP/2Zd64dVA+2KzpfvklAUNpx/
TENbvRqGVvhofxuScY0oAvcg0KZN03K/L6ZCa71e7RI7T3VM/FG3rKNCkm084a9d
r+Rxcz+eSkce5qKVNva6CtQiczGAj27iNiho0Y9IgtzEZ5KEVJuY3cV8Z2qzHrQM
Rel2aVpUtaLhbFOj3jLGMt/HHSw8RTTzNqtJJCwRys/tVF1WPVqyNg4PD9a7zMT7
z96tqrlQQxFT4SGiZVJwQHFGuQZEnbr2ahNRmivmGtnNnawLxpEdFuFrSAsC73UB
wHO0Dq+7vYVTyQ7HugrqdkXxqyQr5ta06Pep7uj8ZvhoLWZvPuBf2SccpIn9Ufvo
9iLFY5Z9cHg6wpsW+YMG75Mz0A6WbPRIScVog0fxaKW+z0vMZOmsT6hT6llAlCzn
oj1igqVEOIBTS+4uMDIOJKvMixo4NTdgsLFQyftUxNHiCw5iGbqkV7ux31YjqX22
A830zv8lba44BsixGtuPEy/0Dnka7rMfRp0cflCzKESSLIdXtSjUSEtS6g0fJISS
qKNmnnkHpvjBGMG4lOqJihJdQ+2IiIMLWdNgxkvWgt6F7xjl3gcdREjJF0dmFsd+
GfDUkUZ/70T9UPsaTR2V0GBvEleq4PglALcet9Eela7tKOEziNf3L+prRjMr3909
y4uJIMH9/no1knKBxtkJ
=b1m/
-----END PGP SIGNATURE-----
Merge tag 'gfs2-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull GFS2 updates from Andreas Gruenbacher:
"Bob's extensive filesystem withdrawal and recovery testing:
- don't write log headers after file system withdraw
- clean up iopen glock mess in gfs2_create_inode
- close timing window with GLF_INVALIDATE_IN_PROGRESS
- abort gfs2_freeze if io error is seen
- don't loop forever in gfs2_freeze if withdrawn
- fix infinite loop in gfs2_ail1_flush on io error
- introduce function gfs2_withdrawn
- fix glock reference problem in gfs2_trans_remove_revoke
Filesystems with a block size smaller than the page size:
- fix end-of-file handling in gfs2_page_mkwrite
- improve mmap write vs. punch_hole consistency
Other:
- remove active journal side effect from gfs2_write_log_header
- multi-block allocations in gfs2_page_mkwrite
Minor cleanups and coding style fixes:
- remove duplicate call from gfs2_create_inode
- make gfs2_log_shutdown static
- make gfs2_fs_parameters static
- some whitespace cleanups
- removed unnecessary semicolon"
* tag 'gfs2-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Don't write log headers after file system withdraw
gfs2: Remove duplicate call from gfs2_create_inode
gfs2: clean up iopen glock mess in gfs2_create_inode
gfs2: Close timing window with GLF_INVALIDATE_IN_PROGRESS
gfs2: Abort gfs2_freeze if io error is seen
gfs2: Don't loop forever in gfs2_freeze if withdrawn
gfs2: fix infinite loop in gfs2_ail1_flush on io error
gfs2: Introduce function gfs2_withdrawn
gfs2: fix glock reference problem in gfs2_trans_remove_revoke
gfs2: make gfs2_log_shutdown static
gfs2: Remove active journal side effect from gfs2_write_log_header
gfs2: Fix end-of-file handling in gfs2_page_mkwrite
gfs2: Multi-block allocations in gfs2_page_mkwrite
gfs2: Improve mmap write vs. punch_hole consistency
gfs2: make gfs2_fs_parameters static
gfs2: Some whitespace cleanups
gfs2: removed unnecessary semicolon
When punching a hole in a file, use filemap_write_and_wait_range to
write back any dirty pages in the range of the hole. As a side effect,
if the hole isn't page aligned, this marks unaligned pages at the
beginning and the end of the hole read-only. This is required when the
block size is smaller than the page size: when those pages are written
to again after the hole punching, we must make sure that page_mkwrite is
called for those pages so that the page will be fully allocated and any
blocks turned into holes from the hole punching will be reallocated.
(If a page is writably mapped, page_mkwrite won't be called.)
Fixes xfstest generic/567.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
The srcmap is used to identify where the read is to be performed from.
It is passed to ->iomap_begin, which can fill it in if we need to read
data for partially written blocks from a different location than the
write target. The srcmap is only supported for buffered writes so far.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
[hch: merged two patches, removed the IOMAP_F_COW flag, use iomap as
srcmap if not set, adjust length down to srcmap end as well]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Acked-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
In function sweep_bh_for_rgrps, which is a helper for punch_hole,
it uses variable buf_in_tr to keep track of when it needs to commit
pending block frees on a partial delete that overflows the
transaction created for the delete. The problem is that the
variable was initialized at the start of function sweep_bh_for_rgrps
but it was never cleared, even when starting a new transaction.
This patch reinitializes the variable when the transaction is
ended, so the next transaction starts out with it cleared.
Fixes: d552a2b9b3 ("GFS2: Non-recursive delete")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
On filesystems with a block size smaller than PAGE_SIZE, page_mkwrite is
called for each memory-mapped page before that page can be written to.
When such a memory-mapped file is truncated down to size x which is not
a multiple of the page size and then back to a larger size, the page
straddling size x can end up with a partial block mapping. In that
case, make sure to mark that page read-only so that page_mkwrite will be
called before the page can be written to the next time.
(There is no point in marking the page straddling size x read-only when
truncating down as writing to memory beyond the end of the file will
result in SIGBUS instead of growing the file.)
Fixes xfstests generic/029, generic/030 on filesystems with a block size
smaller than PAGE_SIZE.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
iomap handles all the nitty-gritty details of zeroing a file
range for us, so use the proper helper.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
Add support for the IOMAP_ZERO iomap operation so that iomap_zero_range will
work as expected. In the IOMAP_ZERO case, the caller of iomap_zero_range is
responsible for taking an exclusive glock on the inode, so we need no
additional locking in gfs2_iomap_begin.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
Following commit d0a22a4b03 ("gfs2: Fix iomap write page reclaim deadlock"),
gfs2_iomap_begin and gfs2_iomap_begin_write can be further cleaned up and the
split between those two functions can be improved.
With suggestions from Christoph Hellwig.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
It turns out that the current version of gfs2_metadata_walker suffers
from multiple problems that can cause gfs2_hole_size to report an
incorrect size. This will confuse fiemap as well as lseek with the
SEEK_DATA flag.
Fix that by changing gfs2_hole_walker to compute the metapath to the
first data block after the hole (if any), and compute the hole size
based on that.
Fixes xfstest generic/490.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
Cc: stable@vger.kernel.org # v4.18+
With the recent iomap write page reclaim deadlock fix, it turns out that the
GLF_DIRTY flag isn't always set when it needs to be anymore: previously, this
happened as a side effect of always adding the inode buffer head to the current
transaction with gfs2_trans_add_meta, but this isn't happening consistently
anymore. Fix by removing an additional unnecessary gfs2_trans_add_meta call
and by setting the GLF_DIRTY flag in gfs2_iomap_end.
(The GLF_DIRTY flag causes inode_go_sync to flush the transaction log when
syncing out the glock of that inode. When the flag isn't set, inode_go_sync
will skip inodes, including ones with an i_state of I_DIRTY_PAGES, which will
lead to cluster incoherency.)
In addition, in gfs2_iomap_page_done, if the metadata has changed, mark the
inode as I_DIRTY_DATASYNC to have the inode added to the current transaction:
we don't expect metadata to change here, but let's err on the safe side.
Fixes: d0a22a4b03 ("gfs2: Fix iomap write page reclaim deadlock");
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
- An initial batch of obvious cleanups and fixes from Bob's
recovery patch queue.
- Two iomap conversion patches and some cleanups from Christoph
Hellwig.
- A cosmetic cleanup from Kefeng Wang (Huawei).
- Another minor fix and cleanup by me.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJdJmA1AAoJENW/n+sDE2U69jgQAJGXJj+iyT7kKS2vQE15W+ff
MdT9amtWokFBhLLz5fTI+OZ8segJPScsGuJuoZ8/5tW/3w3MMn9u96h+BD9Ww9On
ByRG1+zMkWPyAgI2RCDytP7WQ/enJrjFxYEEGNgwfkGLZ6aDnoZ4/8DlK8MQ+6SE
IeNFmV3jU8f4If5pLLZ352akTBhAOLC3InKv/DSHtq4QZqoxZimNQhqTuDW7uHKX
2YwE1+NC36dBfJz70dJqb6YPoUdEn4qklQPe6jj0mlMb38uXo08dKuE3atnRmbxQ
9cOwHlB1W5DiRb4Fg/aim+XOnrBFhPVT9kktKTwztaS/Rc6N8gUi9Est79Qz9vEK
2vDBammJ0/zB6+ogUyv4cVN9hcgOQInyo5yv5aJvhaIl/WrVUF11rdrj35a0vgeW
8oU0kD9h9jHPew1wCOPhS4138Qc9sDAqyeYIvAQ80W1VePw88kQ7xc9WwKyHmRlX
XjU1REZ+4P/nCycIht9L0ow1xpqHoutCmFVNhE/dbkUoJSMJyh2xNvwTPAYE0EZe
53oAtmBtYPhPDaghPpReHbVWq5F/OeUoRucLaugkkSZ96lbVAFQPr0TgP6GW1KS0
rL+Epa+eG1+8qeiA4HtKG7aqdHx9vQe43RGoGzknUmG4PTQHHRuz7rNUD0jQayoi
D/wEiuL///x10vb8cD0P
=5lRq
-----END PGP SIGNATURE-----
Merge tag 'gfs2-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Andreas Gruenbacher:
"Some relatively minor changes for gfs2:
- An initial batch of obvious cleanups and fixes from Bob's recovery
patch queue.
- Two iomap conversion patches and some cleanups from Christoph
Hellwig.
- A cosmetic cleanup from Kefeng Wang (Huawei).
- Another minor fix and cleanup by me"
* tag 'gfs2-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Remove unused gfs2_iomap_alloc argument
gfs2: don't use buffer_heads in gfs2_allocate_page_backing
gfs2: use iomap_bmap instead of generic_block_bmap
gfs2: mark stuffed_readpage static
gfs2: merge gfs2_writepage_common into gfs2_writepage
gfs2: merge gfs2_writeback_aops and gfs2_ordered_aops
gfs2: remove the unused gfs2_stuffed_write_end function
gfs2: use page_offset in gfs2_page_mkwrite
gfs2: replace more printk with calls to fs_info and friends
gfs2: dump fsid when dumping glock problems
gfs2: simplify gfs2_freeze by removing case
gfs2: Rename SDF_SHUTDOWN to SDF_WITHDRAWN
gfs2: Warn when a journal replay overwrites a rgrp with buffers
gfs2: log which portion of the journal is replayed
gfs2: eliminate tr_num_revoke_rm
gfs2: kthread and remount improvements
gfs2: Use IS_ERR_OR_NULL
gfs2: Clean up freeing struct gfs2_sbd
- Only mark inode dirty at the end of writing to a file (instead of once
for every page written).
- Fix for an accounting error in the page_done callback.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl0Y20cACgkQ+H93GTRK
tOtoog//VAJ91Fnr7A/HOQGxmfGZrpCAV+pdwUARS2ZwbrNmrg3aGsolPzSEPgKm
b6Zk75CPtUi9MnOAUI6d8404rv3pdhWm5rY7+QF02/JQZYUFt+uAVqJyZXxO6nDm
egNii0iYlS9WBZyDVPMskxaySD5j8+FbCxsLVak5LIk7kwxcMUdS4Zx/jfWv2XVa
DUls+QpcgrF+Bg8ItOnlmfBQOOwe8/vgKlzUm1PIp4Wdt8QroBMLs+BKu8ZJaOD9
AZ4RKi2wv2ctZ6yjhq5t4dQAfFYzgHzYMB7iAxIqog4O8XASYyIuTjq0HCJrv634
/FiDrHUaEqOdVacgXp8GMSGneDEejNiBBZPbeCd0J8YkIhskvcrZpF5gJL9PlcBa
TcXekTHJyhFrVOJTt36adAzuS1RIFPyX1QdZKQtOGCns3nxQjTyugNYpoWBUprFF
4BG/s4KCW4zAXFw7cnPtA2LrZr5N+nrSdCbq5kp2cN+tlhSKcmYEfXkrzsCqsPrz
v/wMzDDpPj68rWObl+8TFSItGBQ3Z4aiXiE1xyO3mUZvKWbZ171/XWYNX5y3Curd
NHybSYPBuw0RWWVBM+/gYHNHJcjK3chVzZVqRtTYIidAhH9vBzskFjNQ/DZp+vxs
qO8gyzE1GyiVPkgyKMhmEYNmbw3rCns4xsWm78oY9Wgb8eOUvx0=
=so9b
-----END PGP SIGNATURE-----
Merge tag 'iomap-5.3-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull iomap updates from Darrick Wong:
"There are a few fixes for gfs2 but otherwise it's pretty quiet so far.
- Only mark inode dirty at the end of writing to a file (instead of
once for every page written).
- Fix for an accounting error in the page_done callback"
* tag 'iomap-5.3-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
iomap: fix page_done callback for short writes
fs: fold __generic_write_end back into generic_write_end
iomap: don't mark the inode dirty in iomap_write_end
Marking the inode dirty for each page copied into the page cache can be
very inefficient for file systems that use the VFS dirty inode tracking,
and is completely pointless for those that don't use the VFS dirty inode
tracking. So instead, only set an iomap flag when changing the in-core
inode size, and open code the rest of __generic_write_end.
Partially based on code from Christoph Hellwig.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
This patch replaces a few leftover printk errors with calls to
fs_info and similar, so that the file system having the error is
properly logged.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
The pos and len arguments to the iomap page_prepare callback are not
block aligned, so we need to take that into account when computing the
number of blocks.
Fixes: d0a22a4b03 ("gfs2: Fix iomap write page reclaim deadlock")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Based on 1 normalized pattern(s):
this copyrighted material is made available to anyone wishing to use
modify copy or redistribute it subject to the terms and conditions
of the gnu general public license version 2
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 44 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190531081038.653000175@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since commit 64bc06bb32 ("gfs2: iomap buffered write support"), gfs2 is doing
buffered writes by starting a transaction in iomap_begin, writing a range of
pages, and ending that transaction in iomap_end. This approach suffers from
two problems:
(1) Any allocations necessary for the write are done in iomap_begin, so when
the data aren't journaled, there is no need for keeping the transaction open
until iomap_end.
(2) Transactions keep the gfs2 log flush lock held. When
iomap_file_buffered_write calls balance_dirty_pages, this can end up calling
gfs2_write_inode, which will try to flush the log. This requires taking the
log flush lock which is already held, resulting in a deadlock.
Fix both of these issues by not keeping transactions open from iomap_begin to
iomap_end. Instead, start a small transaction in page_prepare and end it in
page_done when necessary.
Reported-by: Edwin Török <edvin.torok@citrix.com>
Fixes: 64bc06bb32 ("gfs2: iomap buffered write support")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Rename gfs2_trans_add_unrevoke to gfs2_trans_remove_revoke: there is no
such thing as an "unrevoke" object; all this function does is remove
existing revoke objects plus some bookkeeping.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
This patch fixes regressions in 588bff95c9.
Due to that patch, function clean_journal was setting the value of
sd_log_flush_head, but that's only valid if it is replaying the node's
own journal. If it's replaying another node's journal, that's completely
wrong and will lead to multiple problems. This patch tries to clean up
the mess by passing the value of the logical journal block number into
gfs2_write_log_header so the function can treat non-owned journals
generically. For the local journal, the journal extent map is used for
best performance. For other nodes from other journals, new function
gfs2_lblk_to_dblk is called to figure it out using gfs2_iomap_get.
This patch also tries to establish more consistency when passing journal
block parameters by changing several unsigned int types to a consistent
u32.
Fixes: 588bff95c9 ("GFS2: Reduce code redundancy writing log headers")
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Hi Linus,
This is my very first pull-request. I've been working full-time as
a kernel developer for more than two years now. During this time I've
been fixing bugs reported by Coverity all over the tree and, as part
of my work, I'm also contributing to the KSPP. My work in the kernel
community has been supervised by Greg KH and Kees Cook.
OK. So, after the quick introduction above, please, pull the following
patches that mark switch cases where we are expecting to fall through.
These patches are part of the ongoing efforts to enable -Wimplicit-fallthrough.
They have been ignored for a long time (most of them more than 3 months,
even after pinging multiple times), which is the reason why I've created
this tree. Most of them have been baking in linux-next for a whole development
cycle. And with Stephen Rothwell's help, we've had linux-next nag-emails
going out for newly introduced code that triggers -Wimplicit-fallthrough
to avoid gaining more of these cases while we work to remove the ones
that are already present.
I'm happy to let you know that we are getting close to completing this
work. Currently, there are only 32 of 2311 of these cases left to be
addressed in linux-next. I'm auditing every case; I take a look into
the code and analyze it in order to determine if I'm dealing with an
actual bug or a false positive, as explained here:
https://lore.kernel.org/lkml/c2fad584-1705-a5f2-d63c-824e9b96cf50@embeddedor.com/
While working on this, I've found and fixed the following missing
break/return bugs, some of them introduced more than 5 years ago:
84242b82d87850b51b6c5e420fe63509186e5034b5be8531817264235ee7cc5034a5d2479826cc865340f23df8df997abeeb2f10d82373307b00c5e65d25ff7a54a7ed5b3e7dc24bfa8f21ad0eaee6199ba8376ce1dc586a60a1a8e9b186f14e57562b4860747828eac5b974bee9cc44ba91162c930e3d0a
Once this work is finish, we'll be able to universally enable
"-Wimplicit-fallthrough" to avoid any of these kinds of bugs from
entering the kernel again.
Thanks
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEkmRahXBSurMIg1YvRwW0y0cG2zEFAlzQR2IACgkQRwW0y0cG
2zEJbQ//X930OcBtT/9DRW4XL1Jeq0Mjssz/GLX2Vpup5CwwcTROG65no80Zezf/
yQRWnUjGX0OBv/fmUK32/nTxI/7k7NkmIXJHe0HiEF069GEENB7FT6tfDzIPjU8M
qQkB8NsSUWJs3IH6BVynb/9MGE1VpGBDbYk7CBZRtRJT1RMM+3kQPucgiZMgUBPo
Yd9zKwn4i/8tcOCli++EUdQ29ukMoY2R3qpK4LftdX9sXLKZBWNwQbiCwSkjnvJK
I6FDiA7RaWH2wWGlL7BpN5RrvAXp3z8QN/JZnivIGt4ijtAyxFUL/9KOEgQpBQN2
6TBRhfTQFM73NCyzLgGLNzvd8awem1rKGSBBUvevaPbgesgM+Of65wmmTQRhFNCt
A7+e286X1GiK3aNcjUKrByKWm7x590EWmDzmpmICxNPdt5DHQ6EkmvBdNjnxCMrO
aGA24l78tBN09qN45LR7wtHYuuyR0Jt9bCmeQZmz7+x3ICDHi/+Gw7XPN/eM9+T6
lZbbINiYUyZVxOqwzkYDCsdv9+kUvu3e4rPs20NERWRpV8FEvBIyMjXAg6NAMTue
K+ikkyMBxCvyw+NMimHJwtD7ho4FkLPcoeXb2ZGJTRHixiZAEtF1RaQ7dA05Q/SL
gbSc0DgLZeHlLBT+BSWC2Z8SDnoIhQFXW49OmuACwCUC68NHKps=
=k30z
-----END PGP SIGNATURE-----
Merge tag 'Wimplicit-fallthrough-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
Pull Wimplicit-fallthrough updates from Gustavo A. R. Silva:
"Mark switch cases where we are expecting to fall through.
This is part of the ongoing efforts to enable -Wimplicit-fallthrough.
Most of them have been baking in linux-next for a whole development
cycle. And with Stephen Rothwell's help, we've had linux-next
nag-emails going out for newly introduced code that triggers
-Wimplicit-fallthrough to avoid gaining more of these cases while we
work to remove the ones that are already present.
We are getting close to completing this work. Currently, there are
only 32 of 2311 of these cases left to be addressed in linux-next. I'm
auditing every case; I take a look into the code and analyze it in
order to determine if I'm dealing with an actual bug or a false
positive, as explained here:
https://lore.kernel.org/lkml/c2fad584-1705-a5f2-d63c-824e9b96cf50@embeddedor.com/
While working on this, I've found and fixed the several missing
break/return bugs, some of them introduced more than 5 years ago.
Once this work is finished, we'll be able to universally enable
"-Wimplicit-fallthrough" to avoid any of these kinds of bugs from
entering the kernel again"
* tag 'Wimplicit-fallthrough-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: (27 commits)
memstick: mark expected switch fall-throughs
drm/nouveau/nvkm: mark expected switch fall-throughs
NFC: st21nfca: Fix fall-through warnings
NFC: pn533: mark expected switch fall-throughs
block: Mark expected switch fall-throughs
ASN.1: mark expected switch fall-through
lib/cmdline.c: mark expected switch fall-throughs
lib: zstd: Mark expected switch fall-throughs
scsi: sym53c8xx_2: sym_nvram: Mark expected switch fall-through
scsi: sym53c8xx_2: sym_hipd: mark expected switch fall-throughs
scsi: ppa: mark expected switch fall-through
scsi: osst: mark expected switch fall-throughs
scsi: lpfc: lpfc_scsi: Mark expected switch fall-throughs
scsi: lpfc: lpfc_nvme: Mark expected switch fall-through
scsi: lpfc: lpfc_nportdisc: Mark expected switch fall-through
scsi: lpfc: lpfc_hbadisc: Mark expected switch fall-throughs
scsi: lpfc: lpfc_els: Mark expected switch fall-throughs
scsi: lpfc: lpfc_ct: Mark expected switch fall-throughs
scsi: imm: mark expected switch fall-throughs
scsi: csiostor: csio_wr: mark expected switch fall-through
...
Move the page_done callback into a separate iomap_page_ops structure and
add a page_prepare calback to be called before the next page is written
to. In gfs2, we'll want to start a transaction in page_prepare and end
it in page_done. Other filesystems that implement data journaling will
require the same kind of mechanism.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
This patch fixes the following warnings:
fs/affs/affs.h:124:38: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/configfs/dir.c:1692:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/configfs/dir.c:1694:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ceph/file.c:249:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/hash.c:233:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/hash.c:246:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext2/inode.c:1237:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext2/inode.c:1244:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1182:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1188:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1432:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1440:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/f2fs/node.c:618:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/f2fs/node.c:620:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/btrfs/ref-verify.c:522:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/gfs2/bmap.c:711:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/gfs2/bmap.c:722:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/jffs2/fs.c:339:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/nfsd/nfs4proc.c:429:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ufs/util.h:62:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ufs/util.h:43:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/fcntl.c:770:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/seq_file.c:319:10: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/libfs.c:148:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/libfs.c:150:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/signalfd.c:178:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/locks.c:1473:16: warning: this statement may fall through [-Wimplicit-fallthrough=]
Warning level 3 was used: -Wimplicit-fallthrough=3
This patch is part of the ongoing efforts to enabling
-Wimplicit-fallthrough.
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>