A mirror of the official Linux kernel repository just in case
Go to file
Darrick J. Wong 13ae04d8d4 xfs: force all buffers to be written during btree bulk load
While stress-testing online repair of btrees, I noticed periodic
assertion failures from the buffer cache about buffers with incorrect
DELWRI_Q state.  Looking further, I observed this race between the AIL
trying to write out a btree block and repair zapping a btree block after
the fact:

AIL:    Repair0:

pin buffer X
delwri_queue:
set DELWRI_Q
add to delwri list

        stale buf X:
        clear DELWRI_Q
        does not clear b_list
        free space X
        commit

delwri_submit   # oops

Worse yet, I discovered that running the same repair over and over in a
tight loop can result in a second race that cause data integrity
problems with the repair:

AIL:    Repair0:        Repair1:

pin buffer X
delwri_queue:
set DELWRI_Q
add to delwri list

        stale buf X:
        clear DELWRI_Q
        does not clear b_list
        free space X
        commit

                        find free space X
                        get buffer
                        rewrite buffer
                        delwri_queue:
                        set DELWRI_Q
                        already on a list, do not add
                        commit

                        BAD: committed tree root before all blocks written

delwri_submit   # too late now

I traced this to my own misunderstanding of how the delwri lists work,
particularly with regards to the AIL's buffer list.  If a buffer is
logged and committed, the buffer can end up on that AIL buffer list.  If
btree repairs are run twice in rapid succession, it's possible that the
first repair will invalidate the buffer and free it before the next time
the AIL wakes up.  Marking the buffer stale clears DELWRI_Q from the
buffer state without removing the buffer from its delwri list.  The
buffer doesn't know which list it's on, so it cannot know which lock to
take to protect the list for a removal.

If the second repair allocates the same block, it will then recycle the
buffer to start writing the new btree block.  Meanwhile, if the AIL
wakes up and walks the buffer list, it will ignore the buffer because it
can't lock it, and go back to sleep.

When the second repair calls delwri_queue to put the buffer on the
list of buffers to write before committing the new btree, it will set
DELWRI_Q again, but since the buffer hasn't been removed from the AIL's
buffer list, it won't add it to the bulkload buffer's list.

This is incorrect, because the bulkload caller relies on delwri_submit
to ensure that all the buffers have been sent to disk /before/
committing the new btree root pointer.  This ordering requirement is
required for data consistency.

Worse, the AIL won't clear DELWRI_Q from the buffer when it does finally
drop it, so the next thread to walk through the btree will trip over a
debug assertion on that flag.

To fix this, create a new function that waits for the buffer to be
removed from any other delwri lists before adding the buffer to the
caller's delwri list.  By waiting for the buffer to clear both the
delwri list and any potential delwri wait list, we can be sure that
repair will initiate writes of all buffers and report all write errors
back to userspace instead of committing the new structure.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2023-12-15 10:03:27 -08:00
arch powerpc fixes for 6.7 #3 2023-12-03 08:43:35 +09:00
block block-6.7-2023-12-01 2023-12-02 06:39:30 +09:00
certs This update includes the following changes: 2023-11-02 16:15:30 -10:00
crypto This push fixes a regression in ahash and hides the Kconfig sub-options for the jitter RNG. 2023-11-09 17:04:58 -08:00
Documentation Documentation: xfs: consolidate XFS docs into its own subdirectory 2023-12-07 14:49:13 +05:30
drivers mm, pmem, xfs: Introduce MF_MEM_PRE_REMOVE for unbind 2023-12-07 14:34:26 +05:30
fs xfs: force all buffers to be written during btree bulk load 2023-12-15 10:03:27 -08:00
include mm, pmem, xfs: Introduce MF_MEM_PRE_REMOVE for unbind 2023-12-07 14:34:26 +05:30
init As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
io_uring io_uring: use fget/fput consistently 2023-11-28 11:56:29 -07:00
ipc Many singleton patches against the MM code. The patch series which are 2023-11-02 19:38:47 -10:00
kernel Probes fixes for v6.7-r3: 2023-12-03 08:02:49 +09:00
lib Probes fixes for v6.7-r3: 2023-12-03 08:02:49 +09:00
LICENSES LICENSES: Add the copyleft-next-0.3.1 license 2022-11-08 15:44:01 +01:00
mm mm, pmem, xfs: Introduce MF_MEM_PRE_REMOVE for unbind 2023-12-07 14:34:26 +05:30
net wireless fixes: 2023-11-29 19:43:34 -08:00
rust Kbuild updates for v6.7 2023-11-04 08:07:19 -10:00
samples Landlock updates for v6.7-rc1 2023-11-03 09:28:53 -10:00
scripts hardening fixes for v6.7-rc4 2023-12-01 14:17:54 +09:00
security + Features 2023-11-03 09:48:17 -10:00
sound ALSA: hda: Disable power-save on KONTRON SinglePC 2023-11-30 16:14:21 +01:00
tools perf tools fixes for v6.7: 1st batch 2023-12-01 10:17:16 +09:00
usr arch: Remove Itanium (IA-64) architecture 2023-09-11 08:13:17 +00:00
virt ARM: 2023-09-07 13:52:20 -07:00
.clang-format iommu: Add for_each_group_device() 2023-05-23 08:15:51 +02:00
.cocciconfig
.get_maintainer.ignore
.gitattributes .gitattributes: set diff driver for Rust source code files 2023-05-31 17:48:25 +02:00
.gitignore kbuild: rpm-pkg: generate kernel.spec in rpmbuild/SPECS/ 2023-10-03 20:49:09 +09:00
.mailmap As usual, lots of singleton and doubleton patches all over the tree and 2023-11-02 20:53:31 -10:00
.rustfmt.toml rust: add .rustfmt.toml 2022-09-28 09:02:20 +02:00
COPYING
CREDITS USB: Remove Wireless USB and UWB documentation 2023-08-09 14:17:32 +02:00
Kbuild Kbuild updates for v6.1 2022-10-10 12:00:45 -07:00
Kconfig
MAINTAINERS Documentation: xfs: consolidate XFS docs into its own subdirectory 2023-12-07 14:49:13 +05:30
Makefile Linux 6.7-rc4 2023-12-03 18:52:56 +09:00
README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.