When reading in block groups, a global mask of the available raid policies
should be adjusted based on the types of block groups found on disk. This
global mask is then used to decide which raid policy to use for new
block groups.
The recent allocator changes dropped the call that updated the global
mask, making all the block groups allocated at run time single striped
onto a single drive.
This also fixes the async worker threads to set any thread that uses
the requeue mechanism as busy. This allows us to avoid blocking
on get_request_wait for the async bio submission threads.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
This improves the comments at the top of many functions. It didn't
dive into the guts of functions because I was trying to
avoid merging problems with the new allocator and back reference work.
extent-tree.c and volumes.c were both skipped, and there is definitely
more work todo in cleaning and commenting the code.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Btrfs had compatibility code for kernels back to 2.6.18. These have
been removed, and will be maintained in a separate backport
git tree from now on.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Before this change, btrfs would use a bdi congestion function to make
sure there weren't too many pending async checksum work items.
This change makes the process creating async work items wait instead,
leading to fewer congestion returns from the bdi. This improves
pdflush background_writeout scanning.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Large streaming reads make for large bios, which means each entry on the
list async work queues represents a large amount of data. IO
congestion throttling on the device was kicking in before the async
worker threads decided a single thread was busy and needed some help.
The end result was that a streaming read would result in a single CPU
running at 100% instead of balancing the work off to other CPUs.
This patch also changes the pre-IO checksum lookup done by reads to
work on a per-bio basis instead of a per-page. This results in many
extra btree lookups on large streaming reads. Doing the checksum lookup
right before bio submit allows us to reuse searches while processing
adjacent offsets.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
When kthread_run() returns failure, this worker hasn't been
added to the list, so btrfs_stop_workers() won't free it.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
This changes the worker thread pool to maintain a list of idle threads,
avoiding a complex search for a good thread to wake up.
Threads have two states:
idle - we try to reuse the last thread used in hopes of improving the batching
ratios
busy - each time a new work item is added to a busy task, the task is
rotated to the end of the line.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Btrfs has been using workqueues to spread the checksumming load across
other CPUs in the system. But, workqueues only schedule work on the
same CPU that queued the work, giving them a limited benefit for systems with
higher CPU counts.
This code adds a generic facility to schedule work with pools of kthreads,
and changes the bio submission code to queue bios up. The queueing is
important to make sure large numbers of procs on the system don't
turn streaming workloads into random workloads by sending IO down
concurrently.
The end result of all of this is much higher performance (and CPU usage) when
doing checksumming on large machines. Two worker pools are created,
one for writes and one for endio processing. The two could deadlock if
we tried to service both from a single pool.
Signed-off-by: Chris Mason <chris.mason@oracle.com>