All callers pass btrfs_get_extent_fiemap and get_extent_skip_holes
itself is used only as a fiemap helper.
Signed-off-by: David Sterba <dsterba@suse.com>
All callers pass btrfs_get_extent_fiemap and we don't expect anything
else in the context of extent_fiemap.
Signed-off-by: David Sterba <dsterba@suse.com>
Previous patches cleaned up all places where
extent_page_data::get_extent was set and it was btrfs_get_extent all the
time, so we can simply call that instead.
This also reduces size of extent_page_data by 8 bytes which has positive
effect on stack consumption on various functions on the write out path.
Signed-off-by: David Sterba <dsterba@suse.com>
Since tree-checker has verified leaf when reading from disk, we don't
need the existing verify_dir_item() or btrfs_is_name_len_valid() checks.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add checker for dir item, for key types DIR_ITEM, DIR_INDEX and
XATTR_ITEM.
This checker does comprehensive checks for:
1) dir_item header and its data size
Against item boundary and maximum name/xattr length.
This part is mostly the same as old verify_dir_item().
2) dir_type
Against maximum file types, and against key type.
Since XATTR key should only have FT_XATTR dir item, and normal dir
item type should not have XATTR key.
The check between key->type and dir_type is newly introduced by this
patch.
3) name hash
For XATTR and DIR_ITEM key, key->offset is name hash (crc32c).
Check the hash of the name against the key to ensure it's correct.
The name hash check is only found in btrfs-progs before this patch.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
All callers use GFP_NOFS, we don't have to pass it as an argument. The
built-in tests pass GFP_KERNEL, but they run only at module load time
and NOFS works there as well.
Signed-off-by: David Sterba <dsterba@suse.com>
Use __clear_extent_bit directly in case we want to pass unknown
gfp flags. Otherwise all clear_extent_bit callers use GFP_NOFS, so we
can sink them to the function and reduce argument count, at the cost
that __clear_extent_bit has to be exported.
Signed-off-by: David Sterba <dsterba@suse.com>
We take the fs_devices::device_list_mutex mutex in write_all_supers
which will prevent any add/del changes to the device list. Therefore we
don't need to use the RCU variant list_for_each_entry_rcu in any of the
called functions.
Signed-off-by: David Sterba <dsterba@suse.com>
We don't need to use the mutex as we do not modify the devices nor the
list itself and just read information about device counts.
Move copying fsid out of the protected section, not applicable to RCU
same as the rest of the retrieved information.
Signed-off-by: David Sterba <dsterba@suse.com>
We don't need to use the mutex as we do not modify the devices nor the
list itself and just read some information:
does not change during device lifetime:
- devid
- uuid
- name (ie. the path)
may change in parallel to the ioctl call, but can lead only to reporting
inacurracy:
- bytes_used
- total_bytes
Signed-off-by: David Sterba <dsterba@suse.com>
A helper to free a device and all it's dynamically allocated members,
like the rcu_string name or flush_bio. This is going to replace all
open coded places.
Signed-off-by: David Sterba <dsterba@suse.com>
Make it clear that it is an RCU helper, we want to use the name
free_device for a wrapper freeing all device members.
Signed-off-by: David Sterba <dsterba@suse.com>
These rules have been hidden in several if-else and are not
straightforward to follow, for example, dio submit hook's nocsum case
has a bug , i.e. doing async submit instead of sync submit, which has
been fixed recently.
This is documenting the rules for reference.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
After commit 996478ca9c ("btrfs: change how we decide to commit
transactions during flushing") there is no need to hold the delayed_rsv
during the percpu_counter_compare call since we get the byte's snapshot
earlier. So hold the lock only while reading delayed_rsv.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Adding __init macro gives kernel a hint that this function is only used
during the initialization phase and its memory resources can be freed up
after.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Function btrfs_add_device() is adding the device item so rename to
reflect that in the function. Similarly we have btrfs_rm_dev_item().
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_create_tree() will unconditionally generate UUID for any root.
So for quota tree and data reloc tree created by kernel, they will have
unique UUIDs.
However UUID in root item is only referred by UUID tree, which only
records UUID for fs trees. This makes unique UUIDs for quota/data reloc
tree meaningless.
Leave the UUID as zero for non-fs tree, making btrfs-debug-tree output
less confusing.
Reported-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
A cleanup patch no functional change, we hold volume_mutex before
calling btrfs_rm_device, so move it into the function itself.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Right before we go into this loop locked_end is set to alloc_end - 1 and
is being used in nearby functions, no need to have exceptions. This just
makes the code consistent, no functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Fallocating a file in btrfs goes through several stages. The one before
actually inserting the fallocated extents is to create a qgroup
reservation, covering the desired range. To this end there is a loop in
btrfs_fallocate which checks to see if there are holes in the fallocated
range or !PREALLOC extents past EOF and if so create qgroup reservations
for them. Unfortunately, the main condition of the loop is burried right
at the end of its body rather than in the actual while statement which
makes it non-obvious. Fix this by moving the condition in the while
statement where it belongs. No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It was introduced because btrfs used to do blkdev_put in a deferred
work, now that btrfs has blkdev_put in place, this rcu_barrier can be
removed.
modprobe -r btrfs will do btrfs_cleanup_fs_uuids(), where it cleanup
every %fs_devices on the list, but when we do btrfs_close_devices(), we
have replaced the devices on the list with dummy ones which only have
the same name and uuid, so modprobe -r btrfs will free those instead of
what we were using, this change won't cause a problem for it.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ copied 2nd paragraph from mailinglist discussion ]
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_balance_delayed_items is the sole caller of
btrfs_wq_run_delayed_node and already includes one of the checks whether
the delayed inodes should be run. On the other hand
btrfs_wq_run_delayed_node duplicates that check and performs an
additional one for wq congestion.
Let's remove the duplicate check and move the congestion one in
btrfs_balance_delayed_items, leaving btrfs_wq_run_delayed_node to only
care about setting up the wq run. No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently btrfs_async_run_delayed_root's implementation uses 3 goto
labels to mimic the functionality of a simple do {} while loop. Refactor
the function to use a do {} while construct, making intention clear and
code easier to follow. No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The following callpath is always invoked with mirror_num set to 0, so
let's remove it as an argument and directly pass 0 to __do_redpage. No
functional change.
extent_readpages
__extent_readpages
__do_contiguous_readpages
__do_readpage
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It's sole callsite was removed in a previous patch so just nuke it for good.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
As per atomic_t.txt documentation :
- RMW operations that have a return value are fully ordered;
atomic_xchg is one such operation so it already includes everything it
needs w.r.t memory ordering and add a comment to be more explicit about
that.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Commit addc3fa74e ("Btrfs: Fix the problem that the dirty flag of dev
stats is cleared") reworked the way device stats changes are tracked. A
new atomic dev_stats_ccnt counter was introduced which is incremented
every time any of the device stats counters are changed. This serves as
a flag whether there are any pending stats changes. However, this patch
only partially implemented the correct memory barriers necessary:
- It only ordered the stores to the counters but not the reads e.g.
btrfs_run_dev_stats
- It completely omitted any comments documenting the intended design and
how the memory barriers pair with each-other
This patch provides the necessary comments as well as adds a missing
smp_rmb in btrfs_run_dev_stats. Furthermore since dev_stats_cnt is only
a snapshot at best there was no point in reading the counter twice -
once in btrfs_dev_stats_dirty and then again when assigning stats_cnt.
Just collapse both reads into 1.
Fixes: addc3fa74e ("Btrfs: Fix the problem that the dirty flag of dev stats is cleared")
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_end_bio() is using btrfs_dev_stat_inc() and then
btrfs_dev_stat_print_on_error() separately instead use
btrfs_dev_stat_inc_and_print() directly.
As of now there isn't any bio in btrfs which is - a non-empty write and
also the REQ_PREFLUSH flag is set. So in actual the condition
if (bio->bi_opf & REQ_PREFLUSH)
is never true in btrfs_end_bio(), and so there won't be any redundant
error log by using btrfs_dev_stat_inc_and_print() separately one for
write and another for flush.
This consolidation will help to add the device critical error handles in
the function btrfs_dev_stat_inc_and_print() and which can be renamed as
needed.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It's pointless to defer it to a kthread helper as we're not under a
special context.
For reference, commit 1f78160ce1 ("Btrfs: using rcu lock in the reader
side of devices list") introduced RCU freeing for device structures.
Originally the blkdev_put was called from free_device and rcu_barrier had
to be called. This is no longer required, bdev and our device structures
are now freed separately.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ enhance changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
In functions like btrfs_create(), we run both
btrfs_balance_delayed_items() and btrfs_btree_balance_dirty() after
the operation, but btrfs_btree_balance_dirty() is surely going to run
btrfs_balance_delayed_items().
This keeps only btrfs_btree_balance_dirty().
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Pull x86 pti fixes from Thomas Gleixner:
"A small set of fixes for the meltdown/spectre mitigations:
- Make kprobes aware of retpolines to prevent probes in the retpoline
thunks.
- Make the machine check exception speculation protected. MCE used to
issue an indirect call directly from the ASM entry code. Convert
that to a direct call into a C-function and issue the indirect call
from there so the compiler can add the retpoline protection,
- Make the vmexit_fill_RSB() assembly less stupid
- Fix a typo in the PTI documentation"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/retpoline: Optimize inline assembler for vmexit_fill_RSB
x86/pti: Document fix wrong index
kprobes/x86: Disable optimizing on the function jumps to indirect thunk
kprobes/x86: Blacklist indirect thunk functions for kprobes
retpoline: Introduce start/end markers of indirect thunk
x86/mce: Make machine check speculation protected
Pull x86 kexec fix from Thomas Gleixner:
"A single fix for the WBINVD issue introduced by the SME support which
causes kexec fails on non AMD/SME capable CPUs. Issue WBINVD only when
the CPU has SME and avoid doing so in a loop"
[ Side note: this patch fixes the problem, but it isn't entirely clear
why it is required. The wbinvd should just work regardless, but there
seems to be some system - as opposed to CPU - issue, since the wbinvd
causes more problems later in the shutdown sequence, but wbinvd
instructions while the system is still active are not problematic.
Possibly some SMI or pending machine check issue on the affected system ]
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Rework wbinvd, hlt operation in stop_this_cpu()
Pull irq fix from Thomas Gleixner:
"A single fix for the new matrix allocator to prevent vector exhaustion
by certain network drivers which allocate gazillions of unused vectors
which cannot be put into reservation mode due to MSI and the lack of
MSI entry masking.
The fix/workaround is to spread the vectors across CPUs by searching
the supplied target CPU mask for the CPU with the smallest number of
allocated vectors"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irq/matrix: Spread interrupts on allocation
Pull alpha fixes from Matt Turner:
"A build fix and a regression fix"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha:
alpha/PCI: Fix noname IRQ level detection
alpha: extend memset16 to EV6 optimised routines
Commit bacf6b499e ("x86/mm: Use a struct to reduce parameters for SME
PGD mapping") moved some parameters into a structure.
The structure was large enough to trigger the stack protection canary in
sme_encrypt_kernel which doesn't work this early, causing reboots.
Mark sme_encrypt_kernel appropriately to not use the canary.
Fixes: bacf6b499e ("x86/mm: Use a struct to reduce parameters for SME PGD mapping")
Signed-off-by: Laura Abbott <labbott@redhat.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The conversion of the alpha architecture PCI host bridge legacy IRQ
mapping/swizzling to the new PCI host bridge map/swizzle hooks carried
out through:
commit 0e4c2eeb75 ("alpha/PCI: Replace pci_fixup_irqs() call with
host bridge IRQ mapping hooks")
implies that IRQ for devices are now allocated through pci_assign_irq()
function in pci_device_probe() that is called when a driver matching a
device is found in order to probe the device through the device driver.
Alpha noname platforms required IRQ level programming to be executed
in sio_fixup_irq_levels(), that is called in noname_init_pci(), a
platform hook called within a subsys_initcall.
In noname_init_pci(), present IRQs are detected through
sio_collect_irq_levels() that check the struct pci_dev->irq number
to detect if an IRQ has been allocated for the device.
By the time sio_collect_irq_levels() is called, some devices may still
have not a matching driver loaded to match them (eg loadable module)
therefore their IRQ allocation is still pending - which means that
sio_collect_irq_levels() does not programme the correct IRQ level for
those devices, causing their IRQ handling to be broken when the device
driver is actually loaded and the device is probed.
Fix the issue by adding code in the noname map_irq() function
(noname_map_irq()) that, whilst mapping/swizzling the IRQ line, it also
ensures that the correct IRQ level programming is executed at platform
level, fixing the issue.
Fixes: 0e4c2eeb75 ("alpha/PCI: Replace pci_fixup_irqs() call with
host bridge IRQ mapping hooks")
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: stable@vger.kernel.org # 4.14
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Meelis Roos <mroos@linux.ee>
Signed-off-by: Matt Turner <mattst88@gmail.com>
ARM:
* fix incorrect huge page mappings on systems using the contiguous hint
for hugetlbfs
* support alternative GICv4 init sequence
* correctly implement the ARM SMCC for HVC and SMC handling
PPC:
* add KVM IOCTL for reporting vulnerability and workaround status
s390:
* provide userspace interface for branch prediction changes in firmware
x86:
* use correct macros for bits
-----BEGIN PGP SIGNATURE-----
iQEcBAABCAAGBQJaY3/eAAoJEED/6hsPKofo64kH/16SCSA9pKJTf39+jLoCPzbp
tlhzxoaqb9cPNMQBAk8Cj5xNJ6V4Clwnk8iRWaE6dRI5nWQxnxRHiWxnrobHwUbK
I0zSy+SywynSBnollKzLzQrDUBZ72fv3oLwiYEYhjMvs0zW6Q/vg10WERbav912Q
bv8nb5e8TbvU500ErndKTXOa8/B6uZYkMVjBNvAHwb+4AQ7bJgDQs5/qOeXllm8A
MT/SNYop/fkjRP7mQng5XYzoO+70tbe0hWpOQGgBnduzrbkNNvZtYtovusHYytLX
PAB7DDPbLZm5L2HBo4zvKgTHIoHTxU0X2yfUDzt7O151O2WSyqBRC3y1tpj6xa8=
=GnNJ
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
"ARM:
- fix incorrect huge page mappings on systems using the contiguous
hint for hugetlbfs
- support alternative GICv4 init sequence
- correctly implement the ARM SMCC for HVC and SMC handling
PPC:
- add KVM IOCTL for reporting vulnerability and workaround status
s390:
- provide userspace interface for branch prediction changes in
firmware
x86:
- use correct macros for bits"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: s390: wire up bpb feature
KVM: PPC: Book3S: Provide information about hardware/firmware CVE workarounds
KVM/x86: Fix wrong macro references of X86_CR0_PG_BIT and X86_CR4_PAE_BIT in kvm_valid_sregs()
arm64: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls
KVM: arm64: Fix GICv4 init when called from vgic_its_create
KVM: arm/arm64: Check pagesize when allocating a hugepage at Stage 2