If include/config/auto.conf.cmd is lost for some reasons, it is not
self-healing, so the top Makefile misses to run syncconfig.
Move include/config/auto.conf.cmd to the target side.
I used a pattern rule instead of a normal rule here although it is
a bit gross.
If the rule were written with a normal rule like this,
include/config/auto.conf \
include/config/auto.conf.cmd \
include/config/tristate.conf: $(KCONFIG_CONFIG)
$(Q)$(MAKE) -f $(srctree)/Makefile syncconfig
... syncconfig would be executed per target.
Using a pattern rule makes sure that syncconfig is executed just once
because Make assumes the recipe will create all of the targets.
Here is a quote from the GNU Make manual [1]:
"Pattern rules may have more than one target. Unlike normal rules,
this does not act as many different rules with the same prerequisites
and recipe. If a pattern rule has multiple targets, make knows that
the rule's recipe is responsible for making all of the targets. The
recipe is executed only once to make all the targets. When searching
for a pattern rule to match a target, the target patterns of a rule
other than the one that matches the target in need of a rule are
incidental: make worries only about giving a recipe and prerequisites
to the file presently in question. However, when this file's recipe is
run, the other targets are marked as having been updated themselves."
[1]: https://www.gnu.org/software/make/manual/html_node/Pattern-Intro.html
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Introduce a fast path for single-page bvec IO, then we can avoid
to call bvec_split_segs() unnecessarily.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Introduce a fast path for single-page bvec IO, then blk_bvec_map_sg()
can be avoided.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Single-page bvec can often be seen in small BS workloads, so
introduce bvec_nth_page() for avoiding to call nth_page() unnecessarily,
which looks not cheap.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The allocation happens with GFP_KERNEL after a transaction has been
started, this can potentially cause deadlock if reclaim tries to get the
memory by flushing filesystem data.
The fs_info::qgroup_ulist is not used during transaction start when
quotas are not enabled. The status bit BTRFS_FS_QUOTA_ENABLED is set
later in btrfs_quota_enable so it's safe to move it before the
transaction start.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Previously we only updated the drop_progress key if we were in the
DROP_REFERENCE stage of snapshot deletion. This is because the
UPDATE_BACKREF stage checks the flags of the blocks it's converting to
FULL_BACKREF, so if we go over a block we processed before it doesn't
matter, we just don't do anything.
The problem is in do_walk_down() we will go ahead and drop the roots
reference to any blocks that we know we won't need to walk into.
Given subvolume A and snapshot B. The root of B points to all of the
nodes that belong to A, so all of those nodes have a refcnt > 1. If B
did not modify those blocks it'll hit this condition in do_walk_down
if (!wc->update_ref ||
generation <= root->root_key.offset)
goto skip;
and in "goto skip" we simply do a btrfs_free_extent() for that bytenr
that we point at.
Now assume we modified some data in B, and then took a snapshot of B and
call it C. C points to all the nodes in B, making every node the root
of B points to have a refcnt > 1. This assumes the root level is 2 or
higher.
We delete snapshot B, which does the above work in do_walk_down,
free'ing our ref for nodes we share with A that we didn't modify. Now
we hit a node we _did_ modify, thus we own. We need to walk down into
this node and we set wc->stage == UPDATE_BACKREF. We walk down to level
0 which we also own because we modified data. We can't walk any further
down and thus now need to walk up and start the next part of the
deletion. Now walk_up_proc is supposed to put us back into
DROP_REFERENCE, but there's an exception to this
if (level < wc->shared_level)
goto out;
we are at level == 0, and our shared_level == 1. We skip out of this
one and go up to level 1. Since path->slots[1] < nritems we
path->slots[1]++ and break out of walk_up_tree to stop our transaction
and loop back around. Now in btrfs_drop_snapshot we have this snippet
if (wc->stage == DROP_REFERENCE) {
level = wc->level;
btrfs_node_key(path->nodes[level],
&root_item->drop_progress,
path->slots[level]);
root_item->drop_level = level;
}
our stage == UPDATE_BACKREF still, so we don't update the drop_progress
key. This is a problem because we would have done btrfs_free_extent()
for the nodes leading up to our current position. If we crash or
unmount here and go to remount we'll start over where we were before and
try to free our ref for blocks we've already freed, and thus abort()
out.
Fix this by keeping track of the last place we dropped a reference for
our block in do_walk_down. Then if wc->stage == UPDATE_BACKREF we know
we'll start over from a place we meant to, and otherwise things continue
to work as they did before.
I have a complicated reproducer for this problem, without this patch
we'll fail to fsck the fs when replaying the log writes log. With this
patch we can replay the whole log without any fsck or mount failures.
The steps to reproduce this easily are sort of tricky, I had to add a
couple of debug patches to the kernel in order to make it easy,
basically I just needed to make sure we did actually commit the
transaction every time we finished a walk_down_tree/walk_up_tree combo.
The reproducer:
1) Creates a base subvolume.
2) Creates 100k files in the subvolume.
3) Snapshots the base subvolume (snap1).
4) Touches files 5000-6000 in snap1.
5) Snapshots snap1 (snap2).
6) Deletes snap1.
I do this with dm-log-writes, and then replay to every FUA in the log
and fsck the fs.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
[ copy reproducer steps ]
Signed-off-by: David Sterba <dsterba@suse.com>
There's a bug in snapshot deletion where we won't update the
drop_progress key if we're in the UPDATE_BACKREF stage. This is a
problem because we could drop refs for blocks we know don't belong to
ours. If we crash or umount at the right time we could experience
messages such as the following when snapshot deletion resumes
BTRFS error (device dm-3): unable to find ref byte nr 66797568 parent 0 root 258 owner 1 offset 0
------------[ cut here ]------------
WARNING: CPU: 3 PID: 16052 at fs/btrfs/extent-tree.c:7108 __btrfs_free_extent.isra.78+0x62c/0xb30 [btrfs]
CPU: 3 PID: 16052 Comm: umount Tainted: G W OE 5.0.0-rc4+ #147
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./890FX Deluxe5, BIOS P1.40 05/03/2011
RIP: 0010:__btrfs_free_extent.isra.78+0x62c/0xb30 [btrfs]
RSP: 0018:ffffc90005cd7b18 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
RDX: ffff88842fade680 RSI: ffff88842fad6b18 RDI: ffff88842fad6b18
RBP: ffffc90005cd7bc8 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000001 R11: ffffffff822696b8 R12: 0000000003fb4000
R13: 0000000000000001 R14: 0000000000000102 R15: ffff88819c9d67e0
FS: 00007f08bb138fc0(0000) GS:ffff88842fac0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8f5d861ea0 CR3: 00000003e99fe000 CR4: 00000000000006e0
Call Trace:
? _raw_spin_unlock+0x27/0x40
? btrfs_merge_delayed_refs+0x356/0x3e0 [btrfs]
__btrfs_run_delayed_refs+0x75a/0x13c0 [btrfs]
? join_transaction+0x2b/0x460 [btrfs]
btrfs_run_delayed_refs+0xf3/0x1c0 [btrfs]
btrfs_commit_transaction+0x52/0xa50 [btrfs]
? start_transaction+0xa6/0x510 [btrfs]
btrfs_sync_fs+0x79/0x1c0 [btrfs]
sync_filesystem+0x70/0x90
generic_shutdown_super+0x27/0x120
kill_anon_super+0x12/0x30
btrfs_kill_super+0x16/0xa0 [btrfs]
deactivate_locked_super+0x43/0x70
deactivate_super+0x40/0x60
cleanup_mnt+0x3f/0x80
__cleanup_mnt+0x12/0x20
task_work_run+0x8b/0xc0
exit_to_usermode_loop+0xce/0xd0
do_syscall_64+0x20b/0x210
entry_SYSCALL_64_after_hwframe+0x49/0xbe
To fix this simply mark dead roots we read from disk as DEAD and then
set the walk_control->restarted flag so we know we have a restarted
deletion. From here whenever we try to drop refs for blocks we check to
verify our ref is set on them, and if it is not we skip it. Once we
find a ref that is set we unset walk_control->restarted since the tree
should be in a normal state from then on, and any problems we run into
from there are different issues. I tested this with an existing broken
fs and my reproducer that creates a broken fs and it fixed both file
systems.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The dependency will be checked anyway after Kbuild descends into a
sub-directory. Skip object/source dependency checks in top Makefile.
VPATH can be simpler since the top Makefile no longer checks the
presence of the source file, which is located in in the external
module directory.
One good thing is, it can compile an object from a generated source
file.
$ make crypto/rsapubkey.asn1.o
...
ASN.1 crypto/rsapubkey.asn1.c
CC crypto/rsapubkey.asn1.o
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
The previous commit made 'MAKEFLAGS += -rR' effective in the top
Makefile regardless of O= option, GNU Make versions.
The top Makefile does not need to cancel implicit rules for makefiles.
There is still one place where an empty rule is useful. Since -rR is
effective only after sub-make, GNU Make would try implicit rules to
update the top Makefile. Although it is not a big overhead, cancel it
just in case.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Adding -rR to MAKEFLAGS is important because we do not want to
be bothered by built-in implicit rules or variables.
One problem that used to exist in older GNU Make versions is
MAKEFLAGS += -rR
... does not become effective in the current Makefile. When you are
building with O= option, it becomes effective in the top Makefile
since it recurses via 'sub-make' target. Otherwise, the top Makefile
tries implicit rules. That is why we explicitly add empty rules for
Makefiles, but we often miss to do that.
In fact, adding -d option to older GNU Make versions shows it is
trying a bunch of implicit pattern rules.
Considering target file `scripts/Makefile.kcov'.
Looking for an implicit rule for `scripts/Makefile.kcov'.
Trying pattern rule with stem `Makefile.kcov'.
Trying implicit prerequisite `scripts/Makefile.kcov.o'.
Trying pattern rule with stem `Makefile.kcov'.
Trying implicit prerequisite `scripts/Makefile.kcov.c'.
Trying pattern rule with stem `Makefile.kcov'.
Trying implicit prerequisite `scripts/Makefile.kcov.cc'.
Trying pattern rule with stem `Makefile.kcov'.
Trying implicit prerequisite `scripts/Makefile.kcov.C'.
...
This issue was fixed by GNU Make commit 58dae243526b ("[Savannah #20501]
Handle adding -r/-R to MAKEFLAGS in the makefile"). So, it is no longer
a problem if you use GNU Make 4.0 or later. However, older versions are
still widely used.
So, I decided to patch the kernel Makefile to invoke sub-make regardless
of O= option. This will allow further cleanups.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Since -Wmaybe-uninitialized was introduced by GCC 4.7, we have patched
various false positives:
- commit e74fc973b6 ("Turn off -Wmaybe-uninitialized when building
with -Os") turned off this option for -Os.
- commit 815eb71e71 ("Kbuild: disable 'maybe-uninitialized' warning
for CONFIG_PROFILE_ALL_BRANCHES") turned off this option for
CONFIG_PROFILE_ALL_BRANCHES
- commit a76bcf557e ("Kbuild: enable -Wmaybe-uninitialized warning
for "make W=1"") turned off this option for GCC < 4.9
Arnd provided more explanation in https://lkml.org/lkml/2017/3/14/903
I think this looks better by shifting the logic from Makefile to Kconfig.
Link: https://github.com/ClangBuiltLinux/linux/issues/350
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
- $(word 1, <text>) is equivalent to $(firstword <text>)
- hardcode "gcc" instead of $(CC)
- minimize the shell script part
A little more notes in case $(filter-out -%, ...) is not clear.
arch/mips/Makefile passes prefixes depending on the configuration.
CROSS_COMPILE := $(call cc-cross-prefix, $(tool-archpref)-linux- \
$(tool-archpref)-linux-gnu- $(tool-archpref)-unknown-linux-gnu-)
In the Kconfig stage (e.g. when you run 'make defconfig'), neither
CONFIG_32BIT nor CONFIG_64BIT is defined. So, $(tool-archpref) is
empty. As a result, "-linux -linux-gnu- -unknown-linux-gnu" is passed
into cc-cross-prefix. The command 'which' assumes arguments starting
with a hyphen as command options, then emits the following messages:
Illegal option -l
Illegal option -l
Illegal option -u
I think it is strange to define CROSS_COMPILE depending on the CONFIG
options since you need to feed $(CC) to Kconfig, but it is how MIPS
Makefile currently works. Anyway, it would not hurt to filter-out
invalid strings beforehand.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
The genksyms source was integrated into the kernel tree in 2003.
I do not expect anybody still using the external /sbin/genksyms.
Kbuild does not need to provide the ability to override GENKSYMS.
Let's remove the GENKSYMS variable, and use the hardcoded path.
Since it occurred in the pre-git era, I attached the commit message
in case somebody is interested in the historical background.
| Author: Kai Germaschewski <kai@tp1.ruhr-uni-bochum.de>
| Date: Wed Feb 19 04:17:28 2003 -0600
|
| kbuild: [PATCH] put genksyms in scripts dir
|
| This puts genksyms into scripts/genksyms/.
|
| genksyms used to be maintained externally, though the only possible user
| was the kernel build. Moving it into the kernel sources makes it easier to
| keep it uptodate, like for example updating it to generate linker scripts
| directly instead of postprocessing the generated header file fragments
| with sed, as we do currently.
|
| Also, genksyms does not handle __typeof__, which needs to be fixed since
| some of the exported symbol in the kernel are defined using __typeof__.
|
| (Rusty Russell/me)
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
gdb-scripts is not a real object, but (ab)used like a phony target.
Rewrite the code in a more Kbuild-ish way. Add symlinks to extra-y
and use if_changed.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
It is weird to create gdb stuff as a side-effect of vmlinux.
Move it to a more relevant place.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Currently, Kbuild descends from scripts/Makefile to scripts/gdb/Makefile
just for creating symbolic links, but it does not need to do it so early.
Merge the two descending paths to simplify the code.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Every time we add/remove a target, we need to touch the header part,
including renumbering. This is not so important information.
Numbering targets is rather misleading because they are not necessarily
generated in this order. For example, 1) and 2) can be executed
simultaneously when the -j option is given.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
scripts/gdb/linux/constants.py is never used in the kernel build
process. There is no good reason to create it so early.
Get it out of the 'prepare' stage.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEw9DWbcNiT/aowBjO3s9rk8bwL68FAlx2a7ESHGNvaHVja0By
ZWRoYXQuY29tAAoJEN7Pa5PG8C+vwCYP/0ZTYTJdWhX3P72Bu2lBZyfrb6fxZYB5
TNFiPFYLDSsCaFhrRVTkz0LEV54DqGuENsmhGPFX0HqzyL9V2RV/5ZAqvCGMIXv1
AxYzTuAL8iihEAzgz7MhO2k2bBVmSWuyFIeBnhhKZYnxUV+ag06wizVghrMKjrmf
p8jgZnXYmk0YeYfahzVJqqZ4J8ZCWugOxYMPCz8zcdW7NScyBS6yLm0eFi8zQJsi
APwc8XdM7nMBeM3uf17J3XXdJbjOyZDCuKMKw9M0fCZ2sn3Y8qbujFcbJmBHjH3a
UfB3wEh1RhvuJm5CjBSwXfhvAhZOra0Vlz+vBdnpcdjuCiU3vCoOIW6+NV6l76N0
CsaOfSo/EvmhXgzFBEfzfkBuapnzcri2PjcFtEKvuY0DsT/K4bzLXcWooBXLvd6n
iy9UNqvEaVR19Djh0ds1y4jsTAWdn8SJr9iu++jhovO5U7X/4BcYtIKqS0TgKvJB
5xoOXHjySlM1K4t78pGaxnwEnq6U9FB46znzUuf6+sRf6mENiNFkK5us5LiqwukA
rTug/VZnxyfkuil+D2bfuGvbI7eIWGMU/NCPynpnx6fTIfyEGVslt6Wv6Ugvygo3
bWLgC9oMtdei9H3RzAknqV0I5XOyZUur+WFctuDxjUtc10uEOTHfC3KS6EzvwLEr
Mf96sfls8aRy
=6hqn
-----END PGP SIGNATURE-----
Merge tag 'vfio-ccw-20190227' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/vfio-ccw into features
Pull vfio-ccw from Cornelia Huck with the following changes:
- Further fixes in TIC handling.
The commit identified below adds MC_BTB_FLUSH macro only when
CONFIG_PPC_FSL_BOOK3E is defined. This results in the following error
on some configs (seen several times with kisskb randconfig_defconfig)
arch/powerpc/kernel/exceptions-64e.S:576: Error: Unrecognized opcode: `mc_btb_flush'
make[3]: *** [scripts/Makefile.build:367: arch/powerpc/kernel/exceptions-64e.o] Error 1
make[2]: *** [scripts/Makefile.build:492: arch/powerpc/kernel] Error 2
make[1]: *** [Makefile:1043: arch/powerpc] Error 2
make: *** [Makefile:152: sub-make] Error 2
This patch adds a blank definition of MC_BTB_FLUSH for other cases.
Fixes: 10c5e83afd ("powerpc/fsl: Flush the branch predictor at each kernel entry (64bit)")
Cc: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reflinking (clone/dedupe) and rename are operations that operate on two
inodes and therefore need to lock them in the same order to avoid ABBA
deadlocks. It happens that Btrfs' reflink implementation always locked
them in a different order from VFS's lock_two_nondirectories() helper,
which is used by the rename code in VFS, resulting in ABBA type deadlocks.
Btrfs' locking order:
static void btrfs_double_inode_lock(struct inode *inode1, struct inode *inode2)
{
if (inode1 < inode2)
swap(inode1, inode2);
inode_lock_nested(inode1, I_MUTEX_PARENT);
inode_lock_nested(inode2, I_MUTEX_CHILD);
}
VFS's locking order:
void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
{
if (inode1 > inode2)
swap(inode1, inode2);
if (inode1 && !S_ISDIR(inode1->i_mode))
inode_lock(inode1);
if (inode2 && !S_ISDIR(inode2->i_mode) && inode2 != inode1)
inode_lock_nested(inode2, I_MUTEX_NONDIR2);
}
Fix this by killing the btrfs helper function that does the double inode
locking and replace it with VFS's helper lock_two_nondirectories().
Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Fixes: 416161db9b ("btrfs: offline dedupe")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In the past we had data corruption when reading compressed extents that
are shared within the same file and they are consecutive, this got fixed
by commit 005efedf2c ("Btrfs: fix read corruption of compressed and
shared extents") and by commit 808f80b467 ("Btrfs: update fix for read
corruption of compressed and shared extents"). However there was a case
that was missing in those fixes, which is when the shared and compressed
extents are referenced with a non-zero offset. The following shell script
creates a reproducer for this issue:
#!/bin/bash
mkfs.btrfs -f /dev/sdc &> /dev/null
mount -o compress /dev/sdc /mnt/sdc
# Create a file with 3 consecutive compressed extents, each has an
# uncompressed size of 128Kb and a compressed size of 4Kb.
for ((i = 1; i <= 3; i++)); do
head -c 4096 /dev/zero
for ((j = 1; j <= 31; j++)); do
head -c 4096 /dev/zero | tr '\0' "\377"
done
done > /mnt/sdc/foobar
sync
echo "Digest after file creation: $(md5sum /mnt/sdc/foobar)"
# Clone the first extent into offsets 128K and 256K.
xfs_io -c "reflink /mnt/sdc/foobar 0 128K 128K" /mnt/sdc/foobar
xfs_io -c "reflink /mnt/sdc/foobar 0 256K 128K" /mnt/sdc/foobar
sync
echo "Digest after cloning: $(md5sum /mnt/sdc/foobar)"
# Punch holes into the regions that are already full of zeroes.
xfs_io -c "fpunch 0 4K" /mnt/sdc/foobar
xfs_io -c "fpunch 128K 4K" /mnt/sdc/foobar
xfs_io -c "fpunch 256K 4K" /mnt/sdc/foobar
sync
echo "Digest after hole punching: $(md5sum /mnt/sdc/foobar)"
echo "Dropping page cache..."
sysctl -q vm.drop_caches=1
echo "Digest after hole punching: $(md5sum /mnt/sdc/foobar)"
umount /dev/sdc
When running the script we get the following output:
Digest after file creation: 5a0888d80d7ab1fd31c229f83a3bbcc8 /mnt/sdc/foobar
linked 131072/131072 bytes at offset 131072
128 KiB, 1 ops; 0.0033 sec (36.960 MiB/sec and 295.6830 ops/sec)
linked 131072/131072 bytes at offset 262144
128 KiB, 1 ops; 0.0015 sec (78.567 MiB/sec and 628.5355 ops/sec)
Digest after cloning: 5a0888d80d7ab1fd31c229f83a3bbcc8 /mnt/sdc/foobar
Digest after hole punching: 5a0888d80d7ab1fd31c229f83a3bbcc8 /mnt/sdc/foobar
Dropping page cache...
Digest after hole punching: fba694ae8664ed0c2e9ff8937e7f1484 /mnt/sdc/foobar
This happens because after reading all the pages of the extent in the
range from 128K to 256K for example, we read the hole at offset 256K
and then when reading the page at offset 260K we don't submit the
existing bio, which is responsible for filling all the page in the
range 128K to 256K only, therefore adding the pages from range 260K
to 384K to the existing bio and submitting it after iterating over the
entire range. Once the bio completes, the uncompressed data fills only
the pages in the range 128K to 256K because there's no more data read
from disk, leaving the pages in the range 260K to 384K unfilled. It is
just a slightly different variant of what was solved by commit
005efedf2c ("Btrfs: fix read corruption of compressed and shared
extents").
Fix this by forcing a bio submit, during readpages(), whenever we find a
compressed extent map for a page that is different from the extent map
for the previous page or has a different starting offset (in case it's
the same compressed extent), instead of the extent map's original start
offset.
A test case for fstests follows soon.
Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Fixes: 808f80b467 ("Btrfs: update fix for read corruption of compressed and shared extents")
Fixes: 005efedf2c ("Btrfs: fix read corruption of compressed and shared extents")
Cc: stable@vger.kernel.org # 4.3+
Tested-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently the opal log is globally readable. It is kernel policy to
limit the visibility of physical addresses / kernel pointers to root.
Given this and the fact the opal log may contain this information it
would be better to limit the readability to root.
Fixes: bfc36894a4 ("powerpc/powernv: Add OPAL message log interface")
Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Reviewed-by: Stewart Smith <stewart@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
hashtable is never used for 2-byte keys, remove nft_hash_key().
Fixes: e240cd0df4 ("netfilter: nf_tables: place all set backends in one single module")
Reported-by: Florian Westphal <fw@strlen.de>
Tested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Use the element from the loop iteration, not the same element we want to
deactivate otherwise this branch always evaluates true.
Fixes: 6c03ae210c ("netfilter: nft_set_hash: add non-resizable hashtable implementation")
Reported-by: Florian Westphal <fw@strlen.de>
Tested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Call jhash_1word() for the 4-bytes key case from the insertion and
deactivation path, otherwise big endian arch set lookups fail.
Fixes: 446a8268b7 ("netfilter: nft_set_hash: add lookup variant for fixed size hashtable")
Reported-by: Florian Westphal <fw@strlen.de>
Tested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Empty case is fine and does not switch fall-through
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
No need to dirty a cache line if timeout is unchanged.
Also, WARN() is useless here: we crash on 'skb->len' access
if skb is NULL.
Last, ct->timeout is u32, not 'unsigned long' so adapt the
function prototype accordingly.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The l3proto name is gone, its header file is the last trace.
While at it, also remove nf_nat_core.h, its very small and all users
include nf_nat.h too.
before:
text data bss dec hex filename
22948 1612 4136 28696 7018 nf_nat.ko
after removal of l3proto register/unregister functions:
text data bss dec hex filename
22196 1516 4136 27848 6cc8 nf_nat.ko
checkpatch complains about overly long lines, but line breaks
do not make things more readable and the line length gets smaller
here, not larger.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
after ipv4/6 nat tracker merge, there are no external callers, so
make last function static and remove the header.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
before:
text data bss dec hex filename
16566 1576 4136 22278 5706 nf_nat.ko
3598 844 0 4442 115a nf_nat_ipv6.ko
3187 844 0 4031 fbf nf_nat_ipv4.ko
after:
text data bss dec hex filename
22948 1612 4136 28696 7018 nf_nat.ko
... with ipv4/v6 nat now provided directly via nf_nat.ko.
Also changes:
ret = nf_nat_ipv4_fn(priv, skb, state);
if (ret != NF_DROP && ret != NF_STOLEN &&
into
if (ret != NF_ACCEPT)
return ret;
everywhere.
The nat hooks never should return anything other than
ACCEPT or DROP (and the latter only in rare error cases).
The original code uses multi-line ANDing including assignment-in-if:
if (ret != NF_DROP && ret != NF_STOLEN &&
!(IPCB(skb)->flags & IPSKB_XFRM_TRANSFORMED) &&
(ct = nf_ct_get(skb, &ctinfo)) != NULL) {
I removed this while moving, breaking those in separate conditionals
and moving the assignments into extra lines.
checkpatch still generates some warnings:
1. Overly long lines (of moved code).
Breaking them is even more ugly. so I kept this as-is.
2. use of extern function declarations in a .c file.
This is necessary evil, we must call
nf_nat_l3proto_register() from the nat core now.
All l3proto related functions are removed later in this series,
those prototypes are then removed as well.
v2: keep empty nf_nat_ipv6_csum_update stub for CONFIG_IPV6=n case.
v3: remove IS_ENABLED(NF_NAT_IPV4/6) tests, NF_NAT_IPVx toggles
are removed here.
v4: also get rid of the assignments in conditionals.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
None of these functions calls any external functions, moving them allows
to avoid both the indirection and a need to export these symbols.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
In the PRP0001 case, the compatible string may have additional data
affiliated with the device. When we call device_get_match_data() on
such device, we will get nothing since currently
acpi_device_get_match_data() doesn't respect PRP0001.
To fix the above, try acpi_of_match_device() if there is no ACPI
table in the driver.
Anyway, note that the device is expected to get its own proper
ACPI ID.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
They are however frequently triggered by syzkaller, so remove them.
ebtables userspace should never trigger any of these, so there is little
value in making them pr_debug (or ratelimited).
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The Amanda CONNECT command has been updated to establish an optional
fourth connection [0]. Previously, a CONNECT command would look like:
CONNECT DATA port0 MESG port1 INDEX port2
nf_conntrack_amanda analyses the CONNECT command string in order to
learn the port numbers of the related DATA, MESG and INDEX streams. As
of amanda v3.4, the CONNECT command can advertise an additional port:
CONNECT DATA port0 MESG port1 INDEX port2 STATE port3
The new STATE stream is not handled, thus the connection on the STATE
port cannot be established.
The patch adds support for STATE streams to the amanda conntrack helper.
I tested with max_expected = 3, leaving the other patch hunks
unmodified. Amanda reports "connection refused" and aborts. After I set
max_expected to 4, the backup completes successfully.
[0] 3b8384fc9f (diff-711e502fc81a65182c0954765b42919eR456)
Signed-off-by: Florian Tham <tham@fidion.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add .release_ops, that is called in case of error at a later stage in
the expression initialization path, ie. .select_ops() has been already
set up operations and that needs to be undone. This allows us to unwind
.select_ops from the error path, ie. release the dynamic operations for
this extension.
Moreover, allocate one single operation instead of recycling them, this
comes at the cost of consuming a bit more memory per rule, but it
simplifies the infrastructure.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
In case of CQHCI, mrq->cmd may be NULL for data requests (non DCMD).
In such case mmc_should_fail_request is directly dereferencing
mrq->cmd while cmd is NULL.
Fix this by checking for mrq->cmd pointer.
Fixes: 72a5af554d ("mmc: core: Add support for handling CQE requests")
Signed-off-by: Ritesh Harjani <riteshh@codeaurora.org>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Currently, we don't coordinate BT USB activity with our handling of the
BT out-of-band wake pin, and instead just use gpio-keys. That causes
problems because we have no way of distinguishing wake activity due to a
BT device (e.g., mouse) vs. the BT controller (e.g., re-configuring wake
mask before suspend). This can cause spurious wake events just because
we, for instance, try to reconfigure the host controller's event mask
before suspending.
We can avoid these synchronization problems by handling the BT wake pin
directly in the btusb driver -- for all activity up until BT controller
suspend(), we simply listen to normal USB activity (e.g., to know the
difference between device and host activity); once we're really ready to
suspend the host controller, there should be no more host activity, and
only *then* do we unmask the GPIO interrupt.
This is already supported by btusb; we just need to describe the wake
pin in the right node.
We list 2 compatible properties, since both PID/VID pairs show up on
Scarlet devices, and they're both essentially identical QCA6174A-based
modules.
Also note that the polarity was wrong before: Qualcomm implemented WAKE
as active high, not active low. We only got away with this because
gpio-keys always reconfigured us as bi-directional edge-triggered.
Finally, we have an external pull-up and a level-shifter on this line
(we didn't notice Qualcomm's polarity in the initial design), so we
can't do pull-down. Switch to pull-none.
Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
There are two USB PID/VID variations I've seen for this chip, and I want
to utilize the 'interrupts' property defined here already.
Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
We may need to specify a GPIO wake pin for this device, so add a
compatible property for it.
There are at least to USB PID/VID variations of this chip: one with a
Lite-On ID and one with an Atheros ID.
Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
During initialization the power-on pulse is currently sent inmediately
after the prior power-off pulse. With this initialization often fails
at boot time:
[ 15.205224] Bluetooth: hci0: setting up wcn3990
[ 17.341062] Bluetooth: hci0: command 0xfc00 tx timeout
[ 22.101453] ERROR: Bluetooth initialization failed
[ 25.337740] Bluetooth: hci0: Reading QCA version information failed (-110)
After a power-off pulse wait 10ms to give the controller time to power
off. Remove the previous short settling delay, it isn't needed anymore.
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Balakrishna Godavarthi <bgodavar@codeaurora.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
After sending a power on pulse the driver has a delay of 100ms
to allow the host controller to boot. Move the delay into
qca_send_power_pulse(), since it is directly related with the
power-on pulse.
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Balakrishna Godavarthi <bgodavar@codeaurora.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
There are only two types of power pulses 'on' or 'off', pass a boolean
instead of the power pulse 'command'.
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Balakrishna Godavarthi <bgodavar@codeaurora.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>