Commit Graph

826113 Commits

Author SHA1 Message Date
Masahiro Yamada
9390dff66a kbuild: invoke syncconfig if include/config/auto.conf.cmd is missing
If include/config/auto.conf.cmd is lost for some reasons, it is not
self-healing, so the top Makefile misses to run syncconfig.
Move include/config/auto.conf.cmd to the target side.

I used a pattern rule instead of a normal rule here although it is
a bit gross.

If the rule were written with a normal rule like this,

  include/config/auto.conf \
  include/config/auto.conf.cmd \
  include/config/tristate.conf: $(KCONFIG_CONFIG)
          $(Q)$(MAKE) -f $(srctree)/Makefile syncconfig

... syncconfig would be executed per target.

Using a pattern rule makes sure that syncconfig is executed just once
because Make assumes the recipe will create all of the targets.

Here is a quote from the GNU Make manual [1]:

"Pattern rules may have more than one target. Unlike normal rules,
this does not act as many different rules with the same prerequisites
and recipe. If a pattern rule has multiple targets, make knows that
the rule's recipe is responsible for making all of the targets. The
recipe is executed only once to make all the targets. When searching
for a pattern rule to match a target, the target patterns of a rule
other than the one that matches the target in need of a rule are
incidental: make worries only about giving a recipe and prerequisites
to the file presently in question. However, when this file's recipe is
run, the other targets are marked as having been updated themselves."

[1]: https://www.gnu.org/software/make/manual/html_node/Pattern-Intro.html

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-02-27 22:22:12 +09:00
Ming Lei
bbcbbd567c block: optimize blk_bio_segment_split for single-page bvec
Introduce a fast path for single-page bvec IO, then we can avoid
to call bvec_split_segs() unnecessarily.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-27 06:18:55 -07:00
Ming Lei
48d7727cae block: optimize __blk_segment_map_sg() for single-page bvec
Introduce a fast path for single-page bvec IO, then blk_bvec_map_sg()
can be avoided.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-27 06:18:54 -07:00
Ming Lei
4d633062c1 block: introduce bvec_nth_page()
Single-page bvec can often be seen in small BS workloads, so
introduce bvec_nth_page() for avoiding to call nth_page() unnecessarily,
which looks not cheap.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-27 06:18:52 -07:00
David Sterba
7503b83d80 btrfs: move ulist allocation out of transaction in quota enable
The allocation happens with GFP_KERNEL after a transaction has been
started, this can potentially cause deadlock if reclaim tries to get the
memory by flushing filesystem data.

The fs_info::qgroup_ulist is not used during transaction start when
quotas are not enabled. The status bit BTRFS_FS_QUOTA_ENABLED is set
later in btrfs_quota_enable so it's safe to move it before the
transaction start.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-02-27 14:10:25 +01:00
Josef Bacik
aea6f028d0 btrfs: save drop_progress if we drop refs at all
Previously we only updated the drop_progress key if we were in the
DROP_REFERENCE stage of snapshot deletion.  This is because the
UPDATE_BACKREF stage checks the flags of the blocks it's converting to
FULL_BACKREF, so if we go over a block we processed before it doesn't
matter, we just don't do anything.

The problem is in do_walk_down() we will go ahead and drop the roots
reference to any blocks that we know we won't need to walk into.

Given subvolume A and snapshot B.  The root of B points to all of the
nodes that belong to A, so all of those nodes have a refcnt > 1.  If B
did not modify those blocks it'll hit this condition in do_walk_down

if (!wc->update_ref ||
    generation <= root->root_key.offset)
	goto skip;

and in "goto skip" we simply do a btrfs_free_extent() for that bytenr
that we point at.

Now assume we modified some data in B, and then took a snapshot of B and
call it C.  C points to all the nodes in B, making every node the root
of B points to have a refcnt > 1.  This assumes the root level is 2 or
higher.

We delete snapshot B, which does the above work in do_walk_down,
free'ing our ref for nodes we share with A that we didn't modify.  Now
we hit a node we _did_ modify, thus we own.  We need to walk down into
this node and we set wc->stage == UPDATE_BACKREF.  We walk down to level
0 which we also own because we modified data.  We can't walk any further
down and thus now need to walk up and start the next part of the
deletion.  Now walk_up_proc is supposed to put us back into
DROP_REFERENCE, but there's an exception to this

if (level < wc->shared_level)
	goto out;

we are at level == 0, and our shared_level == 1.  We skip out of this
one and go up to level 1.  Since path->slots[1] < nritems we
path->slots[1]++ and break out of walk_up_tree to stop our transaction
and loop back around.  Now in btrfs_drop_snapshot we have this snippet

if (wc->stage == DROP_REFERENCE) {
	level = wc->level;
	btrfs_node_key(path->nodes[level],
		       &root_item->drop_progress,
		       path->slots[level]);
	root_item->drop_level = level;
}

our stage == UPDATE_BACKREF still, so we don't update the drop_progress
key.  This is a problem because we would have done btrfs_free_extent()
for the nodes leading up to our current position.  If we crash or
unmount here and go to remount we'll start over where we were before and
try to free our ref for blocks we've already freed, and thus abort()
out.

Fix this by keeping track of the last place we dropped a reference for
our block in do_walk_down.  Then if wc->stage == UPDATE_BACKREF we know
we'll start over from a place we meant to, and otherwise things continue
to work as they did before.

I have a complicated reproducer for this problem, without this patch
we'll fail to fsck the fs when replaying the log writes log.  With this
patch we can replay the whole log without any fsck or mount failures.

The steps to reproduce this easily are sort of tricky, I had to add a
couple of debug patches to the kernel in order to make it easy,
basically I just needed to make sure we did actually commit the
transaction every time we finished a walk_down_tree/walk_up_tree combo.

The reproducer:

1) Creates a base subvolume.
2) Creates 100k files in the subvolume.
3) Snapshots the base subvolume (snap1).
4) Touches files 5000-6000 in snap1.
5) Snapshots snap1 (snap2).
6) Deletes snap1.

I do this with dm-log-writes, and then replay to every FUA in the log
and fsck the fs.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
[ copy reproducer steps ]
Signed-off-by: David Sterba <dsterba@suse.com>
2019-02-27 14:08:47 +01:00
Josef Bacik
78c52d9eb6 btrfs: check for refs on snapshot delete resume
There's a bug in snapshot deletion where we won't update the
drop_progress key if we're in the UPDATE_BACKREF stage.  This is a
problem because we could drop refs for blocks we know don't belong to
ours.  If we crash or umount at the right time we could experience
messages such as the following when snapshot deletion resumes

 BTRFS error (device dm-3): unable to find ref byte nr 66797568 parent 0 root 258  owner 1 offset 0
 ------------[ cut here ]------------
 WARNING: CPU: 3 PID: 16052 at fs/btrfs/extent-tree.c:7108 __btrfs_free_extent.isra.78+0x62c/0xb30 [btrfs]
 CPU: 3 PID: 16052 Comm: umount Tainted: G        W  OE     5.0.0-rc4+ #147
 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./890FX Deluxe5, BIOS P1.40 05/03/2011
 RIP: 0010:__btrfs_free_extent.isra.78+0x62c/0xb30 [btrfs]
 RSP: 0018:ffffc90005cd7b18 EFLAGS: 00010286
 RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
 RDX: ffff88842fade680 RSI: ffff88842fad6b18 RDI: ffff88842fad6b18
 RBP: ffffc90005cd7bc8 R08: 0000000000000000 R09: 0000000000000001
 R10: 0000000000000001 R11: ffffffff822696b8 R12: 0000000003fb4000
 R13: 0000000000000001 R14: 0000000000000102 R15: ffff88819c9d67e0
 FS:  00007f08bb138fc0(0000) GS:ffff88842fac0000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007f8f5d861ea0 CR3: 00000003e99fe000 CR4: 00000000000006e0
 Call Trace:
 ? _raw_spin_unlock+0x27/0x40
 ? btrfs_merge_delayed_refs+0x356/0x3e0 [btrfs]
 __btrfs_run_delayed_refs+0x75a/0x13c0 [btrfs]
 ? join_transaction+0x2b/0x460 [btrfs]
 btrfs_run_delayed_refs+0xf3/0x1c0 [btrfs]
 btrfs_commit_transaction+0x52/0xa50 [btrfs]
 ? start_transaction+0xa6/0x510 [btrfs]
 btrfs_sync_fs+0x79/0x1c0 [btrfs]
 sync_filesystem+0x70/0x90
 generic_shutdown_super+0x27/0x120
 kill_anon_super+0x12/0x30
 btrfs_kill_super+0x16/0xa0 [btrfs]
 deactivate_locked_super+0x43/0x70
 deactivate_super+0x40/0x60
 cleanup_mnt+0x3f/0x80
 __cleanup_mnt+0x12/0x20
 task_work_run+0x8b/0xc0
 exit_to_usermode_loop+0xce/0xd0
 do_syscall_64+0x20b/0x210
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

To fix this simply mark dead roots we read from disk as DEAD and then
set the walk_control->restarted flag so we know we have a restarted
deletion.  From here whenever we try to drop refs for blocks we check to
verify our ref is set on them, and if it is not we skip it.  Once we
find a ref that is set we unset walk_control->restarted since the tree
should be in a normal state from then on, and any problems we run into
from there are different issues.  I tested this with an existing broken
fs and my reproducer that creates a broken fs and it fixed both file
systems.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-02-27 14:08:47 +01:00
Masahiro Yamada
6b12de69ad kbuild: simplify single target rules
The dependency will be checked anyway after Kbuild descends into a
sub-directory. Skip object/source dependency checks in top Makefile.

VPATH can be simpler since the top Makefile no longer checks the
presence of the source file, which is located in in the external
module directory.

One good thing is, it can compile an object from a generated source
file.

  $ make crypto/rsapubkey.asn1.o
    ...
  ASN.1   crypto/rsapubkey.asn1.c
  CC      crypto/rsapubkey.asn1.o

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-02-27 21:43:31 +09:00
Masahiro Yamada
b999923c29 kbuild: remove empty rules for makefiles
The previous commit made 'MAKEFLAGS += -rR' effective in the top
Makefile regardless of O= option, GNU Make versions.

The top Makefile does not need to cancel implicit rules for makefiles.

There is still one place where an empty rule is useful. Since -rR is
effective only after sub-make, GNU Make would try implicit rules to
update the top Makefile. Although it is not a big overhead, cancel it
just in case.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-02-27 21:43:31 +09:00
Masahiro Yamada
3812b8c5c5 kbuild: make -r/-R effective in top Makefile for old Make versions
Adding -rR to MAKEFLAGS is important because we do not want to
be bothered by built-in implicit rules or variables.

One problem that used to exist in older GNU Make versions is

  MAKEFLAGS += -rR

... does not become effective in the current Makefile. When you are
building with O= option, it becomes effective in the top Makefile
since it recurses via 'sub-make' target. Otherwise, the top Makefile
tries implicit rules. That is why we explicitly add empty rules for
Makefiles, but we often miss to do that.

In fact, adding -d option to older GNU Make versions shows it is
trying a bunch of implicit pattern rules.

 Considering target file `scripts/Makefile.kcov'.
  Looking for an implicit rule for `scripts/Makefile.kcov'.
  Trying pattern rule with stem `Makefile.kcov'.
  Trying implicit prerequisite `scripts/Makefile.kcov.o'.
  Trying pattern rule with stem `Makefile.kcov'.
  Trying implicit prerequisite `scripts/Makefile.kcov.c'.
  Trying pattern rule with stem `Makefile.kcov'.
  Trying implicit prerequisite `scripts/Makefile.kcov.cc'.
  Trying pattern rule with stem `Makefile.kcov'.
  Trying implicit prerequisite `scripts/Makefile.kcov.C'.
  ...

This issue was fixed by GNU Make commit 58dae243526b ("[Savannah #20501]
Handle adding -r/-R to MAKEFLAGS in the makefile"). So, it is no longer
a problem if you use GNU Make 4.0 or later. However, older versions are
still widely used.

So, I decided to patch the kernel Makefile to invoke sub-make regardless
of O= option. This will allow further cleanups.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-02-27 21:43:31 +09:00
Masahiro Yamada
f47a23ce2b kbuild: move tools_silent to a more relevant place
This would disturb the change the sub-make part. Move it near the
tools/ target.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-02-27 21:43:30 +09:00
Masahiro Yamada
b303c6df80 kbuild: compute false-positive -Wmaybe-uninitialized cases in Kconfig
Since -Wmaybe-uninitialized was introduced by GCC 4.7, we have patched
various false positives:

 - commit e74fc973b6 ("Turn off -Wmaybe-uninitialized when building
   with -Os") turned off this option for -Os.

 - commit 815eb71e71 ("Kbuild: disable 'maybe-uninitialized' warning
   for CONFIG_PROFILE_ALL_BRANCHES") turned off this option for
   CONFIG_PROFILE_ALL_BRANCHES

 - commit a76bcf557e ("Kbuild: enable -Wmaybe-uninitialized warning
   for "make W=1"") turned off this option for GCC < 4.9
   Arnd provided more explanation in https://lkml.org/lkml/2017/3/14/903

I think this looks better by shifting the logic from Makefile to Kconfig.

Link: https://github.com/ClangBuiltLinux/linux/issues/350
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
2019-02-27 21:43:20 +09:00
Masahiro Yamada
bd55f96fa9 kbuild: refactor cc-cross-prefix implementation
- $(word 1, <text>) is equivalent to $(firstword <text>)

 - hardcode "gcc" instead of $(CC)

 - minimize the shell script part

A little more notes in case $(filter-out -%, ...) is not clear.

arch/mips/Makefile passes prefixes depending on the configuration.

CROSS_COMPILE := $(call cc-cross-prefix, $(tool-archpref)-linux- \
    $(tool-archpref)-linux-gnu- $(tool-archpref)-unknown-linux-gnu-)

In the Kconfig stage (e.g. when you run 'make defconfig'), neither
CONFIG_32BIT nor CONFIG_64BIT is defined. So, $(tool-archpref) is
empty. As a result, "-linux -linux-gnu- -unknown-linux-gnu" is passed
into cc-cross-prefix. The command 'which' assumes arguments starting
with a hyphen as command options, then emits the following messages:

  Illegal option -l
  Illegal option -l
  Illegal option -u

I think it is strange to define CROSS_COMPILE depending on the CONFIG
options since you need to feed $(CC) to Kconfig, but it is how MIPS
Makefile currently works. Anyway, it would not hurt to filter-out
invalid strings beforehand.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-02-27 21:41:27 +09:00
Masahiro Yamada
88110713ca kbuild: hardcode genksyms path and remove GENKSYMS variable
The genksyms source was integrated into the kernel tree in 2003.

I do not expect anybody still using the external /sbin/genksyms.
Kbuild does not need to provide the ability to override GENKSYMS.

Let's remove the GENKSYMS variable, and use the hardcoded path.

Since it occurred in the pre-git era, I attached the commit message
in case somebody is interested in the historical background.

  | Author: Kai Germaschewski <kai@tp1.ruhr-uni-bochum.de>
  | Date:   Wed Feb 19 04:17:28 2003 -0600
  |
  | kbuild: [PATCH] put genksyms in scripts dir
  |
  | This puts genksyms into scripts/genksyms/.
  |
  | genksyms used to be maintained externally, though the only possible user
  | was the kernel build. Moving it into the kernel sources makes it easier to
  | keep it uptodate, like for example updating it to generate linker scripts
  | directly instead of postprocessing the generated header file fragments
  | with sed, as we do currently.
  |
  | Also, genksyms does not handle __typeof__, which needs to be fixed since
  | some of the exported symbol in the kernel are defined using __typeof__.
  |
  | (Rusty Russell/me)

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-02-27 21:41:26 +09:00
Masahiro Yamada
b513adf45c scripts/gdb: refactor rules for symlink creation
gdb-scripts is not a real object, but (ab)used like a phony target.

Rewrite the code in a more Kbuild-ish way. Add symlinks to extra-y
and use if_changed.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
2019-02-27 21:41:04 +09:00
Masahiro Yamada
8d2e52003a kbuild: create symlink to vmlinux-gdb.py in scripts_gdb target
It is weird to create gdb stuff as a side-effect of vmlinux.

Move it to a more relevant place.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
2019-02-27 21:40:54 +09:00
Masahiro Yamada
1e5ff84ffe scripts/gdb: do not descend into scripts/gdb from scripts
Currently, Kbuild descends from scripts/Makefile to scripts/gdb/Makefile
just for creating symbolic links, but it does not need to do it so early.

Merge the two descending paths to simplify the code.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
2019-02-27 21:40:09 +09:00
Masahiro Yamada
01d509a48b kbuild: remove unimportant comments from ./Kbuild
Every time we add/remove a target, we need to touch the header part,
including renumbering. This is not so important information.

Numbering targets is rather misleading because they are not necessarily
generated in this order. For example, 1) and 2) can be executed
simultaneously when the -j option is given.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
2019-02-27 21:40:01 +09:00
Masahiro Yamada
67274c0834 scripts/gdb: delay generation of gdb constants.py
scripts/gdb/linux/constants.py is never used in the kernel build
process. There is no good reason to create it so early.

Get it out of the 'prepare' stage.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
2019-02-27 21:39:48 +09:00
Martin Schwidefsky
7b660c225f Further fixes in TIC handling.
-----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEw9DWbcNiT/aowBjO3s9rk8bwL68FAlx2a7ESHGNvaHVja0By
 ZWRoYXQuY29tAAoJEN7Pa5PG8C+vwCYP/0ZTYTJdWhX3P72Bu2lBZyfrb6fxZYB5
 TNFiPFYLDSsCaFhrRVTkz0LEV54DqGuENsmhGPFX0HqzyL9V2RV/5ZAqvCGMIXv1
 AxYzTuAL8iihEAzgz7MhO2k2bBVmSWuyFIeBnhhKZYnxUV+ag06wizVghrMKjrmf
 p8jgZnXYmk0YeYfahzVJqqZ4J8ZCWugOxYMPCz8zcdW7NScyBS6yLm0eFi8zQJsi
 APwc8XdM7nMBeM3uf17J3XXdJbjOyZDCuKMKw9M0fCZ2sn3Y8qbujFcbJmBHjH3a
 UfB3wEh1RhvuJm5CjBSwXfhvAhZOra0Vlz+vBdnpcdjuCiU3vCoOIW6+NV6l76N0
 CsaOfSo/EvmhXgzFBEfzfkBuapnzcri2PjcFtEKvuY0DsT/K4bzLXcWooBXLvd6n
 iy9UNqvEaVR19Djh0ds1y4jsTAWdn8SJr9iu++jhovO5U7X/4BcYtIKqS0TgKvJB
 5xoOXHjySlM1K4t78pGaxnwEnq6U9FB46znzUuf6+sRf6mENiNFkK5us5LiqwukA
 rTug/VZnxyfkuil+D2bfuGvbI7eIWGMU/NCPynpnx6fTIfyEGVslt6Wv6Ugvygo3
 bWLgC9oMtdei9H3RzAknqV0I5XOyZUur+WFctuDxjUtc10uEOTHfC3KS6EzvwLEr
 Mf96sfls8aRy
 =6hqn
 -----END PGP SIGNATURE-----

Merge tag 'vfio-ccw-20190227' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/vfio-ccw into features

Pull vfio-ccw from Cornelia Huck with the following changes:
 - Further fixes in TIC handling.
2019-02-27 13:19:33 +01:00
Christophe Leroy
27da80719e powerpc/fsl: Fix the flush of branch predictor.
The commit identified below adds MC_BTB_FLUSH macro only when
CONFIG_PPC_FSL_BOOK3E is defined. This results in the following error
on some configs (seen several times with kisskb randconfig_defconfig)

arch/powerpc/kernel/exceptions-64e.S:576: Error: Unrecognized opcode: `mc_btb_flush'
make[3]: *** [scripts/Makefile.build:367: arch/powerpc/kernel/exceptions-64e.o] Error 1
make[2]: *** [scripts/Makefile.build:492: arch/powerpc/kernel] Error 2
make[1]: *** [Makefile:1043: arch/powerpc] Error 2
make: *** [Makefile:152: sub-make] Error 2

This patch adds a blank definition of MC_BTB_FLUSH for other cases.

Fixes: 10c5e83afd ("powerpc/fsl: Flush the branch predictor at each kernel entry (64bit)")
Cc: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-27 22:52:38 +11:00
Filipe Manana
4ea748e1d2 Btrfs: fix deadlock between clone/dedupe and rename
Reflinking (clone/dedupe) and rename are operations that operate on two
inodes and therefore need to lock them in the same order to avoid ABBA
deadlocks. It happens that Btrfs' reflink implementation always locked
them in a different order from VFS's lock_two_nondirectories() helper,
which is used by the rename code in VFS, resulting in ABBA type deadlocks.

Btrfs' locking order:

  static void btrfs_double_inode_lock(struct inode *inode1, struct inode *inode2)
  {
         if (inode1 < inode2)
                swap(inode1, inode2);

         inode_lock_nested(inode1, I_MUTEX_PARENT);
         inode_lock_nested(inode2, I_MUTEX_CHILD);
  }

VFS's locking order:

  void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
  {
        if (inode1 > inode2)
                swap(inode1, inode2);

        if (inode1 && !S_ISDIR(inode1->i_mode))
                inode_lock(inode1);
        if (inode2 && !S_ISDIR(inode2->i_mode) && inode2 != inode1)
                inode_lock_nested(inode2, I_MUTEX_NONDIR2);
}

Fix this by killing the btrfs helper function that does the double inode
locking and replace it with VFS's helper lock_two_nondirectories().

Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Fixes: 416161db9b ("btrfs: offline dedupe")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-02-27 12:24:16 +01:00
Filipe Manana
8e92821878 Btrfs: fix corruption reading shared and compressed extents after hole punching
In the past we had data corruption when reading compressed extents that
are shared within the same file and they are consecutive, this got fixed
by commit 005efedf2c ("Btrfs: fix read corruption of compressed and
shared extents") and by commit 808f80b467 ("Btrfs: update fix for read
corruption of compressed and shared extents"). However there was a case
that was missing in those fixes, which is when the shared and compressed
extents are referenced with a non-zero offset. The following shell script
creates a reproducer for this issue:

  #!/bin/bash

  mkfs.btrfs -f /dev/sdc &> /dev/null
  mount -o compress /dev/sdc /mnt/sdc

  # Create a file with 3 consecutive compressed extents, each has an
  # uncompressed size of 128Kb and a compressed size of 4Kb.
  for ((i = 1; i <= 3; i++)); do
      head -c 4096 /dev/zero
      for ((j = 1; j <= 31; j++)); do
          head -c 4096 /dev/zero | tr '\0' "\377"
      done
  done > /mnt/sdc/foobar
  sync

  echo "Digest after file creation:   $(md5sum /mnt/sdc/foobar)"

  # Clone the first extent into offsets 128K and 256K.
  xfs_io -c "reflink /mnt/sdc/foobar 0 128K 128K" /mnt/sdc/foobar
  xfs_io -c "reflink /mnt/sdc/foobar 0 256K 128K" /mnt/sdc/foobar
  sync

  echo "Digest after cloning:         $(md5sum /mnt/sdc/foobar)"

  # Punch holes into the regions that are already full of zeroes.
  xfs_io -c "fpunch 0 4K" /mnt/sdc/foobar
  xfs_io -c "fpunch 128K 4K" /mnt/sdc/foobar
  xfs_io -c "fpunch 256K 4K" /mnt/sdc/foobar
  sync

  echo "Digest after hole punching:   $(md5sum /mnt/sdc/foobar)"

  echo "Dropping page cache..."
  sysctl -q vm.drop_caches=1
  echo "Digest after hole punching:   $(md5sum /mnt/sdc/foobar)"

  umount /dev/sdc

When running the script we get the following output:

  Digest after file creation:   5a0888d80d7ab1fd31c229f83a3bbcc8  /mnt/sdc/foobar
  linked 131072/131072 bytes at offset 131072
  128 KiB, 1 ops; 0.0033 sec (36.960 MiB/sec and 295.6830 ops/sec)
  linked 131072/131072 bytes at offset 262144
  128 KiB, 1 ops; 0.0015 sec (78.567 MiB/sec and 628.5355 ops/sec)
  Digest after cloning:         5a0888d80d7ab1fd31c229f83a3bbcc8  /mnt/sdc/foobar
  Digest after hole punching:   5a0888d80d7ab1fd31c229f83a3bbcc8  /mnt/sdc/foobar
  Dropping page cache...
  Digest after hole punching:   fba694ae8664ed0c2e9ff8937e7f1484  /mnt/sdc/foobar

This happens because after reading all the pages of the extent in the
range from 128K to 256K for example, we read the hole at offset 256K
and then when reading the page at offset 260K we don't submit the
existing bio, which is responsible for filling all the page in the
range 128K to 256K only, therefore adding the pages from range 260K
to 384K to the existing bio and submitting it after iterating over the
entire range. Once the bio completes, the uncompressed data fills only
the pages in the range 128K to 256K because there's no more data read
from disk, leaving the pages in the range 260K to 384K unfilled. It is
just a slightly different variant of what was solved by commit
005efedf2c ("Btrfs: fix read corruption of compressed and shared
extents").

Fix this by forcing a bio submit, during readpages(), whenever we find a
compressed extent map for a page that is different from the extent map
for the previous page or has a different starting offset (in case it's
the same compressed extent), instead of the extent map's original start
offset.

A test case for fstests follows soon.

Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Fixes: 808f80b467 ("Btrfs: update fix for read corruption of compressed and shared extents")
Fixes: 005efedf2c ("Btrfs: fix read corruption of compressed and shared extents")
Cc: stable@vger.kernel.org # 4.3+
Tested-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-02-27 12:24:07 +01:00
Jordan Niethe
7b62f9bd22 powerpc/powernv: Make opal log only readable by root
Currently the opal log is globally readable. It is kernel policy to
limit the visibility of physical addresses / kernel pointers to root.
Given this and the fact the opal log may contain this information it
would be better to limit the readability to root.

Fixes: bfc36894a4 ("powerpc/powernv: Add OPAL message log interface")
Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Reviewed-by: Stewart Smith <stewart@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-27 22:11:31 +11:00
Pablo Neira Ayuso
123f89c8aa netfilter: nft_set_hash: remove nft_hash_key()
hashtable is never used for 2-byte keys, remove nft_hash_key().

Fixes: e240cd0df4 ("netfilter: nf_tables: place all set backends in one single module")
Reported-by: Florian Westphal <fw@strlen.de>
Tested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-27 11:08:32 +01:00
Pablo Neira Ayuso
a01cbae57e netfilter: nft_set_hash: bogus element self comparison from deactivation path
Use the element from the loop iteration, not the same element we want to
deactivate otherwise this branch always evaluates true.

Fixes: 6c03ae210c ("netfilter: nft_set_hash: add non-resizable hashtable implementation")
Reported-by: Florian Westphal <fw@strlen.de>
Tested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-27 11:08:31 +01:00
Pablo Neira Ayuso
3b02b0adc2 netfilter: nft_set_hash: fix lookups with fixed size hash on big endian
Call jhash_1word() for the 4-bytes key case from the insertion and
deactivation path, otherwise big endian arch set lookups fail.

Fixes: 446a8268b7 ("netfilter: nft_set_hash: add lookup variant for fixed size hashtable")
Reported-by: Florian Westphal <fw@strlen.de>
Tested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-27 11:08:31 +01:00
Li RongQing
35acfbab6e netfilter: remove unneeded switch fall-through
Empty case is fine and does not switch fall-through

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-27 11:03:59 +01:00
Florian Westphal
cc16921351 netfilter: conntrack: avoid same-timeout update
No need to dirty a cache line if timeout is unchanged.
Also, WARN() is useless here: we crash on 'skb->len' access
if skb is NULL.

Last, ct->timeout is u32, not 'unsigned long' so adapt the
function prototype accordingly.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-27 10:58:21 +01:00
Florian Westphal
d2c5c103b1 netfilter: nat: remove nf_nat_l3proto.h and nf_nat_core.h
The l3proto name is gone, its header file is the last trace.
While at it, also remove nf_nat_core.h, its very small and all users
include nf_nat.h too.

before:
   text    data     bss     dec     hex filename
  22948    1612    4136   28696    7018 nf_nat.ko

after removal of l3proto register/unregister functions:
   text	   data	    bss	    dec	    hex	filename
  22196	   1516	   4136	  27848	   6cc8 nf_nat.ko

checkpatch complains about overly long lines, but line breaks
do not make things more readable and the line length gets smaller
here, not larger.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-27 10:54:08 +01:00
Florian Westphal
d6c4c8ffb5 netfilter: nat: remove l3proto struct
All l3proto function pointers have been removed.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-27 10:53:57 +01:00
Florian Westphal
dac3fe7259 netfilter: nat: remove csum_recalc hook
We can now use direct calls.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-27 10:53:47 +01:00
Florian Westphal
03fe5efc4c netfilter: nat: remove csum_update hook
We can now use direct calls.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-27 10:53:35 +01:00
Florian Westphal
2e666b229d netfilter: nat: remove l3 manip_pkt hook
We can now use direct calls.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-27 10:53:05 +01:00
Florian Westphal
14cb1a6e29 netfilter: nat: remove nf_nat_l4proto.h
after ipv4/6 nat tracker merge, there are no external callers, so
make last function static and remove the header.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-27 10:52:47 +01:00
Florian Westphal
3bf195ae60 netfilter: nat: merge nf_nat_ipv4,6 into nat core
before:
   text    data     bss     dec     hex filename
  16566    1576    4136   22278    5706 nf_nat.ko
   3598	    844	      0	   4442	   115a	nf_nat_ipv6.ko
   3187	    844	      0	   4031	    fbf	nf_nat_ipv4.ko

after:
   text    data     bss     dec     hex filename
  22948    1612    4136   28696    7018 nf_nat.ko

... with ipv4/v6 nat now provided directly via nf_nat.ko.

Also changes:
       ret = nf_nat_ipv4_fn(priv, skb, state);
       if (ret != NF_DROP && ret != NF_STOLEN &&
into
	if (ret != NF_ACCEPT)
		return ret;

everywhere.

The nat hooks never should return anything other than
ACCEPT or DROP (and the latter only in rare error cases).

The original code uses multi-line ANDing including assignment-in-if:
        if (ret != NF_DROP && ret != NF_STOLEN &&
           !(IPCB(skb)->flags & IPSKB_XFRM_TRANSFORMED) &&
            (ct = nf_ct_get(skb, &ctinfo)) != NULL) {

I removed this while moving, breaking those in separate conditionals
and moving the assignments into extra lines.

checkpatch still generates some warnings:
 1. Overly long lines (of moved code).
    Breaking them is even more ugly. so I kept this as-is.
 2. use of extern function declarations in a .c file.
    This is necessary evil, we must call
    nf_nat_l3proto_register() from the nat core now.
    All l3proto related functions are removed later in this series,
    those prototypes are then removed as well.

v2: keep empty nf_nat_ipv6_csum_update stub for CONFIG_IPV6=n case.
v3: remove IS_ENABLED(NF_NAT_IPV4/6) tests, NF_NAT_IPVx toggles
    are removed here.
v4: also get rid of the assignments in conditionals.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-27 10:49:55 +01:00
Florian Westphal
096d09067a netfilter: nat: move nlattr parse and xfrm session decode to core
None of these functions calls any external functions, moving them allows
to avoid both the indirection and a need to export these symbols.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-27 10:49:42 +01:00
Florian Westphal
d1aca8ab31 netfilter: nat: merge ipv4 and ipv6 masquerade functionality
Before:
   text	   data	    bss	    dec	    hex	filename
  13916	   1412	   4128	  19456	   4c00	nf_nat.ko
   4510	    968	      4	   5482	   156a	nf_nat_ipv4.ko
   5146	    944	      8	   6098	   17d2	nf_nat_ipv6.ko

After:
   text	   data	    bss	    dec	    hex	filename
  16566	   1576	   4136	  22278	   5706	nf_nat.ko
   3187	    844	      0	   4031	    fbf	nf_nat_ipv4.ko
   3598	    844	      0	   4442	   115a	nf_nat_ipv6.ko

... so no drastic changes in combined size.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-27 10:49:24 +01:00
Andy Shevchenko
886ca88be6 ACPI / bus: Respect PRP0001 when retrieving device match data
In the PRP0001 case, the compatible string may have additional data
affiliated with the device.  When we call device_get_match_data() on
such device, we will get nothing since currently
acpi_device_get_match_data() doesn't respect PRP0001.

To fix the above, try acpi_of_match_device() if there is no ACPI
table in the driver.

Anyway, note that the device is expected to get its own proper
ACPI ID.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-02-27 10:47:59 +01:00
Florian Westphal
d824548dae netfilter: ebtables: remove BUGPRINT messages
They are however frequently triggered by syzkaller, so remove them.

ebtables userspace should never trigger any of these, so there is little
value in making them pr_debug (or ratelimited).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-27 10:47:57 +01:00
Florian Tham
4283428e49 netfilter: nf_conntrack_amanda: add support for STATE streams
The Amanda CONNECT command has been updated to establish an optional
fourth connection [0]. Previously, a CONNECT command would look like:

    CONNECT DATA port0 MESG port1 INDEX port2

nf_conntrack_amanda analyses the CONNECT command string in order to
learn the port numbers of the related DATA, MESG and INDEX streams. As
of amanda v3.4, the CONNECT command can advertise an additional port:

    CONNECT DATA port0 MESG port1 INDEX port2 STATE port3

The new STATE stream is not handled, thus the connection on the STATE
port cannot be established.

The patch adds support for STATE streams to the amanda conntrack helper.

I tested with max_expected = 3, leaving the other patch hunks
unmodified. Amanda reports "connection refused" and aborts. After I set
max_expected to 4, the backup completes successfully.

[0] 3b8384fc9f (diff-711e502fc81a65182c0954765b42919eR456)

Signed-off-by: Florian Tham <tham@fidion.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-27 10:46:39 +01:00
Pablo Neira Ayuso
b8e2040063 netfilter: nft_compat: use .release_ops and remove list of extension
Add .release_ops, that is called in case of error at a later stage in
the expression initialization path, ie. .select_ops() has been already
set up operations and that needs to be undone. This allows us to unwind
.select_ops from the error path, ie. release the dynamic operations for
this extension.

Moreover, allocate one single operation instead of recycling them, this
comes at the cost of consuming a bit more memory per rule, but it
simplifies the infrastructure.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-27 10:41:24 +01:00
Ritesh Harjani
e5723f95d6 mmc: core: Fix NULL ptr crash from mmc_should_fail_request
In case of CQHCI, mrq->cmd may be NULL for data requests (non DCMD).
In such case mmc_should_fail_request is directly dereferencing
mrq->cmd while cmd is NULL.
Fix this by checking for mrq->cmd pointer.

Fixes: 72a5af554d ("mmc: core: Add support for handling CQE requests")
Signed-off-by: Ritesh Harjani <riteshh@codeaurora.org>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2019-02-27 10:00:17 +01:00
Brian Norris
5364a0b4f4 arm64: dts: rockchip: move QCA6174A wakeup pin into its USB node
Currently, we don't coordinate BT USB activity with our handling of the
BT out-of-band wake pin, and instead just use gpio-keys. That causes
problems because we have no way of distinguishing wake activity due to a
BT device (e.g., mouse) vs. the BT controller (e.g., re-configuring wake
mask before suspend). This can cause spurious wake events just because
we, for instance, try to reconfigure the host controller's event mask
before suspending.

We can avoid these synchronization problems by handling the BT wake pin
directly in the btusb driver -- for all activity up until BT controller
suspend(), we simply listen to normal USB activity (e.g., to know the
difference between device and host activity); once we're really ready to
suspend the host controller, there should be no more host activity, and
only *then* do we unmask the GPIO interrupt.

This is already supported by btusb; we just need to describe the wake
pin in the right node.

We list 2 compatible properties, since both PID/VID pairs show up on
Scarlet devices, and they're both essentially identical QCA6174A-based
modules.

Also note that the polarity was wrong before: Qualcomm implemented WAKE
as active high, not active low. We only got away with this because
gpio-keys always reconfigured us as bi-directional edge-triggered.

Finally, we have an external pull-up and a level-shifter on this line
(we didn't notice Qualcomm's polarity in the initial design), so we
can't do pull-down. Switch to pull-none.

Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2019-02-27 08:50:15 +01:00
Brian Norris
7d19261bc0 dt-bindings: net: btusb: add QCA6174A IDs
There are two USB PID/VID variations I've seen for this chip, and I want
to utilize the 'interrupts' property defined here already.

Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2019-02-27 08:50:15 +01:00
Brian Norris
4c409af04d Bluetooth: btusb: add QCA6174A compatible properties
We may need to specify a GPIO wake pin for this device, so add a
compatible property for it.

There are at least to USB PID/VID variations of this chip: one with a
Lite-On ID and one with an Atheros ID.

Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2019-02-27 08:50:14 +01:00
Matthias Kaehlcke
6d10cd5cbd Bluetooth: hci_qca: Use msleep() instead of open coding it
Call msleep() in qca_set_baudrate() instead of reimplementing it.

Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2019-02-27 08:47:39 +01:00
Matthias Kaehlcke
0ebcddd8e0 Bluetooth: hci_qca: Add delay after power-off pulse
During initialization the power-on pulse is currently sent inmediately
after the prior power-off pulse. With this initialization often fails
at boot time:

[   15.205224] Bluetooth: hci0: setting up wcn3990
[   17.341062] Bluetooth: hci0: command 0xfc00 tx timeout
[   22.101453] ERROR: Bluetooth initialization failed
[   25.337740] Bluetooth: hci0: Reading QCA version information failed (-110)

After a power-off pulse wait 10ms to give the controller time to power
off. Remove the previous short settling delay, it isn't needed anymore.

Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Balakrishna Godavarthi <bgodavar@codeaurora.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2019-02-27 08:44:33 +01:00
Matthias Kaehlcke
ad571d725c Bluetooth: hci_qca: Move boot delay to qca_send_power_pulse()
After sending a power on pulse the driver has a delay of 100ms
to allow the host controller to boot. Move the delay into
qca_send_power_pulse(), since it is directly related with the
power-on pulse.

Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Balakrishna Godavarthi <bgodavar@codeaurora.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2019-02-27 08:44:32 +01:00
Matthias Kaehlcke
9836b80208 Bluetooth: hci_qca: Pass boolean 'on/off' to qca_send_power_pulse()
There are only two types of power pulses 'on' or 'off', pass a boolean
instead of the power pulse 'command'.

Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Balakrishna Godavarthi <bgodavar@codeaurora.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2019-02-27 08:44:32 +01:00