Commit Graph

724454 Commits

Author SHA1 Message Date
Vasily Averin
3a2b19d1ee lockd: lost rollback of set_grace_period() in lockd_down_net()
Commit efda760fe9 ("lockd: fix lockd shutdown race") is incorrect,
it removes lockd_manager and disarm grace_period_end for init_net only.

If nfsd was started from another net namespace lockd_up_net() calls
set_grace_period() that adds lockd_manager into per-netns list
and queues grace_period_end delayed work.

These action should be reverted in lockd_down_net().
Otherwise it can lead to double list_add on after restart nfsd in netns,
and to use-after-free if non-disarmed delayed work will be executed after netns destroy.

Fixes: efda760fe9 ("lockd: fix lockd shutdown race")
Cc: stable@vger.kernel.org
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:11 -05:00
Vasily Averin
a3152f1440 lockd: added cleanup checks in exit_net hook
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:10 -05:00
Vasily Averin
b872285751 grace: replace BUG_ON by WARN_ONCE in exit_net hook
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:10 -05:00
Andrew Elble
4f34bd0540 nfsd: fix locking validator warning on nfs4_ol_stateid->st_mutex class
The use of the st_mutex has been confusing the validator. Use the
proper nested notation so as to not produce warnings.

Signed-off-by: Andrew Elble <aweits@rit.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:10 -05:00
Vasily Averin
e919b07652 lockd: remove net pointer from messages
Publishing of net pointer is not safe,
use net->ns.inum as net ID in debug messages

[  171.757678] lockd_up_net: per-net data created; net=f00001e7
[  171.767188] NFSD: starting 90-second grace period (net f00001e7)
[  300.653313] lockd: nuking all hosts in net f00001e7...
[  300.653641] lockd: host garbage collection for net f00001e7
[  300.653968] lockd: nlmsvc_mark_resources for net f00001e7
[  300.711483] lockd_down_net: per-net data destroyed; net=f00001e7
[  300.711847] lockd: nuking all hosts in net 0...
[  300.711847] lockd: host garbage collection for net 0
[  300.711848] lockd: nlmsvc_mark_resources for net 0

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:10 -05:00
Vasily Averin
ba589528d6 nfsd: remove net pointer from debug messages
Publishing of net pointer is not safe,
replace it in debug meesages by net->ns.inum

[  119.989161] nfsd: initializing export module (net: f00001e7).
[  171.767188] NFSD: starting 90-second grace period (net f00001e7)
[  322.185240] nfsd: shutting down export module (net: f00001e7).
[  322.186062] nfsd: export shutdown complete (net: f00001e7).

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:10 -05:00
Trond Myklebust
03da3169c6 nfsd: Fix races with check_stateid_generation()
The various functions that call check_stateid_generation() in order
to compare a client-supplied stateid with the nfs4_stid state, usually
need to atomically check for closed state. Those that perform the
check after locking the st_mutex using nfsd4_lock_ol_stateid()
should now be OK, but we do want to fix up the others.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:10 -05:00
Trond Myklebust
9271d7e509 nfsd: Ensure we check stateid validity in the seqid operation checks
After taking the stateid st_mutex, we want to know that the stateid
still represents valid state before performing any non-idempotent
actions.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:10 -05:00
Trond Myklebust
beeca19cf1 nfsd: Fix race in lock stateid creation
If we're looking up a new lock state, and the creation fails, then
we want to unhash it, just like we do for OPEN. However in order
to do so, we need to that no other LOCK requests can grab the
mutex until we have unhashed it (and marked it as closed).

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:10 -05:00
Trond Myklebust
fd1fd685b3 nfsd4: move find_lock_stateid
Trivial cleanup to simplify following patch.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:10 -05:00
Trond Myklebust
659aefb68e nfsd: Ensure we don't recognise lock stateids after freeing them
In order to deal with lookup races, nfsd4_free_lock_stateid() needs
to be able to signal to other stateful functions that the lock stateid
is no longer valid. Right now, nfsd_lock() will check whether or not an
existing stateid is still hashed, but only in the "new lock" path.

To ensure the stateid invalidation is also recognised by the "existing lock"
path, and also by a second call to nfsd4_free_lock_stateid() itself, we can
change the type to NFS4_CLOSED_STID under the stp->st_mutex.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:10 -05:00
Trond Myklebust
fb500a7cfe nfsd: CLOSE SHOULD return the invalid special stateid for NFSv4.x (x>0)
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:10 -05:00
Trond Myklebust
d8a1a00055 nfsd: Fix another OPEN stateid race
If nfsd4_process_open2() is initialising a new stateid, and yet the
call to nfs4_get_vfs_file() fails for some reason, then we must
declare the stateid closed, and unhash it before dropping the mutex.

Right now, we unhash the stateid after dropping the mutex, and without
changing the stateid type, meaning that another OPEN could theoretically
look it up and attempt to use it.

Reported-by: Andrew W Elble <aweits@rit.edu>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:10 -05:00
Trond Myklebust
15ca08d329 nfsd: Fix stateid races between OPEN and CLOSE
Open file stateids can linger on the nfs4_file list of stateids even
after they have been closed. In order to avoid reusing such a
stateid, and confusing the client, we need to recheck the
nfs4_stid's type after taking the mutex.
Otherwise, we risk reusing an old stateid that was already closed,
which will confuse clients that expect new stateids to conform to
RFC7530 Sections 9.1.4.2 and 16.2.5 or RFC5661 Sections 8.2.2 and 18.2.4.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-11-27 16:45:10 -05:00
Jakub Kicinski
a39e17b2d8 bpf: offload: add a license header
I forgot to add a license on kernel/bpf/offload.c.  Luckily I'm
still the only author so make it explicitly GPLv2.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-27 22:24:51 +01:00
Linus Torvalds
1751e8a6cb Rename superblock flags (MS_xyz -> SB_xyz)
This is a pure automated search-and-replace of the internal kernel
superblock flags.

The s_flags are now called SB_*, with the names and the values for the
moment mirroring the MS_* flags that they're equivalent to.

Note how the MS_xyz flags are the ones passed to the mount system call,
while the SB_xyz flags are what we then use in sb->s_flags.

The script to do this was:

    # places to look in; re security/*: it generally should *not* be
    # touched (that stuff parses mount(2) arguments directly), but
    # there are two places where we really deal with superblock flags.
    FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
            include/linux/fs.h include/uapi/linux/bfs_fs.h \
            security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
    # the list of MS_... constants
    SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
          DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
          POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
          I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
          ACTIVE NOUSER"

    SED_PROG=
    for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done

    # we want files that contain at least one of MS_...,
    # with fs/namespace.c and fs/pnode.c excluded.
    L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')

    for f in $L; do sed -i $f $SED_PROG; done

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-27 13:05:09 -08:00
Nicolas Pitre
abee210500 percpu: hack to let the CRIS architecture to boot until they clean up
Commit 438a506180 ("percpu: don't forget to free the temporary struct
pcpu_alloc_info") uncovered a problem on the CRIS architecture where
the bootmem allocator is initialized with virtual addresses. Given it
has:

    #define __va(x) ((void *)((unsigned long)(x) | 0x80000000))

then things just work out because the end result is the same whether you
give this a physical or a virtual address.

Untill you call memblock_free_early(__pa(address)) that is, because
values from __pa() don't match with the virtual addresses stuffed in the
bootmem allocator anymore.

Avoid freeing the temporary pcpu_alloc_info memory on that architecture
until they fix things up to let the kernel boot like it did before.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 438a506180 ("percpu: don't forget to free the temporary struct pcpu_alloc_info")
2017-11-27 12:53:12 -08:00
Thomas Meyer
141cbfba1d auxdisplay: img-ascii-lcd: Only build on archs that have IOMEM
This avoids the MODPOST error:

  ERROR: "devm_ioremap_resource" [drivers/auxdisplay/img-ascii-lcd.ko] undefined!

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-27 12:36:45 -08:00
Kirill A. Shutemov
152e93af3c mm, thp: Do not make pmd/pud dirty without a reason
Currently we make page table entries dirty all the time regardless of
access type and don't even consider if the mapping is write-protected.
The reasoning is that we don't really need dirty tracking on THP and
making the entry dirty upfront may save some time on first write to the
page.

Unfortunately, such approach may result in false-positive
can_follow_write_pmd() for huge zero page or read-only shmem file.

Let's only make page dirty only if we about to write to the page anyway
(as we do for small pages).

I've restructured the code to make entry dirty inside
maybe_p[mu]d_mkwrite(). It also takes into account if the vma is
write-protected.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-27 12:26:29 -08:00
Kirill A. Shutemov
a8f9736645 mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d()
Currently, we unconditionally make page table dirty in touch_pmd().
It may result in false-positive can_follow_write_pmd().

We may avoid the situation, if we would only make the page table entry
dirty if caller asks for write access -- FOLL_WRITE.

The patch also changes touch_pud() in the same way.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-27 12:26:29 -08:00
Icenowy Zheng
04226916d2 media: usbtv: add a new usbid
A new usbid of UTV007 is found in a newly bought device.

The usbid is 1f71:3301.

The ID on the chip is:
UTV007
A89029.1
1520L18K1

Both video and audio is tested with the modified usbtv driver.

Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Acked-by: Lubomir Rintel <lkundrak@v3.sk>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2017-11-27 14:49:18 -05:00
Arvind Yadav
20f9ceed72 pata_pdc2027x : make pdc2027x_*_timing structures const
Make these pdc2027x_*_timing structures const as it is never modified.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-11-27 11:46:26 -08:00
Arvind Yadav
c1da86c19a pata_pdc2027x: Remove unnecessary error check
Here, The function pdc_hardware_init always return zero. So it is not
necessary to check its return value.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-11-27 11:46:26 -08:00
Jon Maloy
2e724dca77 tipc: eliminate access after delete in group_filter_msg()
KASAN revealed another access after delete in group.c. This time
it found that we read the header of a received message after the
buffer has been released.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-27 14:44:45 -05:00
Wang Long
ddf7005f32 debug cgroup: use task_css_set instead of rcu_dereference
This macro `task_css_set` verifies that the caller is
inside proper critical section if the kernel set CONFIG_PROVE_RCU=y.

Signed-off-by: Wang Long <wanglong19@meituan.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-11-27 11:37:33 -08:00
Florian Fainelli
babd8a3e31 This pull request brings in a fix for a warning that started occuring
when dtc from -next got merged.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE/JuuFDWp9/ZkuCBXtdYpNtH8nugFAloMj8kACgkQtdYpNtH8
 nuiDkA/9GwCXgWzBPamys+OjBWCOmiU09XTAq5Z3e1fGB0EpuDihVOTGX3F/gYZU
 PHvp5QwDCErF9t0gFpBBaZeAcXGIGcKTilB9/ysO+gke4AdI+GRB/PRKu/NGjn0R
 ApLwg9CXlW7LFT/pWyp/+LWRRxThjsEE9qUfLB9+YXFBbFlAEW4MRdizweio9GDi
 EuNDbVn+D/1M8hzgf9sCJM85ZZ9+p6NKwjbvBRNcu7IV5EzgznYj2mGjTwPikVF5
 z5FLaou1VbQ2gmvHMxiC7DkXClLINwf8xAtOUJt+hiZVeKjvCQ6g17zgtiS4tkU8
 t8iJVnwBFk7Oq0llw0fqwHzmeQ71zzW8UYrIbfzFN1cGEajlzhgIEMzMNtPBJx3O
 2XnkN0DOcbB5wsX5gmABCbLXA287m/2K/k4yRJRf76xg+APCcwgh3JyTvbUJy8Ul
 tYitPGOGUryy9C+ZWXEJdIhUQ/zApCKms5kxs6DnxRFA/QdJ/MJ1auNeFsAp3Ddr
 49x7VYzIGuIyKsIG86y5P6wwIRr75cBO5qEKbh1g34NTaBLz8pIXLXq4P0ZWiGwJ
 I/yc2gTTGadyCS8XR2ZV8XMDZORe+O/qLQ4IFIAot9iybRw0RXySq0YVzAs93oWD
 KMbvkGoUALSEwjRbvhPaSCDSz434AwDOx9v+QJtjQySj1QlO/i4=
 =nA2b
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJaHGdhAAoJEIfQlpxEBwcEb4gP/jLM1pPlFvy/MeuoAO79jwKk
 Y0ew8fdg9OQAqulhqauh02R7QejGFChwdYajYNegRG0DBHGFnSepUmAbkrK9FQbr
 Zezago3/KI8hTLntGSzHnFsaEcNR7p+cgZKvT2AOTw3obV0a4fEeozETRwiliRrk
 bOvc/ETZNNUvqnYSPTrlFSemZ3sbF1nWzlziUrBYAQFErH+2MwpMHAQ8hZ70qUEe
 HA3vp9CCtY5COqPCpkGFvgoqN3lvhYdUpDVj3nyjk8tYajrwdM8I1AnqUzTXA83q
 8JEUmL28A9+aMzFHKqLLcud3w08jdgdKJAAMfEVSyqwthFpOSaC/TdMfMqqh4H3u
 2sea6tvGxpCxCiNKP0cN53gWJd5tPLhSVqYaFSc41OvvSZbc1YTb9joAbfwlVS9V
 /m2RE3U8jh9SGicbybLyOgu9WbuUM4qrVeEXAzJJhQ4rgAN8SOqpye12S8dSzg45
 QsVlzdYwKkaAr3qJJklPXeFcADc6yTxUgwzEQlOHj46ij7+pLtTBm5dGZ0MlMEQm
 R9u+JFWnTjPv+IJ2HeEG+ZuOWnv3SZ4rxY5c4JqnQLNhZxHzaf93ubKV0W8L34kM
 /Magq4hiAeFq51knIELVZLyfrt/e4cZOF7T8tEWY3hFtSlvbuyBvXaOqng27BhvK
 icKoEdbViUWbrh1p2nbs
 =JQj1
 -----END PGP SIGNATURE-----

Merge tag 'bcm2835-dt-next-fixes-2017-11-15' into devicetree/fixes

This pull request brings in a fix for a warning that started occuring
when dtc from -next got merged.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
2017-11-27 11:28:23 -08:00
Albert Pool
16a27dfd21 ata: mediatek: Fix typo in module description
Signed-off-by: Albert Pool <albertpool@solcon.nl>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-11-27 11:26:59 -08:00
Florian Fainelli
5f1aa51c7a ARM: dts: NSP: Fix PPI interrupt types
Booting a kernel results in the kernel warning us about the following
PPI interrupts configuration:
[    0.105127] smp: Bringing up secondary CPUs ...
[    0.110545] GIC: PPI11 is secure or misconfigured
[    0.110551] GIC: PPI13 is secure or misconfigured

Fix this by using the appropriate edge configuration for PPI11 and
PPI13, this is similar to what was fixed for Northstar (BCM5301X) in
commit 0e34079cd1 ("ARM: dts: BCM5301X: Correct GIC_PPI interrupt
flags").

Fixes: 7b2e987de2 ("ARM: NSP: add minimal Northstar Plus device tree")
Fixes: 1a9d53caba ("ARM: dts: NSP: Add TWD Support to DT")
Acked-by: Jon Mason <jon.mason@broadcom.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
2017-11-27 11:22:29 -08:00
Florian Fainelli
77416ab35f ARM: dts: NSP: Disable AHCI controller for HR NSP boards
The AHCI controller is currently enabled for all of these boards:
bcm958623hr and bcm958625hr would result in a hard hang on boot that we
cannot get rid of. Since this does not appear to have an easy and simple
fix, just disable the AHCI controller for now until this gets resolved.

Fixes: 70725d6e97 ("ARM: dts: NSP: Enable SATA on bcm958625hr")
Fixes: d454c37624 ("ARM: dts: NSP: Add new DT file for bcm958623hr")
Acked-by: Jon Mason <jon.mason@broadcom.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
2017-11-27 11:22:29 -08:00
Eduardo Otubo
5b5971df3b xen-netfront: remove warning when unloading module
v2:
 * Replace busy wait with wait_event()/wake_up_all()
 * Cannot garantee that at the time xennet_remove is called, the
   xen_netback state will not be XenbusStateClosed, so added a
   condition for that
 * There's a small chance for the xen_netback state is
   XenbusStateUnknown by the time the xen_netfront switches to Closed,
   so added a condition for that.

When unloading module xen_netfront from guest, dmesg would output
warning messages like below:

  [  105.236836] xen:grant_table: WARNING: g.e. 0x903 still in use!
  [  105.236839] deferring g.e. 0x903 (pfn 0x35805)

This problem relies on netfront and netback being out of sync. By the time
netfront revokes the g.e.'s netback didn't have enough time to free all of
them, hence displaying the warnings on dmesg.

The trick here is to make netfront to wait until netback frees all the g.e.'s
and only then continue to cleanup for the module removal, and this is done by
manipulating both device states.

Signed-off-by: Eduardo Otubo <otubo@redhat.com>
Acked-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-27 14:21:58 -05:00
Jens Axboe
2967acbb25 blktrace: fix trace mutex deadlock
A previous commit changed the locking around registration/cleanup,
but direct callers of blk_trace_remove() were missed. This means
that if we hit the error path in setup, we will deadlock on
attempting to re-acquire the queue trace mutex.

Fixes: 1f2cac107c ("blktrace: fix unlocked access to init/start-stop/teardown")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-11-27 12:03:58 -07:00
Colin Ian King
66a7c84d67 i2c: i2c-boardinfo: fix memory leaks on devinfo
Currently when an error occurs devinfo is still allocated but is
unused when the error exit paths break out of the for-loop. Fix
this by kfree'ing devinfo to avoid the leak.

Detected by CoverityScan, CID#1416590 ("Resource Leak")

Fixes: 4124c4eba4 ("i2c: allow attaching IRQ resources to i2c_board_info")
Fixes: 0daaf99d84 ("i2c: copy device properties when using i2c_register_board_info()")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2017-11-27 19:14:29 +01:00
Hans de Goede
6e0c9507bf i2c: i801: Fix Failed to allocate irq -2147483648 error
On Apollo Lake devices the BIOS does not set up IRQ routing for the i801
SMBUS controller IRQ, so we end up with dev->irq set to IRQ_NOTCONNECTED.

Detect this and do not try to use the irq in this case silencing:
i801_smbus 0000:00:1f.1: Failed to allocate irq -2147483648: -107

Cc: stable@vger.kernel.org
BugLink: https://communities.intel.com/thread/114759
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2017-11-27 19:11:27 +01:00
Darrick J. Wong
509955823c xfs: log recovery should replay deferred ops in order
As part of testing log recovery with dm_log_writes, Amir Goldstein
discovered an error in the deferred ops recovery that lead to corruption
of the filesystem metadata if a reflink+rmap filesystem happened to shut
down midway through a CoW remap:

"This is what happens [after failed log recovery]:

"Phase 1 - find and verify superblock...
"Phase 2 - using internal log
"        - zero log...
"        - scan filesystem freespace and inode maps...
"        - found root inode chunk
"Phase 3 - for each AG...
"        - scan (but don't clear) agi unlinked lists...
"        - process known inodes and perform inode discovery...
"        - agno = 0
"data fork in regular inode 134 claims CoW block 376
"correcting nextents for inode 134
"bad data fork in inode 134
"would have cleared inode 134"

Hou Tao dissected the log contents of exactly such a crash:

"According to the implementation of xfs_defer_finish(), these ops should
be completed in the following sequence:

"Have been done:
"(1) CUI: Oper (160)
"(2) BUI: Oper (161)
"(3) CUD: Oper (194), for CUI Oper (160)
"(4) RUI A: Oper (197), free rmap [0x155, 2, -9]

"Should be done:
"(5) BUD: for BUI Oper (161)
"(6) RUI B: add rmap [0x155, 2, 137]
"(7) RUD: for RUI A
"(8) RUD: for RUI B

"Actually be done by xlog_recover_process_intents()
"(5) BUD: for BUI Oper (161)
"(6) RUI B: add rmap [0x155, 2, 137]
"(7) RUD: for RUI B
"(8) RUD: for RUI A

"So the rmap entry [0x155, 2, -9] for COW should be freed firstly,
then a new rmap entry [0x155, 2, 137] will be added. However, as we can see
from the log record in post_mount.log (generated after umount) and the trace
print, the new rmap entry [0x155, 2, 137] are added firstly, then the rmap
entry [0x155, 2, -9] are freed."

When reconstructing the internal log state from the log items found on
disk, it's required that deferred ops replay in exactly the same order
that they would have had the filesystem not gone down.  However,
replaying unfinished deferred ops can create /more/ deferred ops.  These
new deferred ops are finished in the wrong order.  This causes fs
corruption and replay crashes, so let's create a single defer_ops to
handle the subsequent ops created during replay, then use one single
transaction at the end of log recovery to ensure that everything is
replayed in the same order as they're supposed to be.

Reported-by: Amir Goldstein <amir73il@gmail.com>
Analyzed-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-11-27 09:34:08 -08:00
Darrick J. Wong
98c4f78dcd xfs: always free inline data before resetting inode fork during ifree
In xfs_ifree, we reset the data/attr forks to extents format without
bothering to free any inline data buffer that might still be around
after all the blocks have been truncated off the file.  Prior to commit
43518812d2 ("xfs: remove support for inlining data/extents into the
inode fork") nobody noticed because the leftover inline data after
truncation was small enough to fit inside the inline buffer inside the
fork itself.

However, now that we've removed the inline buffer, we /always/ have to
free the inline data buffer or else we leak them like crazy.  This test
was found by turning on kmemleak for generic/001 or generic/388.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-11-27 09:33:25 -08:00
Adam Thomson
b7926c464d
ASoC: da7218: Correct IRQ level in DT binding example
Current DT binding documentation shows an example where the IRQ
for the device is chosen to be ACTIVE_HIGH. This is incorrect as
the device only supports ACTIVE_LOW, so this commit fixes that
discrepancy.

Signed-off-by: Adam Thomson <Adam.Thomson.Opensource@diasemi.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
2017-11-27 17:11:10 +00:00
Adam Thomson
d3b0535216
ASoC: da7219: Correct IRQ level in DT binding example
Current DT binding documentation shows an example where the IRQ
for the device is chosen to be ACTIVE_HIGH. This is incorrect as
the device only supports ACTIVE_LOW, so this commit fixes that
discrepancy.

Signed-off-by: Adam Thomson <Adam.Thomson.Opensource@diasemi.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
2017-11-27 17:10:48 +00:00
Tal Shorer
c98a980509 workqueue: respect isolated cpus when queueing an unbound work
Initialize wq_unbound_cpumask to exclude cpus that were isolated by
the cmdline's isolcpus parameter.

Signed-off-by: Tal Shorer <tal.shorer@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-11-27 08:57:00 -08:00
Tal Shorer
7d229c668a main: kernel_start: move housekeeping_init() before workqueue_init_early()
This is needed in order to allow the unbound workqueue to take
housekeeping cpus into accounty

Signed-off-by: Tal Shorer <tal.shorer@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-11-27 08:56:57 -08:00
Paolo Bonzini
a63dd7480d PPC KVM fixes for 4.15
One commit here, that fixes a couple of bugs relating to the patch
 series that enables HPT guests to run on a radix host on POWER9
 systems.  This patch series went upstream in the 4.15 merge window,
 so no stable backport is required.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJaGKSnAAoJEJ2a6ncsY3GfF7MIANLLhznEMrWq8jw4g95WsJU1
 MkDGwp8kIdhIOM9HD6JRskoJZB5Mws2BWlQ5PSaVFxO6v6eUgNLaRb/UBxC1r7gU
 1f9/8corY4BNkezSdJqTL7Xgp13KjTU726OwYAqCPEyCSPEc9ciMyeIgyZuv2dPa
 Pju+u4tnA+9JJyskgNL+/ybOOZwVat91VmNUVRq29zP6+zo1tmIDxrQchy6Bqui/
 7Wg298G+yjAkJ8ktQu69ACk+0oEBGUOcLUlraqGSr9auR+b0nJ1PAGCDRaONdwgE
 +X+OE+t+UC6rU+coUXMwO+Id0X7HMdsLQd3066ODEtD55g8MIVZ126Wt8xDmj5o=
 =GSTh
 -----END PGP SIGNATURE-----

Merge tag 'kvm-ppc-fixes-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into kvm-master

PPC KVM fixes for 4.15

One commit here, that fixes a couple of bugs relating to the patch
series that enables HPT guests to run on a radix host on POWER9
systems.  This patch series went upstream in the 4.15 merge window,
so no stable backport is required.
2017-11-27 17:54:13 +01:00
Jan H. Schönherr
20b7035c66 KVM: Let KVM_SET_SIGNAL_MASK work as advertised
KVM API says for the signal mask you set via KVM_SET_SIGNAL_MASK, that
"any unblocked signal received [...] will cause KVM_RUN to return with
-EINTR" and that "the signal will only be delivered if not blocked by
the original signal mask".

This, however, is only true, when the calling task has a signal handler
registered for a signal. If not, signal evaluation is short-circuited for
SIG_IGN and SIG_DFL, and the signal is either ignored without KVM_RUN
returning or the whole process is terminated.

Make KVM_SET_SIGNAL_MASK behave as advertised by utilizing logic similar
to that in do_sigtimedwait() to avoid short-circuiting of signals.

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-11-27 17:53:47 +01:00
Prateek Sood
1599a185f0 cpuset: Make cpuset hotplug synchronous
Convert cpuset_hotplug_workfn() into synchronous call for cpu hotplug
path. For memory hotplug path it still gets queued as a work item.

Since cpuset_hotplug_workfn() can be made synchronous for cpu hotplug
path, it is not required to wait for cpuset hotplug while thawing
processes.

Signed-off-by: Prateek Sood <prsood@codeaurora.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-11-27 08:48:10 -08:00
Prateek Sood
aa24163b2e cgroup/cpuset: remove circular dependency deadlock
Remove circular dependency deadlock in a scenario where hotplug of CPU is
being done while there is updation in cgroup and cpuset triggered from
userspace.

Process A => kthreadd => Process B => Process C => Process A

Process A
cpu_subsys_offline();
  cpu_down();
    _cpu_down();
      percpu_down_write(&cpu_hotplug_lock); //held
      cpuhp_invoke_callback();
	     workqueue_offline_cpu();
            queue_work_on(); // unbind_work on system_highpri_wq
               __queue_work();
                 insert_work();
                    wake_up_worker();
            flush_work();
               wait_for_completion();

worker_thread();
   manage_workers();
      create_worker();
	     kthread_create_on_node();
		    wake_up_process(kthreadd_task);

kthreadd
kthreadd();
  kernel_thread();
    do_fork();
      copy_process();
        percpu_down_read(&cgroup_threadgroup_rwsem);
          __rwsem_down_read_failed_common(); //waiting

Process B
kernfs_fop_write();
  cgroup_file_write();
    cgroup_procs_write();
      percpu_down_write(&cgroup_threadgroup_rwsem); //held
      cgroup_attach_task();
        cgroup_migrate();
          cgroup_migrate_execute();
            cpuset_can_attach();
              mutex_lock(&cpuset_mutex); //waiting

Process C
kernfs_fop_write();
  cgroup_file_write();
    cpuset_write_resmask();
      mutex_lock(&cpuset_mutex); //held
      update_cpumask();
        update_cpumasks_hier();
          rebuild_sched_domains_locked();
            get_online_cpus();
              percpu_down_read(&cpu_hotplug_lock); //waiting

Eliminating deadlock by reversing the locking order for cpuset_mutex and
cpu_hotplug_lock.

Signed-off-by: Prateek Sood <prsood@codeaurora.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-11-27 08:48:10 -08:00
oder_chiou@realtek.com
346cccf883
ASoC: rt5514: Add the sanity check for the driver_data in the resume function
If the rt5514 spi driver is loaded, but the snd_soc_platform_driver is not
loaded by the correct DAI settings, the NULL pointer will be gotten by
snd_soc_platform_get_drvdata in the resume function.

Signed-off-by: Oder Chiou <oder_chiou@realtek.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
2017-11-27 16:44:57 +00:00
Maciej S. Szmigiero
b880b8056b
ASoC: fsl_ssi: serialize AC'97 register access operations
AC'97 register access operations (both read and write) on SSI use a one,
shared set of SSI registers for AC'97 register address and data.
This means that only one such access is possible at a time and so all these
operations need to be serialized.

Since an AC'97 register access operation in this driver takes 100us+ let's
use a mutex for this.

Use this opportunity to also change a default value returned from AC'97
register read function from -1 to 0, since that's what AC'97 specs require
to be returned when unknown / undefined registers are read.

Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Signed-off-by: Mark Brown <broonie@kernel.org>
2017-11-27 16:43:43 +00:00
Maciej S. Szmigiero
695b78b548
ASoC: fsl_ssi: AC'97 ops need regmap, clock and cleaning up on failure
AC'97 ops (register read / write) need SSI regmap and clock, so they have
to be set after them.

We also need to set these ops back to NULL if we fail the probe.

Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Acked-by: Nicolin Chen <nicoleotsuka@gmail.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
2017-11-27 16:41:55 +00:00
Liu Bo
ebb70442cd Btrfs: fix list_add corruption and soft lockups in fsync
Xfstests btrfs/146 revealed this corruption,

[   58.138831] Buffer I/O error on dev dm-0, logical block 2621424, async page read
[   58.151233] BTRFS error (device sdf): bdev /dev/mapper/error-test errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
[   58.152403] list_add corruption. prev->next should be next (ffff88005e6775d8), but was ffffc9000189be88. (prev=ffffc9000189be88).
[   58.153518] ------------[ cut here ]------------
[   58.153892] WARNING: CPU: 1 PID: 1287 at lib/list_debug.c:31 __list_add_valid+0x169/0x1f0
...
[   58.157379] RIP: 0010:__list_add_valid+0x169/0x1f0
...
[   58.161956] Call Trace:
[   58.162264]  btrfs_log_inode_parent+0x5bd/0xfb0 [btrfs]
[   58.163583]  btrfs_log_dentry_safe+0x60/0x80 [btrfs]
[   58.164003]  btrfs_sync_file+0x4c2/0x6f0 [btrfs]
[   58.164393]  vfs_fsync_range+0x5f/0xd0
[   58.164898]  do_fsync+0x5a/0x90
[   58.165170]  SyS_fsync+0x10/0x20
[   58.165395]  entry_SYSCALL_64_fastpath+0x1f/0xbe
...

It turns out that we could record btrfs_log_ctx:io_err in
log_one_extents when IO fails, but make log_one_extents() return '0'
instead of -EIO, so the IO error is not acknowledged by the callers,
i.e.  btrfs_log_inode_parent(), which would remove btrfs_log_ctx:list
from list head 'root->log_ctxs'.  Since btrfs_log_ctx is allocated
from stack memory, it'd get freed with a object alive on the
list. then a future list_add will throw the above warning.

This returns the correct error in the above case.

Jeff also reported this while testing against his fsync error
patch set[1].

[1]: https://www.spinics.net/lists/linux-btrfs/msg65308.html
"btrfs list corruption and soft lockups while testing writeback error handling"

Fixes: 8407f55326 ("Btrfs: fix data corruption after fast fsync and writeback error")
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-11-27 17:41:19 +01:00
Wanpeng Li
b74558259c KVM: VMX: Fix vmx->nested freeing when no SMI handler
Reported by syzkaller:

   ------------[ cut here ]------------
   WARNING: CPU: 5 PID: 2939 at arch/x86/kvm/vmx.c:3844 free_loaded_vmcs+0x77/0x80 [kvm_intel]
   CPU: 5 PID: 2939 Comm: repro Not tainted 4.14.0+ #26
   RIP: 0010:free_loaded_vmcs+0x77/0x80 [kvm_intel]
   Call Trace:
    vmx_free_vcpu+0xda/0x130 [kvm_intel]
    kvm_arch_destroy_vm+0x192/0x290 [kvm]
    kvm_put_kvm+0x262/0x560 [kvm]
    kvm_vm_release+0x2c/0x30 [kvm]
    __fput+0x190/0x370
    task_work_run+0xa1/0xd0
    do_exit+0x4d2/0x13e0
    do_group_exit+0x89/0x140
    get_signal+0x318/0xb80
    do_signal+0x8c/0xb40
    exit_to_usermode_loop+0xe4/0x140
    syscall_return_slowpath+0x206/0x230
    entry_SYSCALL_64_fastpath+0x98/0x9a

The syzkaller testcase will execute VMXON/VMLAUCH instructions, so the
vmx->nested stuff is populated, it will also issue KVM_SMI ioctl. However,
the testcase is just a simple c program and not be lauched by something
like seabios which implements smi_handler. Commit 05cade71cf (KVM: nSVM:
fix SMI injection in guest mode) gets out of guest mode and set nested.vmxon
to false for the duration of SMM according to SDM 34.14.1 "leave VMX
operation" upon entering SMM. We can't alloc/free the vmx->nested stuff
each time when entering/exiting SMM since it will induce more overhead. So
the function vmx_pre_enter_smm() marks nested.vmxon false even if vmx->nested
stuff is still populated. What it expected is em_rsm() can mark nested.vmxon
to be true again. However, the smi_handler/rsm will not execute since there
is no something like seabios in this scenario. The function free_nested()
fails to free the vmx->nested stuff since the vmx->nested.vmxon is false
which results in the above warning.

This patch fixes it by also considering the no SMI handler case, luckily
vmx->nested.smm.vmxon is marked according to the value of vmx->nested.vmxon
in vmx_pre_enter_smm(), we can take advantage of it and free vmx->nested
stuff when L1 goes down.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Fixes: 05cade71cf (KVM: nSVM: fix SMI injection in guest mode)
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-11-27 17:37:55 +01:00
Wanpeng Li
c37c28730b KVM: VMX: Fix rflags cache during vCPU reset
Reported by syzkaller:

   *** Guest State ***
   CR0: actual=0x0000000080010031, shadow=0x0000000060000010, gh_mask=fffffffffffffff7
   CR4: actual=0x0000000000002061, shadow=0x0000000000000000, gh_mask=ffffffffffffe8f1
   CR3 = 0x000000002081e000
   RSP = 0x000000000000fffa  RIP = 0x0000000000000000
   RFLAGS=0x00023000         DR7 = 0x00000000000000
          ^^^^^^^^^^
   ------------[ cut here ]------------
   WARNING: CPU: 6 PID: 24431 at /home/kernel/linux/arch/x86/kvm//x86.c:7302 kvm_arch_vcpu_ioctl_run+0x651/0x2ea0 [kvm]
   CPU: 6 PID: 24431 Comm: reprotest Tainted: G        W  OE   4.14.0+ #26
   RIP: 0010:kvm_arch_vcpu_ioctl_run+0x651/0x2ea0 [kvm]
   RSP: 0018:ffff880291d179e0 EFLAGS: 00010202
   Call Trace:
    kvm_vcpu_ioctl+0x479/0x880 [kvm]
    do_vfs_ioctl+0x142/0x9a0
    SyS_ioctl+0x74/0x80
    entry_SYSCALL_64_fastpath+0x23/0x9a

The failed vmentry is triggered by the following beautified testcase:

    #include <unistd.h>
    #include <sys/syscall.h>
    #include <string.h>
    #include <stdint.h>
    #include <linux/kvm.h>
    #include <fcntl.h>
    #include <sys/ioctl.h>

    long r[5];
    int main()
    {
        struct kvm_debugregs dr = { 0 };

        r[2] = open("/dev/kvm", O_RDONLY);
        r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
        r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7);
        struct kvm_guest_debug debug = {
                .control = 0xf0403,
                .arch = {
                        .debugreg[6] = 0x2,
                        .debugreg[7] = 0x2
                }
        };
        ioctl(r[4], KVM_SET_GUEST_DEBUG, &debug);
        ioctl(r[4], KVM_RUN, 0);
    }

which testcase tries to setup the processor specific debug
registers and configure vCPU for handling guest debug events through
KVM_SET_GUEST_DEBUG.  The KVM_SET_GUEST_DEBUG ioctl will get and set
rflags in order to set TF bit if single step is needed. All regs' caches
are reset to avail and GUEST_RFLAGS vmcs field is reset to 0x2 during vCPU
reset. However, the cache of rflags is not reset during vCPU reset. The
function vmx_get_rflags() returns an unreset rflags cache value since
the cache is marked avail, it is 0 after boot. Vmentry fails if the
rflags reserved bit 1 is 0.

This patch fixes it by resetting both the GUEST_RFLAGS vmcs field and
its cache to 0x2 during vCPU reset.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-11-27 17:37:46 +01:00
Wanpeng Li
e70b57a6ce KVM: X86: Fix softlockup when get the current kvmclock
watchdog: BUG: soft lockup - CPU#6 stuck for 22s! [qemu-system-x86:10185]
 CPU: 6 PID: 10185 Comm: qemu-system-x86 Tainted: G           OE   4.14.0-rc4+ #4
 RIP: 0010:kvm_get_time_scale+0x4e/0xa0 [kvm]
 Call Trace:
  get_time_ref_counter+0x5a/0x80 [kvm]
  kvm_hv_process_stimers+0x120/0x5f0 [kvm]
  kvm_arch_vcpu_ioctl_run+0x4b4/0x1690 [kvm]
  kvm_vcpu_ioctl+0x33a/0x620 [kvm]
  do_vfs_ioctl+0xa1/0x5d0
  SyS_ioctl+0x79/0x90
  entry_SYSCALL_64_fastpath+0x1e/0xa9

This can be reproduced when running kvm-unit-tests/hyperv_stimer.flat and
cpu-hotplug stress simultaneously. __this_cpu_read(cpu_tsc_khz) returns 0
(set in kvmclock_cpu_down_prep()) when the pCPU is unhotplug which results
in kvm_get_time_scale() gets into an infinite loop.

This patch fixes it by treating the unhotplug pCPU as not using master clock.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-11-27 17:32:53 +01:00