When passed GFP flags that allow sleeping (such as
GFP_NOIO), mempool_alloc() will never return NULL, it will
wait until memory is available.
This means that we don't need to handle failure, but that we
do need to ensure one thread doesn't call mempool_alloc()
twice on the one pool without queuing or freeing the first
allocation. If multiple threads did this during times of
high memory pressure, the pool could be exhausted and a
deadlock could result.
pnfs_generic_alloc_ds_commits() attempts to allocate from
the nfs_commit_mempool while already holding an allocation
from that pool. This is not safe. So change
nfs_commitdata_alloc() to take a flag that indicates whether
failure is acceptable.
In pnfs_generic_alloc_ds_commits(), accept failure and
handle it as we currently do. Else where, do not accept
failure, and do not handle it.
Even when failure is acceptable, we want to succeed if
possible. That means both
- using an entry from the pool if there is one
- waiting for direct reclaim is there isn't.
We call mempool_alloc(GFP_NOWAIT) to achieve the first, then
kmem_cache_alloc(GFP_NOIO|__GFP_NORETRY) to achieve the
second. Each of these can fail, but together they do the
best they can without blocking indefinitely.
The objects returned by kmem_cache_alloc() will still be freed
by mempool_free(). This is safe as mempool_alloc() uses
exactly the same function to allocate objects (since the mempool
was created with mempool_create_slab_pool()). The object returned
by mempool_alloc() and kmem_cache_alloc() are indistinguishable
so mempool_free() will handle both identically, either adding to the
pool or calling kmem_cache_free().
Also, don't test for failure when allocating from
nfs_wdata_mempool.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Returning errors directly even lets us remove the goto
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
If we cut out the dprintk()s, then we can return error codes directly
and cut out the goto.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
This puts all the common code in a single place for the
walk_client_list() functions.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Once again, we can remove the function and compare integer values
directly.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
If we cut out the dprintk()s, then we don't even need this to be a
separate function.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Just remove the function and have the caller use nfs_release_request()
instead.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
We always call nfs_mark_client_ready() even if nfs_create_rpc_client()
returns an error, so we can rearrange nfs_init_client() to mark the
client ready from a single place.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Let's cut out the goto and return any errors immedately
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Additionally, this change lets us cut out the goto by returning errors
immediately.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Removing the dprintk() lets us simplify the function by returning status
codes directly, rather than using a goto.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Removing the dprintk() lets us return the status value directly, rather
than jumping to a label if an error occurs.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
In addition to removing the dprintk(), this patch also initializes "res"
to the default return value instead of doing this through an else
condition.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Removing the dprintk()s lets us simplify the function by removing the
else condition entirely and returning the status of
initiate_{file,bulk}_draining() directly.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Flexfilelayout supports data servers which talk NFS v3 and v4.{0,1,2}.
However, this code path is disabled and v3 only servers are accepted.
This change removes this limitation.
Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
NFS has some optimizations for readdir to choose between using READDIR or
READDIRPLUS based on workload, and which NFS operation to use is determined
by subsequent interactions with lookup, d_revalidate, and getattr.
Concurrent use of nfs_readdir() via ->iterate_shared() can cause those
optimizations to repeatedly invalidate the pagecache used to store
directory entries during readdir(), which causes some very bad performance
for directories with many entries (more than about 10000).
There's a couple ways to fix this in NFS, but no fix would be as simple as
going back to ->iterate() to serialize nfs_readdir(), and neither fix I
tested performed as well as going back to ->iterate().
The first required taking the directory's i_lock for each entry, with the
result of terrible contention.
The second way adds another flag to the nfs_inode, and so keeps the
optimizations working for large directories. The difference from using
->iterate() here is that much more memory is consumed for a given workload
without any performance gain.
The workings of nfs_readdir() are such that concurrent users are serialized
within read_cache_page() waiting to retrieve pages of entries from the
server. By serializing this work in iterate_dir() instead, contention for
cache pages is reduced. Waiting processes can have an uncontended pass at
the entirety of the directory's pagecache once previous processes have
completed filling it.
v2 - Keep the bits needed for parallel lookup
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Again, a batch that's been sitting a couple of weeks, mostly because I
anticipated a bit more material but it didn't show up -- which is good.
These are all your garden variety fixes for ARM platforms. Most visible issue
fixed here is probably the SMP reset issue on OMAP, the rest are minor stuff.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJY877PAAoJEIwa5zzehBx3TaUQAKtE8CSmvdWIk5grVb4MBDfF
rXQ2R7WJF17LtVhIGwz9BPCQXMK+/XcSuXZ8bvUIoMsmOFXutDao7m7COD0z4aNo
CPWk7ceQRzIWvsmturla6aYGmpAbWzdDgQj61LnaFbQrzmJOHVvcJN685+L5NGkZ
8GBrzNmrvVkWz5N+msnrZRIcKpSqGCXrkjUU1EfHgUgNNdTf6BQRQSzVEYBtILEb
9bC3WdS1fosmSdbsBjcxHsAtWMyO64KqhA691+gGTR93wyOuCRgqv+/ucoFB1oi+
yjuhUVHW7Y5/8KOi67+97PwoKpZYpUFHge1/5iPbgEN6SqhOLzPAALGCKT4ONEQz
KmLl//kX1IJZTD/87OegSLDbpS7h2sRYS7FpwgAa1kyYYOHdsZsECzKfhEvgZNHX
Gtl2XLmtELM9quKQ2X8xWvjU5grfSLkng9OhDJ6ZCFhGddvjtHegq8J5diFyq281
qf4n/Gp/OLC9IoPUyZefn5ya77K8UZPNppqtiWTBbT1IuXWbPUVPKmZoCpq3Qwu+
2fdcFa3sIfpzDg9vkTszFAUCFkRqH5Jzy8BfNIQtfSq+pxwyIRWh88a5CV2RvYFB
uHSt5B88YR2PwoyhV2oFpYNkx3vVu9g+A35ClzIRTLhSMi+Ngh2+DcppHY+LficP
m9Hes98dZOv92JgiSaoQ
=C+Hz
-----END PGP SIGNATURE-----
Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"Again, a batch that's been sitting a couple of weeks, mostly because
I anticipated a bit more material but it didn't show up -- which is
good.
These are all your garden variety fixes for ARM platforms.
The most visible issue fixed here is probably the SMP reset issue on
OMAP, the rest are minor stuff"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
arm64: allwinner: a64: add pmu0 regs for USB PHY
ARM: OMAP2+: omap_device: Sync omap_device and pm_runtime after probe defer
reset: add exported __reset_control_get, return NULL if optional
ARM: orion5x: only call into phylib when available
ARM: omap2+: Revert omap-smp.c changes resetting CPU1 during boot
ARM: dts: am335x-evmsk: adjust mmc2 param to allow suspend
ARM: dts: ti: fix PCI bus dtc warnings
ARM: dts: am335x-baltos: disable EEE for Atheros 8035 PHY
ARM: dts: OMAP3: Fix MFG ID EEPROM
ARM: sun8i: a33: add operating-points-v2 property to all nodes
ARM: sun8i: a33: remove highest OPP to fix CPU crashes
Pull block fixes from Jens Axboe:
"Four small fixes.
Three of them fix the same error in NVMe, in loop, fc, and rdma
respectively. The last fix from Ming fixes a regression in this
series, where our bvec gap logic was wrong and causes an oops on
NVMe for certain conditions"
* 'for-linus' of git://git.kernel.dk/linux-block:
block: fix bio_will_gap() for first bvec with offset
nvme-fc: Fix sqsize wrong assignment based on ctrl MQES capability
nvme-rdma: Fix sqsize wrong assignment based on ctrl MQES capability
nvme-loop: Fix sqsize wrong assignment based on ctrl MQES capability
Without this fix we can get PM related warnings for devices that
use deferred probe. If necessary, this fix can wait for the
v4.12 merge window no problem.
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEEkgNvrZJU/QSQYIcQG9Q+yVyrpXMFAljruuwRHHRvbnlAYXRv
bWlkZS5jb20ACgkQG9Q+yVyrpXMWBBAA0kYBB9IA+OinjpgLBB9ltKX21HBXKAHn
JCiygnR6KzDxnBzsPJk0v0GfRq7FixsEbstyDqfGXDpK5pOFTlgffGFeaMLQt+jU
4PcjLtiXklS9j3jJyUS7SAAjh5sPmR8v5q+NO0ELGi5H2q3c7J7X7VojD9LCB9gm
z/4t133EEPwRdTUghoqxTaB+11ROrbctr0eZUNIafytxsnX4kkpcVGuQxRBu740j
rLXxd3lIJbqasJHj+4v/IkE5CT61OslqEEA8QDaP1oK4d6M4+JVGCBAPXIl6AZ1b
vQEjUl1YMgU1QeF/cnQwf6n6fM1DqhjbdySotDRDlZvWExexTG1BXBcKr1mATfNp
zSGJuAne/zG0AHcqfpTZD3VWbrO1iGw5RmifwwtcmbsAmKu6K7ezMx2QO4L8VBma
N407KOdhcnzsGQnTn3iQNwK36b/lc6ph1DA02TeS41vYB6MBrIp7uugP30k7eOf2
Smv3LClY0H9+db9/IMDyuap8Os3QlGEwXwnVyC67TE3dRP3Js5r7Fm3i9WGmV5Oy
5pU79OXMYzphKUd/11QUSnQUO+SMNH7/fU/dSeqLQMfwe2a5oY4FDuM5Y8uFDoZ7
2KPGLEGWFmn8kxuZvBw/nHiwPexe3y91dIfvA+S7kTQnqFGmM0IzlkMnfcZeV5Zu
AaRkd2G7654=
=07Xj
-----END PGP SIGNATURE-----
Merge tag 'omap-for-v4.11/fixes-rc6-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes
Regression fix for omap interconnect code for deferred probe.
Without this fix we can get PM related warnings for devices that
use deferred probe. If necessary, this fix can wait for the
v4.12 merge window no problem.
* tag 'omap-for-v4.11/fixes-rc6-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: OMAP2+: omap_device: Sync omap_device and pm_runtime after probe defer
ARM: omap2+: Revert omap-smp.c changes resetting CPU1 during boot
ARM: dts: am335x-evmsk: adjust mmc2 param to allow suspend
ARM: dts: ti: fix PCI bus dtc warnings
ARM: dts: am335x-baltos: disable EEE for Atheros 8035 PHY
ARM: dts: OMAP3: Fix MFG ID EEPROM
Signed-off-by: Olof Johansson <olof@lixom.net>
Pull cgroup fix from Tejun Heo:
"Unfortunately, the commit to fix the cgroup mount race in the previous
pull request can lead to hangs.
The original bug has been around for a while and isn't too likely to
be triggered in usual use cases. Revert the commit for now"
* 'for-4.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
Revert "cgroup: avoid attaching a cgroup root to two different superblocks"
Here is a single tty core revert for a patch that was reported to cause
problems. The original issue is one that we have lived with for
decades, so trying to scramble to fix the fix in time for 4.11-final
does not make sense due to the fragility of the tty ldisc layer. Just
reverting it makes sense for now.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWPMjIQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+yl+ywCfbEc9XQ2+eDebFSs673OaDUxApmIAoMe+4Qj5
puBC7ThZvCwrwACsoBWd
=rugQ
-----END PGP SIGNATURE-----
Merge tag 'tty-4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty fix from Greg KH:
"Here is a single tty core revert for a patch that was reported to
cause problems.
The original issue is one that we have lived with for decades, so
trying to scramble to fix the fix in time for 4.11-final does not make
sense due to the fragility of the tty ldisc layer. Just reverting it
makes sense for now"
* tag 'tty-4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
Revert "tty: don't panic on OOM in tty_set_ldisc()"
bug. This bug has been there sinc function tracing was added way back
when. But my new development depends on this bug being fixed, and it
should be fixed regardless as it causes ftrace to disable itself when
triggered, and a reboot is required to enable it again.
The bug is that the function probe does not disable itself properly
if there's another probe of its type still enabled. For example:
# cd /sys/kernel/debug/tracing
# echo schedule:traceoff > set_ftrace_filter
# echo do_IRQ:traceoff > set_ftrace_filter
# echo \!do_IRQ:traceoff > /debug/tracing/set_ftrace_filter
# echo do_IRQ:traceoff > set_ftrace_filter
The above registers two traceoff probes (one for schedule and one for
do_IRQ, and then removes do_IRQ. But since there still exists one for
schedule, it is not done properly. When adding do_IRQ back, the breakage
in the accounting is noticed by the ftrace self tests, and it causes
a warning and disables ftrace.
-----BEGIN PGP SIGNATURE-----
iQExBAABCAAbBQJY8ovvFBxyb3N0ZWR0QGdvb2RtaXMub3JnAAoJEMm5BfJq2Y3L
nkAH/jfsXUWIbZ6J0A7+nmGiBdIVwLwG0ZOJClcxjnCSpsNs+FO/0w6ragtIYCi2
Km+0s/slA5GOddG4Miga/dhtxGhDosyXnxqC+4GmD0maqJGLweJLbmiQ1xhra0hr
XGDI+SXHM/n22zVkFEbkGXgxMvOHeR+X/sREZo3XmoXRLbc1QVtTEe/8TdlLXwE5
5Fs07xSQqx4TS7oBxIjipHnbHL/gIktEo0HiEmq73++r42MztIMYZPoV+cXuim37
C6xO4PxfPN0aRh9W5gdiMnbv6lummVBNQXwpMya0vTbxz/9WeUex8c+lcInQUJgA
FhQWKaCGyi0UK4Pa2Pz/Dmxuti0=
=LYLo
-----END PGP SIGNATURE-----
Merge tag 'trace-v4.11-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull ftrace fix from Steven Rostedt:
"While rewriting the function probe code, I stumbled over a long
standing bug. This bug has been there sinc function tracing was added
way back when. But my new development depends on this bug being fixed,
and it should be fixed regardless as it causes ftrace to disable
itself when triggered, and a reboot is required to enable it again.
The bug is that the function probe does not disable itself properly if
there's another probe of its type still enabled. For example:
# cd /sys/kernel/debug/tracing
# echo schedule:traceoff > set_ftrace_filter
# echo do_IRQ:traceoff > set_ftrace_filter
# echo \!do_IRQ:traceoff > /debug/tracing/set_ftrace_filter
# echo do_IRQ:traceoff > set_ftrace_filter
The above registers two traceoff probes (one for schedule and one for
do_IRQ, and then removes do_IRQ.
But since there still exists one for schedule, it is not done
properly. When adding do_IRQ back, the breakage in the accounting is
noticed by the ftrace self tests, and it causes a warning and disables
ftrace"
* tag 'trace-v4.11-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace: Fix removing of second function probe
This reverts commit bfb0b80db5.
Andrei reports CRIU test hangs with the patch applied. The bug fixed
by the patch isn't too likely to trigger in actual uses. Revert the
patch for now.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Andrei Vagin <avagin@virtuozzo.com>
Link: http://lkml.kernel.org/r/20170414232737.GC20350@outlook.office365.com
Pull nvdimm fixes from Dan Williams:
"A small crop of lockdep, sleeping while atomic, and other fixes /
band-aids in advance of the full-blown reworks targeting the next
merge window. The largest change here is "libnvdimm: fix blk free
space accounting" which deletes a pile of buggy code that better
testing would have caught before merging. The next change that is
borderline too big for a late rc is switching the device-dax locking
from rcu to srcu, I couldn't think of a smaller way to make that fix.
The __copy_user_nocache fix will have a full replacement in 4.12 to
move those pmem special case considerations into the pmem driver. The
"libnvdimm: band aid btt vs clear poison locking" commit admits that
our error clearing support for btt went in broken, so we just disable
it in 4.11 and -stable. A replacement / full fix is in the pipeline
for 4.12
Some of these would have been caught earlier had DEBUG_ATOMIC_SLEEP
been enabled on my development station. I wonder if we should have:
config DEBUG_ATOMIC_SLEEP
default PROVE_LOCKING
...since I mistakenly thought I got both with PROVE_LOCKING=y.
These have received a build success notification from the 0day robot,
and some have appeared in a -next release with no reported issues"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions
device-dax: switch to srcu, fix rcu_read_lock() vs pte allocation
libnvdimm: band aid btt vs clear poison locking
libnvdimm: fix reconfig_mutex, mmap_sem, and jbd2_handle lockdep splat
libnvdimm: fix blk free space accounting
acpi, nfit, libnvdimm: fix interleave set cookie calculation (64-bit comparison)
This is seven small fixes which are all for user visible issues that
fortunately only occur in rare circumstances. The most serious is the
sr one in which QEMU can cause us to read beyond the end of a buffer
(I don't think it's exploitable, but just in case). The next is the
sd capacity fix which means all non 512 byte sector drives greater
than 2TB fail to be correctly sized. The rest are either in new
drivers (qedf) or on error legs.
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJY8krDAAoJEAVr7HOZEZN4BpoQAKKwu5Y7jRS6JNoS94pT2L33
Rz8dEQNdSmr7JvVQphoQDw7YQjYsGV+4f3FdGMrjMzU0Ew3QR1w2SPwK9ww3wUj6
eHrIdnkd77FJgWyyy/wLTcf55YwGqhIKCSVv3tQmZN/xiuM7x12ZsW8Kb8ydAbKR
824+VU49dSx3rpnzUPUI/+eb8u5MgLHoNpzDzRWb4Au+05LOkgthDMfC6GdmOnyx
Asko/u4UNL69cT+xeYNmryMI+OHdqkRILMadefySJ8QTBfvZHsazYqb4YtWA0qf9
1VAmuvQzToGC7waAHj2X0FSPEISUx+NehXgQjIOZ9aVD2L1XDFm1wRpI3CsIFRA1
2IrgGYpR7rQ2Uj6z7VI+uU+0O/fs13Tctyv6C1wuqkjM+itBLQ5IGuG9shBPXazP
UrD9R+SZfMsGXa1ABlcYJE7M+xEusoN2TKGj5yR5D9kt+a/JcqF2GEmY/KpNkEkT
rmdqAgHqL/bvIc/qyfU4ZayR1MlGe38oPfm9D+vYqJk/V+drlWt66PkamRowfyej
WufO4l4mB8bpR+HLjRXK4Nh0wHzh8WDgZUrRJXBmOqpig6v3HCJzUejvFTBjzdjx
uOUlN41RGnOI1L6OpY6IAc2GPw7IwMHKXaIKL0OwyaK+lxvVMO1jRg3TieXzdSp1
wF7GqfOw2WFVvX1n5xX3
=vOpC
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is seven small fixes which are all for user visible issues that
fortunately only occur in rare circumstances.
The most serious is the sr one in which QEMU can cause us to read
beyond the end of a buffer (I don't think it's exploitable, but just
in case).
The next is the sd capacity fix which means all non 512 byte sector
drives greater than 2TB fail to be correctly sized.
The rest are either in new drivers (qedf) or on error legs"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ipr: do not set DID_PASSTHROUGH on CHECK CONDITION
scsi: aacraid: fix PCI error recovery path
scsi: sd: Fix capacity calculation with 32-bit sector_t
scsi: qla2xxx: Add fix to read correct register value for ISP82xx.
scsi: qedf: Fix crash due to unsolicited FIP VLAN response.
scsi: sr: Sanity check returned mode data
scsi: sd: Consider max_xfer_blocks if opt_xfer_blocks is unusable
Pull parisc fix from Helge Deller:
"Mikulas Patocka fixed a few bugs in our new pa_memcpy() assembler
function, e.g. one bug made the kernel unbootable if source and
destination address are the same"
* 'parisc-4.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: fix bugs in pa_memcpy
Otherwise lockdep says:
[ 1337.483798] ================================================
[ 1337.483999] [ BUG: lock held when returning to user space! ]
[ 1337.484252] 4.11.0-rc6 #19 Not tainted
[ 1337.484423] ------------------------------------------------
[ 1337.484626] mount/14766 is leaving the kernel with locks still held!
[ 1337.484841] 1 lock held by mount/14766:
[ 1337.485017] #0: (&type->s_umount_key#33/1){+.+.+.}, at: [<ffffffff8124171f>] sget_userns+0x2af/0x520
Caught by xfstests generic/413 which tried to mount with the unsupported
mount option dax. Then xfstests generic/422 ran sync which deadlocks.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Acked-by: Mike Marshall <hubcap@omnibond.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Normal pathname lookup doesn't allow empty pathnames, but using
AT_EMPTY_PATH (with name_to_handle_at() or fstatat(), for example) you
can trigger an empty pathname lookup.
And not only is the RCU lookup in that case entirely unnecessary
(because we'll obviously immediately finalize the end result), it is
actively wrong.
Why? An empth path is a special case that will return the original
'dirfd' dentry - and that dentry may not actually be RCU-free'd,
resulting in a potential use-after-free if we were to initialize the
path lazily under the RCU read lock and depend on complete_walk()
finalizing the dentry.
Found by syzkaller and KASAN.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The patch 554bfeceb8 ("parisc: Fix access
fault handling in pa_memcpy()") reimplements the pa_memcpy function.
Unfortunatelly, it makes the kernel unbootable. The crash happens in the
function ide_complete_cmd where memcpy is called with the same source
and destination address.
This patch fixes a few bugs in pa_memcpy:
* When jumping to .Lcopy_loop_16 for the first time, don't skip the
instruction "ldi 31,t0" (this bug made the kernel unbootable)
* Use the COND macro when comparing length, so that the comparison is
64-bit (a theoretical issue, in case the length is greater than
0xffffffff)
* Don't use the COND macro after the "extru" instruction (the PA-RISC
specification says that the upper 32-bits of extru result are undefined,
although they are set to zero in practice)
* Fix exception addresses in .Lcopy16_fault and .Lcopy8_fault
* Rename .Lcopy_loop_4 to .Lcopy_loop_8 (so that it is consistent with
.Lcopy8_fault)
Cc: <stable@vger.kernel.org> # v4.9+
Fixes: 554bfeceb8 ("parisc: Fix access fault handling in pa_memcpy()")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Pull input fixes from Dmitry Torokhov:
"Just a small update to xpad driver to recognize yet another gamepad,
and another change making sure userio.h is exported"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: xpad - add support for Razer Wildcat gamepad
uapi: add missing install of userio.h
Pull networking fixes from David Miller:
"Things seem to be settling down as far as networking is concerned,
let's hope this trend continues...
1) Add iov_iter_revert() and use it to fix the behavior of
skb_copy_datagram_msg() et al., from Al Viro.
2) Fix the protocol used in the synthetic SKB we cons up for the
purposes of doing a simulated route lookup for RTM_GETROUTE
requests. From Florian Larysch.
3) Don't add noop_qdisc to the per-device qdisc hashes, from Cong
Wang.
4) Don't call netdev_change_features with the team lock held, from
Xin Long.
5) Revert TCP F-RTO extension to catch more spurious timeouts because
it interacts very badly with some middle-boxes. From Yuchung
Cheng.
6) Fix the loss of error values in l2tp {s,g}etsockopt calls, from
Guillaume Nault.
7) ctnetlink uses bit positions where it should be using bit masks,
fix from Liping Zhang.
8) Missing RCU locking in netfilter helper code, from Gao Feng.
9) Avoid double frees and use-after-frees in tcp_disconnect(), from
Eric Dumazet.
10) Don't do a changelink before we register the netdevice in
bridging, from Ido Schimmel.
11) Lock the ipv6 device address list properly, from Rabin Vincent"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (29 commits)
netfilter: ipt_CLUSTERIP: Fix wrong conntrack netns refcnt usage
netfilter: nft_hash: do not dump the auto generated seed
drivers: net: usb: qmi_wwan: add QMI_QUIRK_SET_DTR for Telit PID 0x1201
ipv6: Fix idev->addr_list corruption
net: xdp: don't export dev_change_xdp_fd()
bridge: netlink: register netdevice before executing changelink
bridge: implement missing ndo_uninit()
bpf: reference may_access_skb() from __bpf_prog_run()
tcp: clear saved_syn in tcp_disconnect()
netfilter: nf_ct_expect: use proper RCU list traversal/update APIs
netfilter: ctnetlink: skip dumping expect when nfct_help(ct) is NULL
netfilter: make it safer during the inet6_dev->addr_list traversal
netfilter: ctnetlink: make it safer when checking the ct helper name
netfilter: helper: Add the rcu lock when call __nf_conntrack_helper_find
netfilter: ctnetlink: using bit to represent the ct event
netfilter: xt_TCPMSS: add more sanity tests on tcph->doff
net: tcp: Increase TCP_MIB_OUTRSTS even though fail to alloc skb
l2tp: don't mask errors in pppol2tp_getsockopt()
l2tp: don't mask errors in pppol2tp_setsockopt()
tcp: restrict F-RTO to work-around broken middle-boxes
...
Pull x86 fixes from Thomas Gleixner:
"A set of small fixes for x86:
- fix locking in RDT to prevent memory leaks and freeing in use
memory
- prevent setting invalid values for vdso32_enabled which cause
inconsistencies for user space resulting in application crashes.
- plug a race in the vdso32 code between fork and sysctl which causes
inconsistencies for user space resulting in application crashes.
- make MPX signal delivery work in compat mode
- make the dmesg output of traps and faults readable again"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/intel_rdt: Fix locking in rdtgroup_schemata_write()
x86/debug: Fix the printk() debug output of signal_fault(), do_trap() and do_general_protection()
x86/vdso: Plug race between mapping and ELF header setup
x86/vdso: Ensure vdso32_enabled gets set to valid values only
x86/signals: Fix lower/upper bound reporting in compat siginfo
Pull perf fixes from Thomas Gleixner:
"Two small fixes for perf:
- the move to support cross arch annotation introduced per arch
initialization requirements, fullfill them for s/390 (Christian
Borntraeger)
- add the missing initialization to the LBR entries to avoid exposing
random or stale data"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86: Avoid exposing wrong/stale data in intel_pmu_lbr_read_32()
perf annotate s390: Fix perf annotate error -95 (4.10 regression)