Only the first call to the poll syscall delivers POLLPRI to the user
correctly. After that, the user never receives the event signal again.
To reproduce:
1. Get the monitor code in Documentation/accounting/psi.txt
2. Run it, and wait for the event to trigger.
3. Kill and restart the process.
The question is why we can end up with poll_scheduled = 1 but the work
not running (which would reset it to 0). And the answer is because the
scheduling side sees group->poll_kworker under RCU protection and then
schedules it, but here we cancel the work and destroy the worker. The
cancel needs to pair with resetting the poll_scheduled flag.
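The failure mode described above can be modelled in userspace. The sketch below is not the kernel code: struct trigger_group, schedule_poll() and cancel_poll() are hypothetical names, and C11 atomics stand in for the kernel's atomic operations. It only illustrates why a cancel that leaves poll_scheduled at 1 blocks every later scheduling attempt.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the per-group polling state. */
struct trigger_group {
	atomic_int poll_scheduled;	/* 1 while work is believed to be queued */
	bool work_queued;		/* models the work item on the worker */
};

void trigger_group_init(struct trigger_group *g)
{
	atomic_init(&g->poll_scheduled, 0);
	g->work_queued = false;
}

/* Scheduling side: queue the work only if nobody has queued it already. */
bool schedule_poll(struct trigger_group *g)
{
	if (atomic_exchange(&g->poll_scheduled, 1))
		return false;	/* flag already set: assume work is pending */
	g->work_queued = true;
	return true;
}

/*
 * Teardown side: cancel the work. With reset_flag == false the flag stays
 * at 1 even though nothing is queued any more, which is the bug pattern.
 */
void cancel_poll(struct trigger_group *g, bool reset_flag)
{
	g->work_queued = false;
	if (reset_flag)
		atomic_store(&g->poll_scheduled, 0);
}
```

In this model, calling cancel_poll(&g, false) after a successful schedule_poll() makes every later schedule_poll() return false, while cancel_poll(&g, true) lets scheduling work again.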
Link: http://lkml.kernel.org/r/1566357985-97781-1-git-send-email-joseph.qi@linux.alibaba.com
Signed-off-by: Jason Xing <kerneljasonxing@linux.alibaba.com>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Caspar Zhang <caspar@linux.alibaba.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Similar to vmstats, percpu caching of local vmevents leads to an
accumulation of errors on non-leaf levels. This happens because some
leftovers may remain in percpu caches, so that they are never propagated
up the cgroup tree and simply disappear when the memory cgroup is
released.
To fix this issue let's accumulate and propagate percpu vmevents values
before releasing the memory cgroup similar to what we're doing with
vmstats.
Since on cpu hotplug we do flush percpu vmstats anyway, we can iterate
only over online cpus.
Link: http://lkml.kernel.org/r/20190819202338.363363-4-guro@fb.com
Fixes: 42a3003535 ("mm: memcontrol: fix recursive statistics correctness & scalabilty")
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Percpu caching of local vmstats with the conditional propagation by the
cgroup tree leads to an accumulation of errors on non-leaf levels.
Let's imagine two nested memory cgroups A and A/B. Say, a process
belonging to A/B allocates 100 pagecache pages on the CPU 0. The percpu
cache will spill 3 times, so that 32*3=96 pages will be accounted to A/B
and A atomic vmstat counters, 4 pages will remain in the percpu cache.
Imagine A/B is close to memory.max, so that every following allocation
triggers a direct reclaim on the local CPU. Say, each such attempt will
free 16 pages on a new cpu. That means every percpu cache will have -16
pages, except the first one, which will have 4 - 16 = -12. A/B and A
atomic counters will not be touched at all.
Now a user removes A/B. All percpu caches are freed and corresponding
vmstat numbers are forgotten. A has 96 pages more than expected.
As memory cgroups are created and destroyed, errors do accumulate. Even
1-2 pages differences can accumulate into large numbers.
To fix this issue let's accumulate and propagate percpu vmstat values
before releasing the memory cgroup. At this point these numbers are
stable and cannot be changed.
Since on cpu hotplug we do flush percpu vmstats anyway, we can iterate
only over online cpus.
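The arithmetic in the A/B example can be replayed with a small userspace model. This is a sketch under simplifying assumptions, not the memcg implementation: the names are hypothetical, there are only 4 CPUs, and a percpu cache is assumed to spill as soon as its absolute value reaches the batch size of 32, which is what the 32*3=96 figure above implies.

```c
#include <stdlib.h>

#define NCPUS 4		/* assumed CPU count for the example */
#define BATCH 32	/* assumed spill threshold, as in the text above */

/* Hypothetical model of one vmstat counter with percpu caching. */
struct counter {
	long atomic_total;	/* value visible to the cgroup tree */
	long percpu[NCPUS];	/* cached deltas not yet propagated */
};

/* Apply a delta on one CPU; spill into the shared total at the batch size. */
void counter_mod(struct counter *c, int cpu, long delta)
{
	long x = c->percpu[cpu] + delta;

	if (labs(x) >= BATCH) {
		c->atomic_total += x;
		x = 0;
	}
	c->percpu[cpu] = x;
}

/* The idea of the fix: fold every percpu leftover into the shared total. */
void counter_flush(struct counter *c)
{
	for (int cpu = 0; cpu < NCPUS; cpu++) {
		c->atomic_total += c->percpu[cpu];
		c->percpu[cpu] = 0;
	}
}

/* Allocate 100 pages on CPU 0, then reclaim 16 pages on each CPU. */
long scenario(int flush_before_release)
{
	struct counter c = {0};

	for (int i = 0; i < 100; i++)
		counter_mod(&c, 0, 1);
	for (int cpu = 0; cpu < NCPUS; cpu++)
		for (int i = 0; i < 16; i++)
			counter_mod(&c, cpu, -1);
	if (flush_before_release)
		counter_flush(&c);
	return c.atomic_total;
}
```

Without the flush the shared total stays at 96, because the leftovers 4 - 16 = -12 and three times -16 never leave the percpu caches. With the flush it becomes 100 - 4*16 = 36, the number of pages actually still charged in this model.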
Link: http://lkml.kernel.org/r/20190819202338.363363-2-guro@fb.com
Fixes: 42a3003535 ("mm: memcontrol: fix recursive statistics correctness & scalabilty")
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 0cfaee2af3 ("include/asm-generic/5level-fixup.h: fix variable
'p4d' set but not used") converted a few functions from macros to static
inline, which causes parisc to complain:
  In file included from include/asm-generic/4level-fixup.h:38:0,
                   from arch/parisc/include/asm/pgtable.h:5,
                   from arch/parisc/include/asm/io.h:6,
                   from include/linux/io.h:13,
                   from sound/core/memory.c:9:
  include/asm-generic/5level-fixup.h:14:18: error: unknown type name 'pgd_t'; did you mean 'pid_t'?
   #define p4d_t pgd_t
                 ^
  include/asm-generic/5level-fixup.h:24:28: note: in expansion of macro 'p4d_t'
   static inline int p4d_none(p4d_t p4d)
                              ^~~~~
It is because "4level-fixup.h" is included before "asm/page.h" where
"pgd_t" is defined.
Link: http://lkml.kernel.org/r/20190815205305.1382-1-cai@lca.pw
Fixes: 0cfaee2af3 ("include/asm-generic/5level-fixup.h: fix variable 'p4d' set but not used")
Signed-off-by: Qian Cai <cai@lca.pw>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After commit 907ec5fca3 ("mm: zero remaining unavailable struct
pages"), struct page of reserved memory is zeroed. This causes
page->flags to be 0 and fixes issues related to reading
/proc/kpageflags, for example, of reserved memory.
The VM_BUG_ON() in move_freepages_block(), however, assumes that
page_zone() is meaningful even for reserved memory. That assumption is
no longer true after the aforementioned commit.
There's no reason why move_freepages_block() should be testing the
legitimacy of page_zone() for reserved memory; its scope is limited only
to pages on the zone's freelist.
Note that pfn_valid() can be true for reserved memory: there is a
backing struct page. The check for page_to_nid(page) is also buggy but
reserved memory normally only appears on node 0 so the zeroing doesn't
affect this.
Move the debug checks to after verifying PageBuddy is true. This
isolates the scope of the checks to only be for buddy pages which are on
the zone's freelist which move_freepages_block() is operating on. In
this case, an incorrect node or zone is a bug worthy of being warned
about (and the examination of struct page is acceptable because this
memory is not reserved).
Why does move_freepages_block() get called on reserved memory? It's
simply math after finding a valid free page from the per-zone free area
to use as fallback. We find the beginning and end of the pageblock of
the valid page and that can bring us into memory that was reserved per
the e820. pfn_valid() is still true (it's backed by a struct page), but
since it's zero'd we shouldn't make any inferences here about comparing
its node or zone. The current node check just happens to succeed most
of the time by luck because reserved memory typically appears on node 0.
The fix here is to validate that we actually have buddy pages before
testing if there's any type of zone or node strangeness going on.
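The ordering of the checks can be sketched with a reduced model. The struct and function below are hypothetical and are not the code in move_freepages(); they only show the idea of consulting zone information solely for pages already known to be buddy pages, so that zeroed struct pages of reserved memory are never inspected.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for struct page. */
struct fake_page {
	bool buddy;	/* models PageBuddy(): page sits on a free list */
	int zone_id;	/* models page_zone(); 0 for a zeroed struct page */
};

/*
 * Count pages whose zone disagrees with the zone being operated on, but
 * look only at buddy pages. Zeroed struct pages of reserved memory are
 * skipped before their zone is ever inspected.
 */
size_t count_zone_mismatches(const struct fake_page *pages, size_t n, int zone_id)
{
	size_t bad = 0;

	for (size_t i = 0; i < n; i++) {
		if (!pages[i].buddy)
			continue;	/* reserved or in use: zone is not meaningful */
		if (pages[i].zone_id != zone_id)
			bad++;		/* a real inconsistency worth warning about */
	}
	return bad;
}
```

A pageblock in zone 1 that contains one zeroed, non-buddy page yields no mismatch in this model, whereas a buddy page that claims a different zone is still reported.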
We noticed it almost immediately after bringing 907ec5fca3 in on
CONFIG_DEBUG_VM builds. It depends on finding specific free pages in
the per-zone free area where the math in move_freepages() will bring the
start or end pfn into reserved memory and wanting to claim that entire
pageblock as a new migratetype. So the path will be rare, require
CONFIG_DEBUG_VM, and require fallback to a different migratetype.
Some struct pages were already zeroed from reserve pages before
907ec5fca3c so it theoretically could trigger before this commit. I
think it's rare enough under a config option that most people don't run
that others may not have noticed. I wouldn't argue against a stable tag
and the backport should be easy enough, but probably wouldn't single out
a commit that this is fixing.
Mel said:
: The overhead of the debugging check is higher with this patch although
: it'll only affect debug builds and the path is not particularly hot.
: If this was a concern, I think it would be reasonable to simply remove
: the debugging check as the zone boundaries are checked in
: move_freepages_block and we never expect a zone/node to be smaller than
: a pageblock and stuck in the middle of another zone.
Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1908122036560.10779@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In z3fold_destroy_pool() we call destroy_workqueue(&pool->compact_wq).
However, we have no guarantee that migration isn't happening in the
background at that time.
Migration directly calls queue_work_on(pool->compact_wq); if destruction
wins that race, we are using a destroyed workqueue.
Link: http://lkml.kernel.org/r/20190809213828.202833-1-henryburns@google.com
Signed-off-by: Henry Burns <henryburns@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Adams <jwadams@google.com>
Cc: Henry Burns <henrywolfeburns@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull Hyper-V fixes from Sasha Levin:
- Fix for panics and network failures on PAE guests by Dexuan Cui.
- Fix of a memory leak (and related cleanups) in the hyper-v keyboard
driver by Dexuan Cui.
- Code cleanups for hyper-v clocksource driver during the merge window
by Dexuan Cui.
- Fix for a false positive warning in the userspace hyper-v KVP store
by Vitaly Kuznetsov.
* tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
Drivers: hv: vmbus: Fix virt_to_hvpfn() for X86_PAE
Tools: hv: kvp: eliminate 'may be used uninitialized' warning
Input: hyperv-keyboard: Use in-place iterator API in the channel callback
Drivers: hv: vmbus: Remove the unused "tsc_page" from struct hv_context
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Two KVM/arm fixes for MMIO emulation and UBSAN.
Unusually, we're routing them via the arm64 tree as per Paolo's
request on the list:
https://lore.kernel.org/kvm/21ae69a2-2546-29d0-bff6-2ea825e3d968@redhat.com/
We don't actually have any other arm64 fixes pending at the moment
(touch wood), so I've pulled from Marc, written a merge commit, tagged
the result and run it through my build/boot/bisect scripts"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
KVM: arm/arm64: VGIC: Properly initialise private IRQ affinity
KVM: arm/arm64: Only skip MMIO insn once
Four fixes, three for edge conditions which don't occur very often.
The lpfc fix mitigates memory exhaustion for some high CPU systems.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Four fixes, three for edge conditions which don't occur very often.
The lpfc fix mitigates memory exhaustion for some high CPU systems"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: lpfc: Mitigate high memory pre-allocation by SCSI-MQ
scsi: ufs: Fix NULL pointer dereference in ufshcd_config_vreg_hpm()
scsi: target: tcmu: avoid use-after-free after command timeout
scsi: qla2xxx: Fix gnl.l memory leak on adapter init failure
Merge tag 'xfs-5.3-fixes-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fix from Darrick Wong:
"A single patch that fixes a xfs lockup problem when a chown/chgrp
operation fails due to running out of quota. It has survived the usual
xfstests runs and merges cleanly with this morning's master:
- Fix a forgotten inode unlock when chown/chgrp fail due to quota"
* tag 'xfs-5.3-fixes-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix missing ILOCK unlock when xfs_setattr_nonsize fails due to EDQUOT
Merge tag 'drm-fixes-2019-08-24' of git://anongit.freedesktop.org/drm/drm
Pull more drm fixes from Dave Airlie:
"Although the tree built for me fine on arm here, it appears either
header cleanups in next or some kconfig combo break it, so this
contains a fix to mediatek to include dma-mapping.h explicitly.
There was also one nouveau fix that came in late that I was going to
leave until next week, but since I was sending this I thought it may
as well be in here:
mediatek:
- fix build in some cases
nouveau:
- fix hang with i2c and mst docks"
* tag 'drm-fixes-2019-08-24' of git://anongit.freedesktop.org/drm/drm:
drm/mediatek: include dma-mapping header
drm/nouveau: Don't retry infinitely when receiving no data on i2c over AUX
Merge tag 'kvmarm-fixes-for-5.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm/fixes
Pull KVM/arm fixes from Marc Zyngier as per Paolo's request at:
https://lkml.kernel.org/r/21ae69a2-2546-29d0-bff6-2ea825e3d968@redhat.com
"One (hopefully last) set of fixes for KVM/arm for 5.3: an embarrassing
MMIO emulation regression, and a UBSAN splat. Oh well...
- Don't overskip instructions on MMIO emulation
- Fix UBSAN splat when initializing PPI priorities"
* tag 'kvmarm-fixes-for-5.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm:
KVM: arm/arm64: VGIC: Properly initialise private IRQ affinity
KVM: arm/arm64: Only skip MMIO insn once
Although it builds fine here in my arm cross compile, it seems that
some other patches in -next or some Kconfig combination cause this
to fail to build for everyone else.
Including linux/dma-mapping.h should fix it.
Signed-off-by: Dave Airlie <airlied@redhat.com>
- Fix siw buffer mapping issue
- Fix siw 32/64 casting issues
- Fix a KASAN access issue in bnxt_re
- Fix several memory leaks (hfi1, mlx4)
- Fix a NULL deref in cma_cleanup
- Fixes for UMR memory support in mlx5 (4 patch series)
- Fix namespace check for restrack
- Fixes for counter support
- Fixes for hfi1 TID processing (5 patch series)
- Fix potential NULL deref in siw
- Fix memory page calculations in mlx5
Signed-off-by: Doug Ledford <dledford@redhat.com>
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Doug Ledford:
"No beating around the bush: this is a monster pull request for an -rc5
kernel. Intel hit me with a series of fixes for TID processing.
Mellanox hit me with a series for their UMR memory support.
And we had one fix for siw that fixes the 32bit build warnings and
because of the number of casts that had to be changed to properly
silence the warnings, that one patch alone is a full 40% of the LOC of
this entire pull request. Given that this is the initial release
kernel for siw, I'm trying to fix anything in it that we can, so that
adds to the impetus to take fixes for it like this one.
I had to do a rebase early in the week. Jason had thought he put a
patch on the rc queue that he needed to be there so he could base some
work off of it, and it had actually not been placed there. So he asked
me (on Tuesday) to fix that up before pushing my wip branch to the
official rc branch. I did, and that's why the early patches look like
they were all committed at the same time on Tuesday. That bunch had
been in my queue prior.
The various patches all pass my test for being legitimate fixes and
not attempts to slide new features or development into a late rc.
Well, they were all fixes with the exception of a couple clean up
patches people wrote for making the fixes they also wrote better (like
a cleanup patch to move UMR checking into a function so that the
remaining UMR fix patches can reference that function), so I left
those in place too.
My apologies for the LOC count and the number of patches here, it's
just how the cards fell this cycle.
Summary:
- Fix siw buffer mapping issue
- Fix siw 32/64 casting issues
- Fix a KASAN access issue in bnxt_re
- Fix several memory leaks (hfi1, mlx4)
- Fix a NULL deref in cma_cleanup
- Fixes for UMR memory support in mlx5 (4 patch series)
- Fix namespace check for restrack
- Fixes for counter support
- Fixes for hfi1 TID processing (5 patch series)
- Fix potential NULL deref in siw
- Fix memory page calculations in mlx5"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (21 commits)
RDMA/siw: Fix 64/32bit pointer inconsistency
RDMA/siw: Fix SGL mapping issues
RDMA/bnxt_re: Fix stack-out-of-bounds in bnxt_qplib_rcfw_send_message
infiniband: hfi1: fix memory leaks
infiniband: hfi1: fix a memory leak bug
IB/mlx4: Fix memory leaks
RDMA/cma: fix null-ptr-deref Read in cma_cleanup
IB/mlx5: Block MR WR if UMR is not possible
IB/mlx5: Fix MR re-registration flow to use UMR properly
IB/mlx5: Report and handle ODP support properly
IB/mlx5: Consolidate use_umr checks into single function
RDMA/restrack: Rewrite PID namespace check to be reliable
RDMA/counters: Properly implement PID checks
IB/core: Fix NULL pointer dereference when bind QP to counter
IB/hfi1: Drop stale TID RDMA packets that cause TIDErr
IB/hfi1: Add additional checks when handling TID RDMA WRITE DATA packet
IB/hfi1: Add additional checks when handling TID RDMA READ RESP packet
IB/hfi1: Unsafe PSN checking for TID RDMA READ Resp packet
IB/hfi1: Drop stale TID RDMA packets
RDMA/siw: Fix potential NULL de-ref
...
Merge tag 'for-linus-20190823' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Here's a set of fixes that should go into this release. This contains:
- Three minor fixes for NVMe.
- Three minor tweaks for the io_uring polling logic.
- Officially mark Song as the MD maintainer, after he's been filling
that role successfully for the last 6 months or so"
* tag 'for-linus-20190823' of git://git.kernel.dk/linux-block:
io_uring: add need_resched() check in inner poll loop
md: update MAINTAINERS info
io_uring: don't enter poll loop if we have CQEs pending
nvme: Add quirk for LiteON CL1 devices running FW 22301111
nvme: Fix cntlid validation when not using NVMEoF
nvme-multipath: fix possible I/O hang when paths are updated
io_uring: fix potential hang with polled IO
Merge tag 'for-5.3/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Revert a DM bufio change from during the 5.3 merge window now that a
proper fix has been made to the block loopback driver.
- Fix DM kcopyd to wakeup so failed subjobs get completed.
- Various fixes to DM zoned target to address error handling, and other
small tweaks (SPDX license identifiers and fix typos).
- Fix DM integrity range locking race by tracking whether journal has
changed.
- Fix DM dust target to detect reads of badblocks beyond the first 512b
sector (applicable if blocksize is larger than 512b).
- Fix DM persistent-data issue in both the DM btree and DM
space-map-metadata interfaces.
- Fix out of bounds memory access with certain DM table configurations.
* tag 'for-5.3/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm table: fix invalid memory accesses with too high sector number
dm space map metadata: fix missing store of apply_bops() return value
dm btree: fix order of block initialization in btree_split_beneath
dm raid: add missing cleanup in raid_ctr()
dm zoned: fix potential NULL dereference in dmz_do_reclaim()
dm dust: use dust block size for badblocklist index
dm integrity: fix a crash due to BUG_ON in __journal_read_write()
dm zoned: fix a few typos
dm zoned: add SPDX license identifiers
dm zoned: properly handle backing device failure
dm zoned: improve error handling in i/o map code
dm zoned: improve error handling in reclaim
dm kcopyd: always complete failed jobs
Revert "dm bufio: fix deadlock with loop device"
Merge tag 'xfs-5.3-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
"Here are a few more bug fixes that trickled in since the last pull.
They've survived the usual xfstests runs and merge cleanly with this
morning's master.
I expect there to be one more pull request tomorrow for the fix to
that quota related inode unlock bug that we were reviewing last night,
but it will continue to soak in the testing machine for several more
hours.
- Fix missing compat ioctl handling for get/setlabel
- Fix missing ioctl pointer sanitization on s390
- Fix a page locking deadlock in the dedupe comparison code
- Fix inadequate locking in reflink code w.r.t. concurrent directio
- Fix broken error detection when breaking layouts"
* tag 'xfs-5.3-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
fs/xfs: Fix return code of xfs_break_leased_layouts()
xfs: fix reflink source file racing with directio writes
vfs: fix page locking deadlocks when deduping files
xfs: compat_ioctl: use compat_ptr()
xfs: fall back to native ioctls for unhandled compat ones
At the moment we initialise the target *mask* of a virtual IRQ to the
VCPU it belongs to, even though this mask is only defined for GICv2 and
quickly runs out of bits for many GICv3 guests.
This behaviour triggers a UBSAN complaint for more than 32 VCPUs:
------
[ 5659.462377] UBSAN: Undefined behaviour in virt/kvm/arm/vgic/vgic-init.c:223:21
[ 5659.471689] shift exponent 32 is too large for 32-bit type 'unsigned int'
------
Also for GICv3 guests the reporting of TARGET in the "vgic-state" debugfs
dump is wrong, due to this very same problem.
Because there is no requirement to create the VGIC device before the
VCPUs (and QEMU actually does it the other way round), we can't safely
initialise mpidr or targets in kvm_vgic_vcpu_init(). But since we touch
every private IRQ for each VCPU anyway later (in vgic_init()), we can
just move the initialisation of those fields into there, where we
definitely know the VGIC type.
On the way make sure we really have either a VGICv2 or a VGICv3 device,
since the existing code is just checking for "VGICv3 or not", silently
ignoring the uninitialised case.
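The approach can be sketched as follows. This is a userspace model with hypothetical names, not the code in vgic_init(); it assumes the GICv2 target mask is 8 bits wide and that GICv3 records the owning VCPU's MPIDR instead. The point is that the affinity fields are set only once the VGIC model is known, and that an out-of-range shift is refused instead of executed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model types; the names are not taken from the kernel. */
enum vgic_model { VGIC_NONE, VGIC_V2, VGIC_V3 };

struct private_irq {
	uint8_t targets;	/* GICv2 only: bitmask of target VCPUs */
	uint32_t mpidr;		/* GICv3 only: affinity of the owning VCPU */
};

/*
 * Initialise the affinity fields once the VGIC model is known. Returns
 * false for an uninitialised model or for a VCPU that the 8-bit GICv2
 * mask cannot represent, instead of shifting out of range.
 */
bool init_private_irq(struct private_irq *irq, enum vgic_model model,
		      unsigned int vcpu_id, uint32_t vcpu_mpidr)
{
	switch (model) {
	case VGIC_V2:
		if (vcpu_id >= 8)
			return false;
		irq->targets = (uint8_t)(1U << vcpu_id);
		return true;
	case VGIC_V3:
		irq->mpidr = vcpu_mpidr;
		return true;
	default:
		return false;	/* neither VGICv2 nor VGICv3: refuse */
	}
}
```

With this shape, VCPU 40 of a GICv3 guest never reaches a shift at all, and the uninitialised case is rejected explicitly instead of being treated as "not VGICv3".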
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reported-by: Dave Martin <dave.martin@arm.com>
Tested-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Merge tag 'ceph-for-5.3-rc6' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"Three important fixes tagged for stable (an indefinite hang, a crash
on an assert and a NULL pointer dereference) plus a small series from
Luis fixing instances of vfree() under spinlock"
* tag 'ceph-for-5.3-rc6' of git://github.com/ceph/ceph-client:
libceph: fix PG split vs OSD (re)connect race
ceph: don't try fill file_lock on unsuccessful GETFILELOCK reply
ceph: clear page dirty before invalidate page
ceph: fix buffer free while holding i_ceph_lock in fill_inode()
ceph: fix buffer free while holding i_ceph_lock in __ceph_build_xattrs_blob()
ceph: fix buffer free while holding i_ceph_lock in __ceph_setxattr()
libceph: allow ceph_buffer_put() to receive a NULL ceph_buffer
Merge tag 'drm-fixes-2019-08-23' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Live from the laundromat after my washing machine broke down, we have
the 5.3-rc6 fixes. Changelog is in the tag below, but nothing too
noteworthy in here:
rcar-du:
- LVDS dual-link mode fix
mediatek:
- of node refcount fix
- prime buffer import fix
- dma max seg fix
komeda:
- output polling fix
- abfc format fix
- memory-region DT fix
amdgpu:
- bpc display fix
- ioctl memory leak fix
- gfxoff fix
- smu warnings fix
i915:
- HDMI mode readout fix"
* tag 'drm-fixes-2019-08-23' of git://anongit.freedesktop.org/drm/drm:
drm/amdgpu/powerplay: silence a warning in smu_v11_0_setup_pptable
drm/amd/display: Calculate bpc based on max_requested_bpc
drm/amdgpu: prevent memory leaks in AMDGPU_CS ioctl
drm/amd/amdgpu: disable MMHUB PG for navi10
drm/amd/powerplay: remove duplicate macro smu_get_uclk_dpm_states in amdgpu_smu.h
drm/amd/powerplay: fix variable type errors in smu_v11_0_setup_pptable
drm/amdgpu/gfx9: update pg_flags after determining if gfx off is possible
drm/i915: Fix HW readout for crtc_clock in HDMI mode
drm/mediatek: mtk_drm_drv.c: Add of_node_put() before goto
drm: rcar_lvds: Fix dual link mode operations
drm/mediatek: set DMA max segment size
drm/mediatek: use correct device to import PRIME buffers
drm/omap: ensure we have a valid dma_mask
drm/komeda: Add support for 'memory-region' DT node property
drm/komeda: Adds internal bpp computing for arm afbc only format YU08 YU10
drm/komeda: Initialize and enable output polling on Komeda
If the sector number is too high, dm_table_find_target() should return a
pointer to a zeroed dm_target structure (the caller should test it with
dm_target_is_valid).
However, for some table sizes, the code in dm_table_find_target() that
performs the btree lookup will access memory out of bounds.
Fix this bug by testing the sector number at the beginning of
dm_table_find_target(). Also, add an "inline" keyword to the function
dm_table_get_size() because this is a hot path.
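The "check the sector first" idea can be sketched in plain C as follows.
This is an illustration only: the flattened struct, the linear scan and the
-1 return value are assumptions of the sketch, whereas the kernel walks a
btree and hands a dm_target back to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Hypothetical, flattened stand-in for a device-mapper table. */
struct table {
    size_t          num_targets;
    const sector_t *highs;   /* highs[i] = last sector covered by target i */
};

/* Total number of sectors covered by the table (0 if it is empty). */
static inline sector_t table_size(const struct table *t)
{
    return t->num_targets ? t->highs[t->num_targets - 1] + 1 : 0;
}

/*
 * Return the index of the target containing `sector`, or -1 if the
 * sector lies beyond the end of the table. The range check comes first,
 * so the search below can never run past the highs[] array.
 */
static long table_find_target(const struct table *t, sector_t sector)
{
    assert(t != NULL);
    if (sector >= table_size(t))
        return -1;

    /* Linear scan for brevity; a real table would use a tree lookup. */
    for (size_t i = 0; i < t->num_targets; i++)
        if (sector <= t->highs[i])
            return (long)i;

    return -1; /* not reached when highs[] is sorted */
}
```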
Fixes: 512875bd96 ("dm: table detect io beyond device")
Cc: stable@vger.kernel.org
Reported-by: Zhang Tao <kontais@zoho.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Benjamin Moody reported to Debian that XFS partially wedges when a chgrp
fails on account of being out of disk quota. I ran his reproducer
script:
# adduser dummy
# adduser dummy plugdev
# dd if=/dev/zero bs=1M count=100 of=test.img
# mkfs.xfs test.img
# mount -t xfs -o gquota test.img /mnt
# mkdir -p /mnt/dummy
# chown -c dummy /mnt/dummy
# xfs_quota -xc 'limit -g bsoft=100k bhard=100k plugdev' /mnt
(and then as user dummy)
$ dd if=/dev/urandom bs=1M count=50 of=/mnt/dummy/foo
$ chgrp plugdev /mnt/dummy/foo
and saw:
================================================
WARNING: lock held when returning to user space!
5.3.0-rc5 #rc5 Tainted: G W
------------------------------------------------
chgrp/47006 is leaving the kernel with locks still held!
1 lock held by chgrp/47006:
#0: 000000006664ea2d (&xfs_nondir_ilock_class){++++}, at: xfs_ilock+0xd2/0x290 [xfs]
...which is clearly caused by xfs_setattr_nonsize failing to unlock the
ILOCK after the xfs_qm_vop_chown_reserve call fails. Add the missing
unlock.
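The general shape of such a fix, an error path that funnels through the
same unlock as the success path, might look like the sketch below. The toy
lock, reserve_quota() and change_group() are hypothetical names used only
for illustration; the real function also deals with transactions, which
are left out here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical toy lock that just records whether it is held. */
struct toy_lock { int held; };

static void toy_lock_acquire(struct toy_lock *l) { assert(!l->held); l->held = 1; }
static void toy_lock_release(struct toy_lock *l) { assert(l->held);  l->held = 0; }

/* Stand-in for a quota reservation that can fail with -EDQUOT. */
static int reserve_quota(int over_quota)
{
    return over_quota ? -EDQUOT : 0;
}

static int change_group(struct toy_lock *ilock, int over_quota)
{
    int error;

    assert(ilock != NULL);
    toy_lock_acquire(ilock);

    error = reserve_quota(over_quota);
    if (error)
        goto out_unlock;   /* the failure path must also drop the lock */

    /* ... the ownership change itself would go here ... */

out_unlock:
    toy_lock_release(ilock);
    return error;
}
```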
Reported-by: benjamin.moody@gmail.com
Fixes: 253f4911f2 ("xfs: better xfs_trans_alloc interface")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Salvatore Bonaccorso <carnil@debian.org>
While I had thought I had fixed this issue in:
commit 342406e4fb ("drm/nouveau/i2c: Disable i2c bus access after
->fini()")
It turns out that while I did fix the error messages I was seeing on my
P50 when trying to access i2c busses with the GPU in runtime suspend, I
accidentally had missed one important detail that was mentioned on the
bug report this commit was supposed to fix: that the CPU would only lock
up when trying to access i2c busses _on connected devices_ _while the
GPU is not in runtime suspend_. Whoops. That definitely explains why I
was not able to get my machine to hang with i2c bus interactions until
now, as plugging my P50 into its dock with an HDMI monitor connected
allowed me to finally reproduce this locally.
Now that I have managed to reproduce this issue properly, it looks like
the problem is much simpler than it looks. It turns out that some
connected devices, such as MST laptop docks, will actually ACK i2c reads
even if no data was actually read:
[ 275.063043] nouveau 0000:01:00.0: i2c: aux 000a: 1: 0000004c 1
[ 275.063447] nouveau 0000:01:00.0: i2c: aux 000a: 00 01101000 10040000
[ 275.063759] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000001
[ 275.064024] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000000
[ 275.064285] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000000
[ 275.064594] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000000
Because we don't handle the situation of i2c ack without any data, we
end up entering an infinite loop in nvkm_i2c_aux_i2c_xfer() since the
value of cnt always remains at 0. This finally properly explains how
this could result in a CPU hang like the ones observed in the
aforementioned commit.
So, fix this by retrying transactions if no data is written or received,
and give up and fail the transaction if we continue to not write or
receive any data after 32 retries.
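A bounded-retry loop of this kind can be sketched as below. Only the limit
of 32 comes from the description above; the callback type, the -EIO error
code and resetting the counter after progress are assumptions of this
sketch, not a copy of nvkm_i2c_aux_i2c_xfer().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_EMPTY_RETRIES 32   /* the retry limit named in the text */

/*
 * Hypothetical transfer callback: moves at most `len` bytes and returns
 * how many were moved (possibly 0 even though the device ACKed), or a
 * negative errno.
 */
typedef int (*xfer_fn)(void *ctx, size_t len);

static int transfer_all(xfer_fn xfer, void *ctx, size_t remaining)
{
    int retries = 0;

    assert(xfer != NULL);
    while (remaining) {
        int cnt = xfer(ctx, remaining);

        if (cnt < 0)
            return cnt;
        if (cnt == 0) {
            /* ACK without data: retry, but not forever. */
            if (++retries > MAX_EMPTY_RETRIES)
                return -EIO;
            continue;
        }
        retries = 0;                 /* assumed: progress resets the count */
        remaining -= (size_t)cnt;
    }
    return 0;
}

/* Example callbacks: one that never makes progress, one that moves a byte. */
static int xfer_stuck(void *ctx, size_t len) { (void)len; ++*(int *)ctx; return 0; }
static int xfer_byte(void *ctx, size_t len)  { (void)len; ++*(int *)ctx; return 1; }
```

Without the retry counter, the xfer_stuck case would spin forever because
the remaining byte count never changes.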
Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Merge tag 'drm-misc-fixes-2019-08-22' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
Fixes for v5.3-rc6:
- dma fix for omap.
- Make output polling work on komeda.
- Fix bpp computing for AFBC formats in komeda.
- Support the memory-region property in komeda.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/5f1fdfe3-814e-fad1-663c-7279217fc085@linux.intel.com
The outer poll loop checks for whether we need to reschedule, and
returns to userspace if we do. However, it's possible to get stuck
in the inner loop as well, if the CPU we are running on needs to
reschedule to finish the IO work.
Add the need_resched() check in the inner loop as well. This fixes
a potential hang if the kernel is configured with
CONFIG_PREEMPT_VOLUNTARY=y.
Reported-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Merge tag 'pci-v5.3-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
- Reset both NVIDIA GPU and HDA in ThinkPad P50 quirk, which was broken
by another quirk that enabled the HDA device (Lyude Paul)
- Fix pciebus-howto.rst documentation filename typo (Bjorn Helgaas)
* tag 'pci-v5.3-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
Documentation PCI: Fix pciebus-howto.rst filename typo
PCI: Reset both NVIDIA GPU and HDA in ThinkPad P50 workaround
In commit 6096d91af0 ("dm space map metadata: fix occasional leak
of a metadata block on resize"), the commit logic was refactored into a
new function, apply_bops(). But when that logic was replaced in out(),
the return value was not stored. This may lead to out() returning the
wrong value to the caller.
Fixes: 6096d91af0 ("dm space map metadata: fix occasional leak of a metadata block on resize")
Cc: stable@vger.kernel.org
Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
When btree_split_beneath() splits a node into two new children, it
allocates two blocks: left and right. If the right block's allocation
fails, the left block is unlocked and marked dirty. At that point the
left block's content is all zeroes, because it was not initialized with
the btree struct before the attempt to allocate the right block. Upon
return, when the left block is flushed to disk, the validator fails
while checking this block and a BUG_ON is raised.
Fix this by completely initializing the left block before allocating and
initializing the right block.
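The ordering can be illustrated with a small userspace sketch. The node
layout, the magic value and passing NULL to simulate a failed allocation
are all hypothetical; the point is only that the left node is already
valid on the error path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NODE_MAGIC 0x42545245u  /* arbitrary marker for an initialised header */

struct node { uint32_t magic; uint32_t nr_entries; };

/* A block passes validation only if its header was initialised. */
static bool node_valid(const struct node *n) { return n->magic == NODE_MAGIC; }

static void node_init(struct node *n, uint32_t nr_entries)
{
    n->magic = NODE_MAGIC;
    n->nr_entries = nr_entries;
}

/*
 * Split `total` entries across two children. `right` may be NULL to
 * simulate a failed allocation. Because `left` is fully initialised
 * before `right` is even looked at, it is valid on both return paths.
 */
static int split_beneath(struct node *left, struct node *right, uint32_t total)
{
    assert(left != NULL);
    memset(left, 0, sizeof(*left));   /* a freshly allocated block is zeroed */
    node_init(left, total / 2);

    if (!right)
        return -1;                    /* left is already safe to flush */

    node_init(right, total - total / 2);
    return 0;
}
```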
Fixes: 4dcb8b57df ("dm btree: fix leak of bufio-backed block in btree_split_beneath error path")
Cc: stable@vger.kernel.org
Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Hi Linus,
Please, pull the following patches that mark switch cases where we are
expecting to fall through.
- Fix fall-through warnings on arm and mips for multiple
configurations.
Thanks
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Merge tag 'Wimplicit-fallthrough-5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
Pull more fallthrough fixes from Gustavo A. R. Silva:
"Fix fall-through warnings on arm and mips for multiple configurations"
* tag 'Wimplicit-fallthrough-5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
video: fbdev: acornfb: Mark expected switch fall-through
scsi: libsas: sas_discover: Mark expected switch fall-through
MIPS: Octeon: Mark expected switch fall-through
power: supply: ab8500_charger: Mark expected switch fall-through
watchdog: wdt285: Mark expected switch fall-through
mtd: sa1100: Mark expected switch fall-through
drm/sun4i: tcon: Mark expected switch fall-through
drm/sun4i: sun6i_mipi_dsi: Mark expected switch fall-through
ARM: riscpc: Mark expected switch fall-through
dmaengine: fsldma: Mark expected switch fall-through
Fixes:
1. platform/chrome: cros_ec_ishtp: fix crash during suspend
- Fixes a kernel crash during suspend/resume of cros_ec_ishtp
Merge tag 'tag-chrome-platform-fixes-for-v5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux
Pull chrome platform fix from Benson Leung:
"Fix a kernel crash during suspend/resume of cros_ec_ishtp"
* tag 'tag-chrome-platform-fixes-for-v5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
platform/chrome: cros_ec_ishtp: fix crash during suspend
Merge tag 'afs-fixes-20190822' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull AFS fixes from David Howells:
- Fix a cell record leak due to the default error not being cleared.
- Fix an oops in tracepoint due to a pointer that may contain an error.
- Fix the ACL storage op for YFS where the wrong op definition is being
used. By luck, this only actually affects the information appearing
in traces.
* tag 'afs-fixes-20190822' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: use correct afs_call_type in yfs_fs_store_opaque_acl2
afs: Fix possible oops in afs_lookup trace event
afs: Fix leak in afs_lookup_cell_rcu()
All user level and most in-kernel applications submit WQEs
where the SG list entries are all of a single type.
iSER in particular, however, will send us WQEs with mixed SG
types: sge[0] = kernel buffer, sge[1] = PBL region.
Check and set is_kva on each SG entry individually instead of
assuming the first SGE type carries through to the last.
This fixes iSER over siw.
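A minimal sketch of the per-entry decision, using made-up types rather
than the siw structures, could look like this. The enum values and the
byte-counting helper are hypothetical and exist only to show is_kva being
evaluated inside the loop:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical memory types for one scatter/gather element. */
enum mem_type { MEM_KVA, MEM_PBL };

struct sge { enum mem_type type; size_t len; };

/*
 * Count the bytes that live in kernel virtual memory. The type is
 * evaluated for each element rather than cached from sge[0].
 */
static size_t kva_bytes(const struct sge *sge, size_t num_sge)
{
    size_t total = 0;

    assert(sge != NULL || num_sge == 0);
    for (size_t i = 0; i < num_sge; i++) {
        bool is_kva = (sge[i].type == MEM_KVA);   /* decided per entry */

        if (is_kva)
            total += sge[i].len;
    }
    return total;
}
```

For a mixed list such as { kernel buffer, PBL region }, caching the type
of the first element would wrongly treat the second one as a kernel
address as well.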
Fixes: b9be6f18cf ("rdma/siw: transmit path")
Reported-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Tested-by: Krishnamraju Eraparaju <krishna2@chelsio.com>
Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
Link: https://lore.kernel.org/r/20190822150741.21871-1-bmt@zurich.ibm.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
It seems that 'yfs_RXYFSStoreOpaqueACL2' should be used in
yfs_fs_store_opaque_acl2().
Fixes: f5e4546347 ("afs: Implement YFS ACL setting")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David Howells <dhowells@redhat.com>
The afs_lookup trace event can cause the following:
[ 216.576777] BUG: kernel NULL pointer dereference, address: 000000000000023b
[ 216.576803] #PF: supervisor read access in kernel mode
[ 216.576813] #PF: error_code(0x0000) - not-present page
...
[ 216.576913] RIP: 0010:trace_event_raw_event_afs_lookup+0x9e/0x1c0 [kafs]
If the inode from afs_do_lookup() is an error other than ENOENT, or if it
is ENOENT and afs_try_auto_mntpt() returns an error, the trace event will
try to dereference the error pointer as a valid pointer.
Use IS_ERR_OR_NULL to only pass a valid pointer for the trace, or NULL.
Ideally the trace would include the error value, but for now just avoid
the oops.
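For readers unfamiliar with error pointers, here is a userspace
re-creation of the idea. err_ptr() and is_err_or_null() are lowercase
stand-ins written for this sketch that mimic the kernel's ERR_PTR() and
IS_ERR_OR_NULL(); trace_ino() is a hypothetical trace hook, not the afs
tracepoint.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno in the top page of the address space. */
static inline void *err_ptr(long error)
{
    assert(error < 0 && error >= -MAX_ERRNO);
    return (void *)error;
}

/* True for NULL and for any pointer that encodes an errno. */
static inline bool is_err_or_null(const void *ptr)
{
    return !ptr || (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct inode { unsigned long ino; };

/* Hypothetical trace hook: only dereferences pointers known to be valid. */
static unsigned long trace_ino(const struct inode *inode)
{
    const struct inode *safe = is_err_or_null(inode) ? NULL : inode;

    return safe ? safe->ino : 0;
}
```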
Fixes: 80548b0399 ("afs: Add more tracepoints")
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Fix a leak on the cell refcount in afs_lookup_cell_rcu() due to
non-clearance of the default error in the case a NULL cell name is passed
and the workstation default cell is used.
Also put a bit at the end to make sure we don't leak a cell ref if we're
going to be returning an error.
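Both parts of the fix, clearing the default error on the fallback path
and dropping any reference before returning an error, can be sketched in
userspace as follows. The struct, the helpers and the lookup signature are
hypothetical and much simpler than afs_lookup_cell_rcu():

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct cell { int refs; };

static struct cell *cell_get(struct cell *c) { c->refs++; return c; }
static void cell_put(struct cell *c) { assert(c->refs > 0); c->refs--; }

/*
 * Look up `named`, or fall back to `dflt` when no name is given.
 * On success *out holds a counted reference; on failure no reference
 * is left behind.
 */
static int lookup_cell(struct cell *named, struct cell *dflt, struct cell **out)
{
    struct cell *cell = NULL;
    int ret = -ENOENT;              /* default error */

    assert(out != NULL);
    if (!named) {
        if (dflt) {
            cell = cell_get(dflt);
            ret = 0;                /* clear the default error on this path */
        }
    } else {
        cell = cell_get(named);
        ret = 0;
    }

    /* Safety net: never return an error while still holding a ref. */
    if (ret != 0 && cell) {
        cell_put(cell);
        cell = NULL;
    }

    *out = cell;
    return ret;
}
```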
This leak results in an assertion like the following when the kafs module is
unloaded:
AFS: Assertion failed
2 == 1 is false
0x2 == 0x1 is false
------------[ cut here ]------------
kernel BUG at fs/afs/cell.c:770!
...
RIP: 0010:afs_manage_cells+0x220/0x42f [kafs]
...
process_one_work+0x4c2/0x82c
? pool_mayday_timeout+0x1e1/0x1e1
? do_raw_spin_lock+0x134/0x175
worker_thread+0x336/0x4a6
? rescuer_thread+0x4af/0x4af
kthread+0x1de/0x1ee
? kthread_park+0xd4/0xd4
ret_from_fork+0x24/0x30
Fixes: 989782dcdc ("afs: Overhaul cell database management")
Signed-off-by: David Howells <dhowells@redhat.com>
If after an MMIO exit to userspace a VCPU is immediately run with an
immediate_exit request, such as when a signal is delivered or an MMIO
emulation completion is needed, then the VCPU completes the MMIO
emulation and immediately returns to userspace. As the exit_reason
does not get changed from KVM_EXIT_MMIO in these cases we have to
be careful not to complete the MMIO emulation again, when the VCPU is
eventually run again, because the emulation does an instruction skip
(and doing too many skips would be a waste of guest code :-) We need
to use additional VCPU state to track if the emulation is complete.
As luck would have it, we already have 'mmio_needed', which even
appears to be used in this way by other architectures already.
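The "complete at most once" logic can be modelled with a single flag, as
in the sketch below. The toy_vcpu struct and both functions are
hypothetical, and the fixed 4-byte skip is an assumption that stands in
for whatever instruction-skip helper the architecture code uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_vcpu {
    bool          mmio_needed;   /* an MMIO exit still awaits completion */
    unsigned long pc;
};

/* Called when the guest traps on an MMIO access handled by userspace. */
static void mmio_exit(struct toy_vcpu *v)
{
    assert(v != NULL);
    v->mmio_needed = true;
}

/*
 * Called on every re-entry. The instruction skip happens only if an
 * MMIO exit is pending, and the flag is cleared so that a second
 * re-entry cannot skip again.
 */
static void handle_mmio_return(struct toy_vcpu *v)
{
    assert(v != NULL);
    if (!v->mmio_needed)
        return;

    v->pc += 4;                  /* assumed fixed-size instruction */
    v->mmio_needed = false;
}
```

Relying on exit_reason alone would skip twice in this scenario, because
it still reads KVM_EXIT_MMIO on the second entry.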
Fixes: 0d640732db ("arm64: KVM: Skip MMIO insn after emulation")
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
We can't rely on ->peer_features in calc_target() because it may be
called both when the OSD session is established and open and when it's
not. ->peer_features is not valid unless the OSD session is open. If
this happens on a PG split (pg_num increase), that could mean we don't
resend a request that should have been resent, hanging the client
indefinitely.
In userspace this was fixed by looking at require_osd_release and
get_xinfo[osd].features fields of the osdmap. However these fields
belong to the OSD section of the osdmap, which the kernel doesn't
decode (only the client section is decoded).
Instead, let's drop this feature check. It effectively checks for
luminous, so only pre-luminous OSDs would be affected in that on a PG
split the kernel might resend a request that should not have been
resent. Duplicates can occur in other scenarios, so both sides should
already be prepared for them: see dup/replay logic on the OSD side and
retry_attempt check on the client side.
Cc: stable@vger.kernel.org
Fixes: 7de030d6b1 ("libceph: resend on PG splits if OSD has RESEND_ON_SPLIT")
Link: https://tracker.ceph.com/issues/41162
Reported-by: Jerry Lee <leisurelysw24@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Tested-by: Jerry Lee <leisurelysw24@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
When ceph_mdsc_do_request returns an error, we can't assume that the
filelock_reply pointer will be set. Only try to fetch fields out of
the r_reply_info when it returns success.
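In outline, the pattern is "check the result before touching the reply".
The sketch below uses hypothetical types and a do_request() stub that
merely returns a preset value; it is not the ceph request API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct lock_reply { int type; };

struct request {
    int                      result;  /* what the stubbed do_request() returns */
    const struct lock_reply *reply;   /* NULL when the request failed */
};

/* Stub standing in for the real request submission. */
static int do_request(const struct request *req) { return req->result; }

static int get_file_lock(const struct request *req, int *type_out)
{
    int err;

    assert(req != NULL && type_out != NULL);
    err = do_request(req);
    if (err)
        return err;              /* reply may be unset: do not touch it */

    *type_out = req->reply->type;
    return 0;
}
```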
Cc: stable@vger.kernel.org
Reported-by: Hector Martin <hector@marcansoft.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Call clear_page_dirty_for_io(page) before
mapping->a_ops->invalidatepage(). invalidatepage() clears the page's
private flag; if the dirty flag is not cleared first, the page may
trigger a BUG_ON failure in ceph_set_page_dirty().
Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/40862
Signed-off-by: Erqi Chen <chenerqi@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Calling ceph_buffer_put() in fill_inode() may result in freeing the
i_xattrs.blob buffer while holding the i_ceph_lock. This can be fixed by
postponing the call until later, when the lock is released.
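The deferred-put pattern can be sketched in userspace like this. All
names are hypothetical; the toy lock is just a flag, and the
frees_under_lock counter exists only so the sketch can show that no free
happens while the lock is held. buffer_put() also accepts NULL, so the
caller needs no extra check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_lock { int held; };
struct buffer   { int refs; };

static int frees_under_lock;      /* counts the bug this pattern avoids */
static struct toy_lock inode_lock;

static struct buffer *buffer_new(void)
{
    struct buffer *b = malloc(sizeof(*b));

    if (b)
        b->refs = 1;
    return b;
}

/* Drop a reference; NULL is accepted so callers need no extra check. */
static void buffer_put(struct buffer *b)
{
    if (!b)
        return;
    if (--b->refs == 0) {
        if (inode_lock.held)
            frees_under_lock++;
        free(b);
    }
}

struct toy_inode { struct buffer *blob; };

/* Swap in a new blob under the lock, free the old one afterwards. */
static void replace_blob(struct toy_inode *ci, struct buffer *new_blob)
{
    struct buffer *old_blob;

    assert(ci != NULL);
    inode_lock.held = 1;
    old_blob = ci->blob;           /* only pointer juggling under the lock */
    ci->blob = new_blob;
    inode_lock.held = 0;

    buffer_put(old_blob);          /* may free, which is fine now */
}
```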
The following backtrace was triggered by fstests generic/070.
BUG: sleeping function called from invalid context at mm/vmalloc.c:2283
in_atomic(): 1, irqs_disabled(): 0, pid: 3852, name: kworker/0:4
6 locks held by kworker/0:4/3852:
#0: 000000004270f6bb ((wq_completion)ceph-msgr){+.+.}, at: process_one_work+0x1b8/0x5f0
#1: 00000000eb420803 ((work_completion)(&(&con->work)->work)){+.+.}, at: process_one_work+0x1b8/0x5f0
#2: 00000000be1c53a4 (&s->s_mutex){+.+.}, at: dispatch+0x288/0x1476
#3: 00000000559cb958 (&mdsc->snap_rwsem){++++}, at: dispatch+0x2eb/0x1476
#4: 000000000d5ebbae (&req->r_fill_mutex){+.+.}, at: dispatch+0x2fc/0x1476
#5: 00000000a83d0514 (&(&ci->i_ceph_lock)->rlock){+.+.}, at: fill_inode.isra.0+0xf8/0xf70
CPU: 0 PID: 3852 Comm: kworker/0:4 Not tainted 5.2.0+ #441
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
Workqueue: ceph-msgr ceph_con_workfn
Call Trace:
dump_stack+0x67/0x90
___might_sleep.cold+0x9f/0xb1
vfree+0x4b/0x60
ceph_buffer_release+0x1b/0x60
fill_inode.isra.0+0xa9b/0xf70
ceph_fill_trace+0x13b/0xc70
? dispatch+0x2eb/0x1476
dispatch+0x320/0x1476
? __mutex_unlock_slowpath+0x4d/0x2a0
ceph_con_workfn+0xc97/0x2ec0
? process_one_work+0x1b8/0x5f0
process_one_work+0x244/0x5f0
worker_thread+0x4d/0x3e0
kthread+0x105/0x140
? process_one_work+0x5f0/0x5f0
? kthread_park+0x90/0x90
ret_from_fork+0x3a/0x50
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Calling ceph_buffer_put() in __ceph_build_xattrs_blob() may result in
freeing the i_xattrs.blob buffer while holding the i_ceph_lock. This can
be fixed by having this function return the old blob buffer and having
the callers free it once the lock is released.
The following backtrace was triggered by fstests generic/117.
BUG: sleeping function called from invalid context at mm/vmalloc.c:2283
in_atomic(): 1, irqs_disabled(): 0, pid: 649, name: fsstress
4 locks held by fsstress/649:
#0: 00000000a7478e7e (&type->s_umount_key#19){++++}, at: iterate_supers+0x77/0xf0
#1: 00000000f8de1423 (&(&ci->i_ceph_lock)->rlock){+.+.}, at: ceph_check_caps+0x7b/0xc60
#2: 00000000562f2b27 (&s->s_mutex){+.+.}, at: ceph_check_caps+0x3bd/0xc60
#3: 00000000f83ce16a (&mdsc->snap_rwsem){++++}, at: ceph_check_caps+0x3ed/0xc60
CPU: 1 PID: 649 Comm: fsstress Not tainted 5.2.0+ #439
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
Call Trace:
dump_stack+0x67/0x90
___might_sleep.cold+0x9f/0xb1
vfree+0x4b/0x60
ceph_buffer_release+0x1b/0x60
__ceph_build_xattrs_blob+0x12b/0x170
__send_cap+0x302/0x540
? __lock_acquire+0x23c/0x1e40
? __mark_caps_flushing+0x15c/0x280
? _raw_spin_unlock+0x24/0x30
ceph_check_caps+0x5f0/0xc60
ceph_flush_dirty_caps+0x7c/0x150
? __ia32_sys_fdatasync+0x20/0x20
ceph_sync_fs+0x5a/0x130
iterate_supers+0x8f/0xf0
ksys_sync+0x4f/0xb0
__ia32_sys_sync+0xa/0x10
do_syscall_64+0x50/0x1c0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7fc6409ab617
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Calling ceph_buffer_put() in __ceph_setxattr() may end up freeing the
i_xattrs.prealloc_blob buffer while holding the i_ceph_lock. This can be
fixed by postponing the call until later, when the lock is released.
The following backtrace was triggered by fstests generic/117.
BUG: sleeping function called from invalid context at mm/vmalloc.c:2283
in_atomic(): 1, irqs_disabled(): 0, pid: 650, name: fsstress
3 locks held by fsstress/650:
#0: 00000000870a0fe8 (sb_writers#8){.+.+}, at: mnt_want_write+0x20/0x50
#1: 00000000ba0c4c74 (&type->i_mutex_dir_key#6){++++}, at: vfs_setxattr+0x55/0xa0
#2: 000000008dfbb3f2 (&(&ci->i_ceph_lock)->rlock){+.+.}, at: __ceph_setxattr+0x297/0x810
CPU: 1 PID: 650 Comm: fsstress Not tainted 5.2.0+ #437
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
Call Trace:
dump_stack+0x67/0x90
___might_sleep.cold+0x9f/0xb1
vfree+0x4b/0x60
ceph_buffer_release+0x1b/0x60
__ceph_setxattr+0x2b4/0x810
__vfs_setxattr+0x66/0x80
__vfs_setxattr_noperm+0x59/0xf0
vfs_setxattr+0x81/0xa0
setxattr+0x115/0x230
? filename_lookup+0xc9/0x140
? rcu_read_lock_sched_held+0x74/0x80
? rcu_sync_lockdep_assert+0x2e/0x60
? __sb_start_write+0x142/0x1a0
? mnt_want_write+0x20/0x50
path_setxattr+0xba/0xd0
__x64_sys_lsetxattr+0x24/0x30
do_syscall_64+0x50/0x1c0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7ff23514359a
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
I have been reviewing patches for md in the past few months. Mark me
as the MD maintainer, as I have effectively been filling that role.
Cc: NeilBrown <neilb@suse.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>