Shortly before 3.16-rc1, Dave Jones reported:
WARNING: CPU: 3 PID: 19721 at fs/xfs/xfs_aops.c:971
xfs_vm_writepage+0x5ce/0x630 [xfs]()
CPU: 3 PID: 19721 Comm: trinity-c61 Not tainted 3.15.0+ #3
Call Trace:
xfs_vm_writepage+0x5ce/0x630 [xfs]
shrink_page_list+0x8f9/0xb90
shrink_inactive_list+0x253/0x510
shrink_lruvec+0x563/0x6c0
shrink_zone+0x3b/0x100
shrink_zones+0x1f1/0x3c0
try_to_free_pages+0x164/0x380
__alloc_pages_nodemask+0x822/0xc90
alloc_pages_vma+0xaf/0x1c0
handle_mm_fault+0xa31/0xc50
etc.
970 if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) ==
971 PF_MEMALLOC))
I did not respond at the time, because a glance at the PageDirty block
in shrink_page_list() quickly shows that this is impossible: we don't do
writeback on file pages (other than tmpfs) from direct reclaim nowadays.
Dave was hallucinating, but it would have been disrespectful to say so.
However, my own /var/log/messages now shows similar complaints
WARNING: CPU: 1 PID: 28814 at fs/ext4/inode.c:1881 ext4_writepage+0xa7/0x38b()
WARNING: CPU: 0 PID: 27347 at fs/ext4/inode.c:1764 ext4_writepage+0xa7/0x38b()
from stressing some mmotm trees during July.
Could a dirty xfs or ext4 file page somehow get marked PageSwapBacked,
so fail shrink_page_list()'s page_is_file_cache() test, and so proceed
to mapping->a_ops->writepage()?
Yes, 3.16-rc1's commit 68711a7463 ("mm, migration: add destination
page freeing callback") has provided such a way to compaction: if
migrating a SwapBacked page fails, its newpage may be put back on the
list for later use with PageSwapBacked still set, and nothing will clear
it.
Whether that can do anything worse than issue WARN_ON_ONCEs, and get
some statistics wrong, is unclear: easier to fix than to think through
the consequences.
Fixing it here, before the put_new_page(), addresses the bug directly,
but is probably the worst place to fix it. Page migration is doing too
many parts of the job on too many levels: fixing it in
move_to_new_page() to complement its SetPageSwapBacked would be
preferable, except why is it (and newpage->mapping and newpage->index)
done there, rather than down in migrate_page_move_mapping(), once we are
sure of success? Not a cleanup to get into right now, especially not
with memcg cleanups coming in 3.17.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull drm fixes from Dave Airlie:
"This is radeon and intel fixes, and is a small bit larger than I'm
guessing you'd like it to be.
- i915: fixes 32-bit highmem i915 blank screen, semaphore hang and
runtime pm fix
- radeon: gpuvm stability fix for hangs since 3.15, and hang/reboot
regression on TN/RL devices,
The only slightly controversial one is the change to use GB for the
vm_size, which I'm letting through as its a new interface we defined
in this merge window, and I'd prefer to have the released kernel have
the final interface rather than changing it later"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/radeon: fix cut and paste issue for hawaii.
drm/radeon: fix irq ring buffer overflow handling
drm/i915: Simplify i915_gem_release_all_mmaps()
drm/radeon: fix error handling in radeon_vm_bo_set_addr
drm/i915: fix freeze with blank screen booting highmem
drm/i915: Reorder the semaphore deadlock check, again
drm/radeon/TN: only enable bapm on MSI systems
drm/radeon: fix VM IB handling
drm/radeon: fix handling of radeon_vm_bo_rmv v3
drm/radeon: let's use GB for vm_size (v2)
Here contains only the fixes for the new FireWire bebob driver.
All fairly trivial and local fixes, so safe to apply.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJT0RKNAAoJEGwxgFQ9KSmkc3AP/22rWmdBQ1KaAIve6edzZJN9
76EAFpXkTBb8RPAXuB12OcQ9mz0YjVLy5GD0yQ7ydJtZHySr0eXfAQBk7tvTfLk4
msCeShCszVbecEcqYTKcfqsjd56TB/nKge3X03XFk/iN5TenLGhCAmRoihyAIlot
3h9Kzgm75XFIb9KS3chU41ics0/PafPW1b1vsh/rPMNMuNwr3fG9XnYkmmd5gsL9
WnL6lKk9oj2wLT3w9zwja2NGRc8G87wZ8WMcXTlSPyFMXUgyCxOD+CFnu38Zk0rv
URUrbRqagwi6yyV6JEU1o/bc5UteK2LO5dR4VI9IogwxiGBRYMpX0NTClN3cgQgi
G4aqJi6I6nsyeFb4m3NfHfxyKzV3n3QXpc3940G7PYMcuxGlTnMDjy9aKjWnGSCi
VnPfGuqwb210wumhvBNoRNUTq91Qw0JrW6pUJsJqkefaHw4PbhRD1xdT7cs12xn7
zFAq9ZpERQ6SuyaqnviTBUiuAjNzmqkIr3DGgiClFzuBGu5s5OyZGHyUx6sPAg1J
ZEsoNplBoPN9qmZgAsskuSVWlfPq6icXqZVVDY63IgMFU68E6BN0LapxM1aFOtyG
FfX5eD3JsfPGQeporxpTd6ehybUnrJH/5M1dnybydODwmAAuIzryrA/3FXuxP3z4
m2h9E7IW6zLmuWkBdcmn
=2EEt
-----END PGP SIGNATURE-----
Merge tag 'sound-3.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Here contains only the fixes for the new FireWire bebob driver. All
fairly trivial and local fixes, so safe to apply"
* tag 'sound-3.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: bebob: Correction for return value of special_clk_ctl_put() in error
ALSA: bebob: Correction for return value of .put callback
ALSA: bebob: Use different labels for digital input/output
ALSA: bebob: Fix a missing to unlock mutex in error handling case
Do not split the PARPORT-related symbols with the new kconfig
symbol ARCH_MIGHT_HAVE_PC_PARPORT. The split was causing incorrect
display of these symbols -- they were not being displayed together
as they should be.
Fixes: d90c3eb315 "Kconfig cleanup (PARPORT_PC dependencies)"
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: stable@vger.kernel.org # for 3.13, 3.14, 3.15
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJT0vhRAAoJEJommM3PjknHU0kQAJCVpETov4VEojTWjW54BE59
Yflb/EcZjfrZY7/LixV1AhhlInlXj9TuthAPG0TLkhaLElBsbEwuVGHHOytM77s2
8wI8Ecw8hXPTZkqDC4yYF1ysHqKkIXQ7e5IImWOozybKy6ymMIYIBGm8rP0m6oS5
nC25pR+4vWPr0B/L6+XpdBbxJmRyNw4y4NmKLSp6Uv+j3u/emolp7lKE7tRsWcjc
k6ey32k57XIbtYMjILIele27g2JGsx2u2s+tm5Qffuy2IG/0ErO6nrG5DDzjMTI1
v/uSC6WDeJICd4Gj0QzPCu1gGztv7xaBzD94ssPsnIcruuD4TSuqB9b+3YEjrEV2
Iwo/y615qYU6AxtwqA8vhhrwk+0I5oj7xJb3ASgEsCco8o3sKhE1QQtRsLw6cOnI
sb4jiA6nh5FkiTfinnUi00Xp8W2bd6weg+GJHEMGgCn/dWYKyo6TxheUexorzH4f
tb5oxPkZ2WtR3FxWNcd2xDklPoV+d3BDUlR4x+4hmd31vJZ25Fcy8eg8HDp9Lk4a
wwI1iFotw3coDj/Tkyrf/4z/XBQh0KIjgia9sAmdYSEo28Z8DHHoZFMPPB9S98vO
V8aA2DN62FQ5oNpPYBla4+RPrCJKJfWbSI3Mq0kaRRkBUtt7hTz+OsX9a/khLxn2
o5Awtn+sSYUFl2vt896c
=tFJe
-----END PGP SIGNATURE-----
Merge tag 'blackfin-3.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/realmz6/blackfin-linux
Pull blackfin fixes from Steven Miao:
"smc nor flash PM fix, pinctrl group fix, update defconfig, and build
fixes"
* tag 'blackfin-3.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/realmz6/blackfin-linux:
blackfin: vmlinux.lds.S: reserve 32 bytes space at the end of data section for XIP kernel
defconfig: BF609: update spi config name
irq: blackfin sec: drop duplicated sec priority set
blackfin: bind different groups of one pinmux function to different state name
blackfin: fix some bf5xx boards build for missing <linux/gpio.h>
pm: bf609: cleanup smc nor flash
drop smc pin state change code, pin state will be saved in pinctrl-adi2 driver
cleanup nor flash init/exit for pm suspend/resume
Signed-off-by: Steven Miao <realmz6@gmail.com>
Pull parisc fixes from Helge Deller:
"We have two trivial patches in here. One removes the SA_RESTORER
#define since on parisc we don't have the sa_restorer field in struct
sigaction, the other patch removes an unnecessary memset().
The SA_RESTORER removal patch is scheduled for stable trees, since
without it some userspace apps don't build"
* 'parisc-3.16-6' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Eliminate memset after alloc_bootmem_pages
parisc: Remove SA_RESTORER define
Pull fuse fixes from Miklos Szeredi:
"These two pathes fix issues with the kernel-userspace protocol changes
in v3.15"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: add FUSE_NO_OPEN_SUPPORT flag to INIT
fuse: s_time_gran fix
This is a halfway fix for hawaii acceleration. More fixes to come
but hopefully isolated to userspace.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
two more radeon fixes.
* 'drm-fixes-3.16' of git://people.freedesktop.org/~agd5f/linux:
drm/radeon: fix irq ring buffer overflow handling
drm/radeon: fix error handling in radeon_vm_bo_set_addr
This time in time! Just 32bit-pae fix from Hugh, semaphores fun from Chris
and a fix for runtime pm cherry-picked from next.
Paulo is still working on a fix for runtime pm when X does cursor fun when
the display is off, but that one isn't ready yet.
* tag 'drm-intel-fixes-2014-07-24' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Simplify i915_gem_release_all_mmaps()
drm/i915: fix freeze with blank screen booting highmem
drm/i915: Reorder the semaphore deadlock check, again
alloc_bootmem and related function always return zeroed region of
memory. Thus a memset after calls to these functions is unnecessary.
The following Coccinelle semantic patch was used for making the change:
@@
expression E,E1;
@@
E = \(alloc_bootmem\|alloc_bootmem_low\|alloc_bootmem_pages\|alloc_bootmem_low_pages\)(...)
... when != E
- memset(E,0,E1);
Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Helge Deller <deller@gmx.de>
The sa_restorer field in struct sigaction is obsolete and no longer in
the parisc implementation. However, the core code assumes the field is
present if SA_RESTORER is defined. So, the define needs to be removed.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Helge Deller <deller@gmx.de>
Temperature limit clamps are applied after converting the temperature
from milli-degrees C to degrees C, so either the clamp limit needs
to be specified in degrees C, not milli-degrees C, or clamping must
happen before converting to degrees C. Use the latter method to avoid
overflows.
vrm is an u8, so the written value needs to be limited to [0, 255].
Cc: Axel Lin <axel.lin@ingics.com>
Cc: stable@vger.kernel.org
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Pull nfsd bugfix from Bruce Fields:
"Another regression from the xdr encoding rewrite"
* 'for-3.16' of git://linux-nfs.org/~bfields/linux:
NFSD: Fix crash encoding lock reply on 32-bit
- resolve FIXMEs in double exception handler for window overflow. This
fix makes native building of linux on xtensa host possible;
- fix sysmem region removal issue introduced in 3.15.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJTzhdsAAoJEI9vqH3mFV2ss5QP/1EVWvoI59eK3QQjrKdCRk/C
bLi7yCgZHh75ZTPvHpwVWFkIc0C1w7wprgJJVe9LjT4BPs609IRPrwMq4CLznNpn
nT3PcxyfJxv6q1sv/Aj4h3X/bqLMbBMclldqD4rH1WGpwLpxDjVlcfGc+7592L8a
LInSpBsby4G44gW/4ed5I5bIKNiQYPN9WkGsHEH3gGJLosQtEOmPYV+cu/2P4rVF
cWFHLKVSzSEEYQ7EvqWPYeBu6tp4HyHZHKwNTOaJbUg1POLn7cEH8O503nsRQ/DB
KKEG5vy/EBvB20U819VgmtS22/C07wT7dvMNy2B0IV/T4KCLpTDE5SmFdZc20HGc
W3/BHpBDiRQvdxsUnDEWibhgz4fgFkUi0/t20qKMJEWbHOKvrfAuM/KAgMDayWu+
RNZuIf3sA2mxg25b9cnFFOwhBRVCJl/zhYnQl3VuDbn3qdCHfFeNz6MAYyzNGPbd
PSRpsS8bG0y1eVaHlKc0wUrXx2COYDiKBrJ8OgswQxa+J4oTW2bS29j3ZdEeVLf5
+aDln0FjqxRMSjPe3sq1Ex0qKW15lImSK/znjrVWOFUdPrc+FhGtkWrjc5jC6NUH
8VN9ktuBh/ZKUT7/hLf5xWUljV5wEKZ3aeFTMT+zSQ93n/xf+lNMpDmjda+o8jru
Wrc2dWoE2o1UjCnTqbop
=khlu
-----END PGP SIGNATURE-----
Merge tag 'xtensa-next-20140721' of git://github.com/czankel/xtensa-linux
Pull Xtensa fixes from Chris Zankel:
- resolve FIXMEs in double exception handler for window overflow. This
fix makes native building of linux on xtensa host possible;
- fix sysmem region removal issue introduced in 3.15.
* tag 'xtensa-next-20140721' of git://github.com/czankel/xtensa-linux:
xtensa: fix sysmem reservation at the end of existing block
xtensa: add fixup for double exception raised in window overflow
- An IRQ handling fix for the STi driver, also for stable
- Another IRQ fix for the RCAR GPIO driver
- A MAINTAINERS entry
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJTz6+qAAoJEEEQszewGV1z+o0QAJSXONx5WyVHNXGII24loIyQ
035O3ENmTLP2RNiibREcOk4tAD3IuIW/GYL+rx1yCCU5fAzKdoG7bG53DfariTWO
zSe+mUxdspKEiWGs3vXP5G2+UOD0dM+GytqduAIw6Uuwd9WcZkd46D4BLmFq0V0A
ZcY4oGaB1JetEu2LVLAuMaUkWeRMVH3SQX+YwM7dNigljgMJSHo2FduRN0+A8nJD
TpvdsZbAzsdNyH6oOnzzOAnpIRyL/TthPwnwUcEKVtcv8zj9A4gjB9fBObuR2v5L
rncVdVV/2anWIsSjSnaXOSLni18wIxB1YDRanTI1tH3xAfCNQNCBjPrDzdoQcMvP
B7MkYqRnl/ZGUq74kAixAozQJBLfzyLhV5G+1zAty81x1+gLITUD3ZkqwBqGvinY
UMg2s3eHvW20ypc2i9cTj3VZeepjyzbQ0JaF7o1GiU3By9Igy+WJPu0OlDyxGoi0
fZZRQHfyERcGqG71RwP7aBqWJLoerOG6hJ50vbcqlKw8/EBOJNN7eROvfNk06IYp
2RqjXGu2MKcFZWGh6Os7tO1WUp14fXX6W3szwB47EGZAPrcgTb/3NMXoj61t6dba
uGM/PXu5xensdc98r2Hd4iOp2Z6LN9/f/9Iqm9gR1dq8B6qn2389gNXIU6j3onFh
UUzA3NVT3W5WiNdLU1es
=pzyJ
-----END PGP SIGNATURE-----
Merge tag 'pinctrl-v3.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"Here are three pin control fixes for the v3.16 series. Sorry that
some of these arrive late, the summer heat in Sweden makes me slow.
- an IRQ handling fix for the STi driver, also for stable
- another IRQ fix for the RCAR GPIO driver
- a MAINTAINERS entry"
* tag 'pinctrl-v3.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
gpio: rcar: Add support for DT IRQ flags
MAINTAINERS: Add entry for the Renesas pin controller driver
pinctrl: st: Fix irqmux handler
Pull libata regression fix from Tejun Heo:
"The last libata/for-3.16-fixes pull contained a regression introduced
by 1871ee134b ("libata: support the ata host which implements a
queue depth less than 32") which in turn was a fix for a regression
introduced earlier while changing queue tag order to accomodate hard
drives which perform poorly if tags are not allocated in circular
order (ugh...).
The regression happens only for SAS controllers making use of libata
to serve ATA devices. They don't fill an ata_host field which is used
by the new tag allocation function leading to NULL dereference.
This patch adds a new intermediate field ata_host->n_tags which is
initialized for both SAS and !SAS cases to fix the issue"
* 'for-3.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
libata: introduce ata_host->n_tags to avoid oops on SAS controllers
Pull powerpc fixes from Ben Herrenschmidt:
"Here is a handful of powerpc fixes for 3.16. They are all pretty
simple and self contained and should still make this release"
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc: use _GLOBAL_TOC for memmove
powerpc/pseries: dynamically added OF nodes need to call of_node_init
powerpc: subpage_protect: Increase the array size to take care of 64TB
powerpc: Fix bugs in emulate_step()
powerpc: Disable doorbells on Power8 DD1.x
kmem_cache_sanity_check() that has been repeatedly reported (as recently
as today against Fedora rawhide). Pekka seemed to have it staged for a
late 3.15-rc in his 'slab/urgent' branch but never sent a pull request,
see: https://lkml.org/lkml/2014/5/23/648
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJTzuh9AAoJEMUj8QotnQNa4kkH/A0cHsQ3RraN1vvJvvQwiKgo
fXaLDCikEoAKUNEs5394fd8HKcHrR3JAS3I1PpeiKaqO2TsQO+yGuoQyqNptUsCJ
w0u46BWsQXXe1cUFlpWYFoZ0uCaUQ9XcIKCtR0uExSXYj48ILu855ObLSEAr/zSU
IdXnrNrt6MGAzTkBG6gJ3gBan+DkjVb//2Es3M86xibotferxKfOTa9tUcRFRaCg
Sl85hnfIZgA7SXf1sOMPP+B7e9TFFrrTARsXecqMgCsiIE8Pkcg8sbTHPtHM4th6
upzk7MjvEvYmFGN20LF9EVO9JiPwqitZjS2v8RceHzPssvHazWu5xgABWLKoy4c=
=8SD1
-----END PGP SIGNATURE-----
Merge tag 'urgent-slab-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull slab fix from Mike Snitzer:
"This fixes the broken duplicate slab name check in
kmem_cache_sanity_check() that has been repeatedly reported (as
recently as today against Fedora rawhide).
Pekka seemed to have it staged for a late 3.15-rc in his 'slab/urgent'
branch but never sent a pull request, see:
https://lkml.org/lkml/2014/5/23/648"
* tag 'urgent-slab-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
slab_common: fix the check for duplicate slab names
Merge fixes from Andrew Morton:
"10 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: hugetlb: fix copy_hugetlb_page_range()
simple_xattr: permit 0-size extended attributes
mm/fs: fix pessimization in hole-punching pagecache
shmem: fix splicing from a hole while it's punched
shmem: fix faulting into a hole, not taking i_mutex
mm: do not call do_fault_around for non-linear fault
sh: also try passing -m4-nofpu for SH2A builds
zram: avoid lockdep splat by revalidate_disk
mm/rmap.c: fix pgoff calculation to handle hugepage correctly
coredump: fix the setting of PF_DUMPCORE
Commit 4a705fef98 ("hugetlb: fix copy_hugetlb_page_range() to handle
migration/hwpoisoned entry") changed the order of
huge_ptep_set_wrprotect() and huge_ptep_get(), which leads to breakage
in some workloads like hugepage-backed heap allocation via libhugetlbfs.
This patch fixes it.
The test program for the problem is shown below:
$ cat heap.c
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#define HPS 0x200000
int main() {
int i;
char *p = malloc(HPS);
memset(p, '1', HPS);
for (i = 0; i < 5; i++) {
if (!fork()) {
memset(p, '2', HPS);
p = malloc(HPS);
memset(p, '3', HPS);
free(p);
return 0;
}
}
sleep(1);
free(p);
return 0;
}
$ export HUGETLB_MORECORE=yes ; export HUGETLB_NO_PREFAULT= ; hugectl --heap ./heap
Fixes 4a705fef98 ("hugetlb: fix copy_hugetlb_page_range() to handle
migration/hwpoisoned entry"), so is applicable to -stable kernels which
include it.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reported-by: Guillaume Morin <guillaume@morinfr.org>
Suggested-by: Guillaume Morin <guillaume@morinfr.org>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org> [2.6.37+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If a filesystem uses simple_xattr to support user extended attributes,
LTP setxattr01 and xfstests generic/062 fail with "Cannot allocate
memory": simple_xattr_alloc()'s wrap-around test mistakenly excludes
values of zero size. Fix that off-by-one (but apparently no filesystem
needs them yet).
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jeff Layton <jlayton@poochiereds.net>
Cc: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I wanted to revert my v3.1 commit d0823576bf ("mm: pincer in
truncate_inode_pages_range"), to keep truncate_inode_pages_range() in
synch with shmem_undo_range(); but have stepped back - a change to
hole-punching in truncate_inode_pages_range() is a change to
hole-punching in every filesystem (except tmpfs) that supports it.
If there's a logical proof why no filesystem can depend for its own
correctness on the pincer guarantee in truncate_inode_pages_range() - an
instant when the entire hole is removed from pagecache - then let's
revisit later. But the evidence is that only tmpfs suffered from the
livelock, and we have no intention of extending hole-punch to ramfs. So
for now just add a few comments (to match or differ from those in
shmem_undo_range()), and fix one silliness noticed in d0823576bf4b...
Its "index == start" addition to the hole-punch termination test was
incomplete: it opened a way for the end condition to be missed, and the
loop go on looking through the radix_tree, all the way to end of file.
Fix that pessimization by resetting index when detected in inner loop.
Note that it's actually hard to hit this case, without the obsessive
concurrent faulting that trinity does: normally all pages are removed in
the initial trylock_page() pass, and this loop finds nothing to do. I
had to "#if 0" out the initial pass to reproduce bug and test fix.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Lukas Czerner <lczerner@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
shmem_fault() is the actual culprit in trinity's hole-punch starvation,
and the most significant cause of such problems: since a page faulted is
one that then appears page_mapped(), needing unmap_mapping_range() and
i_mmap_mutex to be unmapped again.
But it is not the only way in which a page can be brought into a hole in
the radix_tree while that hole is being punched; and Vlastimil's testing
implies that if enough other processors are busy filling in the hole,
then shmem_undo_range() can be kept from completing indefinitely.
shmem_file_splice_read() is the main other user of SGP_CACHE, which can
instantiate shmem pagecache pages in the read-only case (without holding
i_mutex, so perhaps concurrently with a hole-punch). Probably it's
silly not to use SGP_READ already (using the ZERO_PAGE for holes): which
ought to be safe, but might bring surprises - not a change to be rushed.
shmem_read_mapping_page_gfp() is an internal interface used by
drivers/gpu/drm GEM (and next by uprobes): it should be okay. And
shmem_file_read_iter() uses the SGP_DIRTY variant of SGP_CACHE, when
called internally by the kernel (perhaps for a stacking filesystem,
which might rely on holes to be reserved): it's unclear whether it could
be provoked to keep hole-punch busy or not.
We could apply the same umbrella as now used in shmem_fault() to
shmem_file_splice_read() and the others; but it looks ugly, and use over
a range raises questions - should it actually be per page? can these get
starved themselves?
The origin of this part of the problem is my v3.1 commit d0823576bf
("mm: pincer in truncate_inode_pages_range"), once it was duplicated
into shmem.c. It seemed like a nice idea at the time, to ensure
(barring RCU lookup fuzziness) that there's an instant when the entire
hole is empty; but the indefinitely repeated scans to ensure that make
it vulnerable.
Revert that "enhancement" to hole-punch from shmem_undo_range(), but
retain the unproblematic rescanning when it's truncating; add a couple
of comments there.
Remove the "indices[0] >= end" test: that is now handled satisfactorily
by the inner loop, and mem_cgroup_uncharge_start()/end() are too light
to be worth avoiding here.
But if we do not always loop indefinitely, we do need to handle the case
of swap swizzled back to page before shmem_free_swap() gets it: add a
retry for that case, as suggested by Konstantin Khlebnikov; and for the
case of page swizzled back to swap, as suggested by Johannes Weiner.
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lukas Czerner <lczerner@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: <stable@vger.kernel.org> [3.1+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit f00cdc6df7 ("shmem: fix faulting into a hole while it's
punched") was buggy: Sasha sent a lockdep report to remind us that
grabbing i_mutex in the fault path is a no-no (write syscall may already
hold i_mutex while faulting user buffer).
We tried a completely different approach (see following patch) but that
proved inadequate: good enough for a rational workload, but not good
enough against trinity - which forks off so many mappings of the object
that contention on i_mmap_mutex while hole-puncher holds i_mutex builds
into serious starvation when concurrent faults force the puncher to fall
back to single-page unmap_mapping_range() searches of the i_mmap tree.
So return to the original umbrella approach, but keep away from i_mutex
this time. We really don't want to bloat every shmem inode with a new
mutex or completion, just to protect this unlikely case from trinity.
So extend the original with wait_queue_head on stack at the hole-punch
end, and wait_queue item on the stack at the fault end.
This involves further use of i_lock to guard against the races: lockdep
has been happy so far, and I see fs/inode.c:unlock_new_inode() holds
i_lock around wake_up_bit(), which is comparable to what we do here.
i_lock is more convenient, but we could switch to shmem's info->lock.
This issue has been tagged with CVE-2014-4171, which will require commit
f00cdc6df7 and this and the following patch to be backported: we
suggest to 3.1+, though in fact the trinity forkbomb effect might go
back as far as 2.6.16, when madvise(,,MADV_REMOVE) came in - or might
not, since much has changed, with i_mmap_mutex a spinlock before 3.0.
Anyone running trinity on 3.0 and earlier? I don't think we need care.
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lukas Czerner <lczerner@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: <stable@vger.kernel.org> [3.1+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ingo Korb reported that "repeated mapping of the same file on tmpfs
using remap_file_pages sometimes triggers a BUG at mm/filemap.c:202 when
the process exits".
He bisected the bug to d7c1755179 ("mm: implement ->map_pages for
shmem/tmpfs"), although the bug was actually added by commit
8c6e50b029 ("mm: introduce vm_ops->map_pages()").
The problem is caused by calling do_fault_around for a _non-linear_
fault. In this case pgoff is shifted and might become negative during
calculation.
Faulting around non-linear page-fault makes no sense and breaks the
logic in do_fault_around because pgoff is shifted.
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Reported-by: Ingo Korb <ingo.korb@tu-dortmund.de>
Tested-by: Ingo Korb <ingo.korb@tu-dortmund.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Ning Qu <quning@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org> [3.15.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When compiling a SH2A kernel (e.g. se7206_defconfig or rsk7203_defconfig)
using sh4-linux-gcc, linking fails with:
net/built-in.o: In function `__sk_run_filter':
net/core/filter.c:566: undefined reference to `__fpscr_values'
net/core/filter.c:269: undefined reference to `__fpscr_values'
...
net/built-in.o:net/core/filter.c:580: more undefined references to `__fpscr_values' follow
This happens because sh4-linux-gcc doesn't support the "-m2a-nofpu",
which is thus filtered out by "$(call cc-option, ...)".
As compiling using sh4-linux-gcc is useful for compile coverage, also
try passing "-m4-nofpu" (which is presumably filtered out when using a
real sh2a-linux toolchain) to disable the generation of FPU instructions
and references to __fpscr_values[].
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Tony Breeds <tony@bakeyournoodle.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Magnus Damm <magnus.damm@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sasha reported lockdep warning [1] introduced by [2].
It could be fixed by doing disk revalidation out of the init_lock. It's
okay because disk capacity change is protected by init_lock so that
revalidate_disk always sees up-to-date value so there is no race.
[1] https://lkml.org/lkml/2014/7/3/735
[2] zram: revalidate disk after capacity change
Fixes 2e32baea46 ("zram: revalidate disk after capacity change").
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: "Alexander E. Patrakov" <patrakov@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I triggered VM_BUG_ON() in vma_address() when I tried to migrate an
anonymous hugepage with mbind() in the kernel v3.16-rc3. This is
because pgoff's calculation in rmap_walk_anon() fails to consider
compound_order() only to have an incorrect value.
This patch introduces page_to_pgoff(), which gets the page's offset in
PAGE_CACHE_SIZE.
Kirill pointed out that page cache tree should natively handle
hugepages, and in order to make hugetlbfs fit it, page->index of
hugetlbfs page should be in PAGE_CACHE_SIZE. This is beyond this patch,
but page_to_pgoff() contains the point to be fixed in a single function.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 079148b919 ("coredump: factor out the setting of PF_DUMPCORE")
cleaned up the setting of PF_DUMPCORE by removing it from all the
linux_binfmt->core_dump() and moving it to zap_threads().But this ended
up clearing all the previously set flags. This causes issues during
core generation when tsk->flags is checked again (eg. for PF_USED_MATH
to dump floating point registers). Fix this.
Signed-off-by: Silesh C V <svellattu@mvista.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Mandeep Singh Baines <msb@chromium.org>
Cc: <stable@vger.kernel.org> [3.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We must mask out the overflow bit as well, otherwise
the wptr will never match the rptr again and the interrupt
handler will loop forever.
Signed-off-by: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Commit 8c7424cff6 "nfsd4: don't try to encode conflicting owner if low
on space" forgot to free conf->data in nfsd4_encode_lockt and before
sign conf->data to NULL in nfsd4_encode_lock_denied, causing a leak.
Worse, kfree() can be called on an uninitialized pointer in the case of
a succesful lock (or one that fails for a reason other than a conflict).
(Note that lock->lk_denied.ld_owner.data appears it should be zero here,
until you notice that it's one arm of a union the other arm of which is
written to in the succesful case by the
memcpy(&lock->lk_resp_stateid, &lock_stp->st_stid.sc_stateid,
sizeof(stateid_t));
in nfsd4_lock(). In the 32-bit case this overwrites ld_owner.data.)
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Fixes: 8c7424cff6 ""nfsd4: don't try to encode conflicting owner if low on space"
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
An object can only have an active gtt mapping if it is currently bound
into the global gtt. Therefore we can simply walk the list of all bound
objects and check the flag upon those for an active gtt mapping.
From commit 48018a57a8
Author: Paulo Zanoni <paulo.r.zanoni@intel.com>
Date: Fri Dec 13 15:22:31 2013 -0200
drm/i915: release the GTT mmaps when going into D3
Also note that the WARN is inappropriate for this function as GPU
activity is orthogonal to GTT mmap status. Rather it is the caller that
relies upon this condition and so it should assert that the GPU is idle
itself.
References: https://bugs.freedesktop.org/show_bug.cgi?id=80081
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
[danvet: cherry-pick from -next to -fixes.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
ZONE_DMA is created to allow 32-bit only devices to access memory in the
absence of an IOMMU. On systems where the memory starts above 4GB, it is
expected that some devices have a DMA offset hardwired to be able to
access the bottom of the memory. Linux currently supports DT bindings
for the DMA offsets but they are not (easily) available early during
boot.
This patch tries to guess a DMA offset and assumes that ZONE_DMA
corresponds to the 32-bit mask above the start of DRAM.
Fixes: 2d5a5612bc (arm64: Limit the CMA buffer to 32-bit if ZONE_DMA)
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Mark Salter <msalter@redhat.com>
Tested-by: Mark Salter <msalter@redhat.com>
Tested-by: Anup Patel <anup.patel@linaro.org>
This commit is a supplement to my previous patch.
http://mailman.alsa-project.org/pipermail/alsa-devel/2014-July/079190.html
The special_clk_ctl_put() still returns 0 in error handling case. It should
return -EINVAL.
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Here some additional changes to set a capability flag so that clients can
detect when it's appropriate to return -ENOSYS from open.
This amends the following commit introduced in 3.14:
7678ac5061 fuse: support clients that don't implement 'open'
However we can only add the flag to 3.15 and later since there was no
protocol version update in 3.14.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: <stable@vger.kernel.org> # v3.15+