Commit Graph

812037 Commits

Author SHA1 Message Date
Linus Torvalds
b9de6efed2 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "24 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (24 commits)
  autofs: fix error return in autofs_fill_super()
  autofs: drop dentry reference only when it is never used
  fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
  mm: migrate: don't rely on __PageMovable() of newpage after unlocking it
  psi: clarify the Kconfig text for the default-disable option
  mm, memory_hotplug: __offline_pages fix wrong locking
  mm: hwpoison: use do_send_sig_info() instead of force_sig()
  kasan: mark file common so ftrace doesn't trace it
  init/Kconfig: fix grammar by moving a closing parenthesis
  lib/test_kmod.c: potential double free in error handling
  mm, oom: fix use-after-free in oom_kill_process
  mm/hotplug: invalid PFNs from pfn_to_online_page()
  mm,memory_hotplug: fix scan_movable_pages() for gigantic hugepages
  psi: fix aggregation idle shut-off
  mm, memory_hotplug: test_pages_in_a_zone do not pass the end of zone
  mm, memory_hotplug: is_mem_section_removable do not pass the end of a zone
  oom, oom_reaper: do not enqueue same task twice
  mm: migrate: make buffer_migrate_page_norefs() actually succeed
  kernel/exit.c: release ptraced tasks before zap_pid_ns_processes
  x86_64: increase stack size for KASAN_EXTRA
  ...
2019-02-02 09:32:58 -08:00
Qian Cai
74c953ca5f efi/arm64: Fix debugfs crash by adding a terminator for ptdump marker
When reading 'efi_page_tables' debugfs triggers an out-of-bounds access here:

  arch/arm64/mm/dump.c: 282
  if (addr >= st->marker[1].start_address) {

called from:

  arch/arm64/mm/dump.c: 331
  note_page(st, addr, 2, pud_val(pud));

because st->marker++ is is called after "UEFI runtime end" which is the
last element in addr_marker[]. Therefore, add a terminator like the one
for kernel_page_tables, so it can be skipped to print out non-existent
markers.

Here's the KASAN bug report:

  # cat /sys/kernel/debug/efi_page_tables
  ---[ UEFI runtime start ]---
  0x0000000020000000-0x0000000020010000          64K PTE       RW NX SHD AF ...
  0x0000000020200000-0x0000000021340000       17664K PTE       RW NX SHD AF ...
  ...
  0x0000000021920000-0x0000000021950000         192K PTE       RW x  SHD AF ...
  0x0000000021950000-0x00000000219a0000         320K PTE       RW NX SHD AF ...
  ---[ UEFI runtime end ]---
  ---[ (null) ]---
  ---[ (null) ]---

   BUG: KASAN: global-out-of-bounds in note_page+0x1f0/0xac0
   Read of size 8 at addr ffff2000123f2ac0 by task read_all/42464
   Call trace:
    dump_backtrace+0x0/0x298
    show_stack+0x24/0x30
    dump_stack+0xb0/0xdc
    print_address_description+0x64/0x2b0
    kasan_report+0x150/0x1a4
    __asan_report_load8_noabort+0x30/0x3c
    note_page+0x1f0/0xac0
    walk_pgd+0xb4/0x244
    ptdump_walk_pgd+0xec/0x140
    ptdump_show+0x40/0x50
    seq_read+0x3f8/0xad0
    full_proxy_read+0x9c/0xc0
    __vfs_read+0xfc/0x4c8
    vfs_read+0xec/0x208
    ksys_read+0xd0/0x15c
    __arm64_sys_read+0x84/0x94
    el0_svc_handler+0x258/0x304
    el0_svc+0x8/0xc

  The buggy address belongs to the variable:
   __compound_literal.0+0x20/0x800

  Memory state around the buggy address:
   ffff2000123f2980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   ffff2000123f2a00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fa
  >ffff2000123f2a80: fa fa fa fa 00 00 00 00 fa fa fa fa 00 00 00 00
                                            ^
   ffff2000123f2b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   ffff2000123f2b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0

[ ardb: fix up whitespace ]
[ mingo: fix up some moar ]

Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Fixes: 9d80448ac9 ("efi/arm64: Add debugfs node to dump UEFI runtime page tables")
Link: http://lkml.kernel.org/r/20190202095017.13799-2-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-02-02 11:27:29 +01:00
Codrin Ciubotariu
dc3f595b66 dmaengine: at_xdmac: Fix wrongfull report of a channel as in use
atchan->status variable is used to store two different information:
 - pass channel interrupts status from interrupt handler to tasklet;
 - channel information like whether it is cyclic or paused;

This causes a bug when device_terminate_all() is called,
(AT_XDMAC_CHAN_IS_CYCLIC cleared on atchan->status) and then a late End
of Block interrupt arrives (AT_XDMAC_CIS_BIS), which sets bit 0 of
atchan->status. Bit 0 is also used for AT_XDMAC_CHAN_IS_CYCLIC, so when
a new descriptor for a cyclic transfer is created, the driver reports
the channel as in use:

if (test_and_set_bit(AT_XDMAC_CHAN_IS_CYCLIC, &atchan->status)) {
	dev_err(chan2dev(chan), "channel currently used\n");
	return NULL;
}

This patch fixes the bug by adding a different struct member to keep
the interrupts status separated from the channel status bits.

Fixes: e1f7c9eee7 ("dmaengine: at_xdmac: creation of the atmel eXtended DMA Controller driver")
Signed-off-by: Codrin Ciubotariu <codrin.ciubotariu@microchip.com>
Acked-by: Ludovic Desroches <ludovic.desroches@microchip.com>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2019-02-02 15:55:26 +05:30
Johannes Weiner
e6d429313e x86/resctrl: Avoid confusion over the new X86_RESCTRL config
"Resource Control" is a very broad term for this CPU feature, and a term
that is also associated with containers, cgroups etc. This can easily
cause confusion.

Make the user prompt more specific. Match the config symbol name.

 [ bp: In the future, the corresponding ARM arch-specific code will be
   under ARM_CPU_RESCTRL and the arch-agnostic bits will be carved out
   under the CPU_RESCTRL umbrella symbol. ]

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Babu Moger <Babu.Moger@amd.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morse <james.morse@arm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: linux-doc@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190130195621.GA30653@cmpxchg.org
2019-02-02 10:34:52 +01:00
Linus Torvalds
cd984a5be2 xtensa fixes for v5.0-rc5
- fix ccount_timer_shutdown for secondary CPUs;
 - fix secondary CPU initialization;
 - fix secondary CPU reset vector clash with double exception vector;
 - fix present CPUs when booting with 'maxcpus' parameter;
 - limit possible CPUs by configured NR_CPUS;
 - issue a warning if xtensa PIC is asked to retrigger anything other
   than software IRQ;
 - fix masking/unmasking of the first two IRQs on xtensa MX PIC;
 - fix typo in Kconfig description for user space unaligned access
   feature;
 - fix Kconfig warning for selecting BUILTIN_DTB.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEK2eFS5jlMn3N6xfYUfnMkfg/oEQFAlxUn8MTHGpjbXZia2Jj
 QGdtYWlsLmNvbQAKCRBR+cyR+D+gRAX6D/4oOJd7TghbVdSC82rIQXRBxmlWB1YV
 tPtb/W3qrkkM8c693FbAoLFNoikYFftzn4EomTz1KtkBxq7HjZmmphiTU5E23Zg5
 nonnBkcFnm4Yfr4gLaTJl3rMqJNbDTMg6EyCPRHVI43Ux1jA9j/T2MN/dMZox+5a
 PU2q8k/HHDAlumOPj93MIKBb8XA9Sq9Jfpw2Jnlc0r8b4fR/9pKfVPOcsqs/jv3x
 BFIIH/vPvl2/j+DShpFcYnK8VgRo6zj2ny343J4zYqXspky43ZMMIaE/ZpkT592b
 uheDQYHAHvpZT+FD8waE5P5quBS5P+CmZIbuz7YTxB1VTcoV+OGGCpAvpj5CqmNr
 Mj2f3Yar+4q/QczHP+/42zGVDoJ/3dLBIu9IqSjWkY90qgncd9TD+dMWzI2ejJU7
 LPMIAw//Y/L4m3TAg84GFfGkOjzGQUXGQGl+9sqIr3eOWgoouXatq6L4F2CxCnz2
 zMKT0HFzdxs1gt13oRngugyfK1xRF0H5DW2eNt4dsFURIeIUP5cqou0v+b+CA3Li
 sbvI6yJ0g9tRf1f0yDnSRlvm0nB56zsXKGz5uuD6MiMOleUM3N41+IqCPHwuH0F1
 wNyYSWZEkt1t88rQxnter1+sDpi4brW0BZMVUSHsf+USqcwMrwu9ZmFKj08nihMP
 dCEHHlVGizOEsQ==
 =Pim9
 -----END PGP SIGNATURE-----

Merge tag 'xtensa-20190201' of git://github.com/jcmvbkbc/linux-xtensa

Pull xtensa fixes from Max Filippov:

 - fix ccount_timer_shutdown for secondary CPUs

 - fix secondary CPU initialization

 - fix secondary CPU reset vector clash with double exception vector

 - fix present CPUs when booting with 'maxcpus' parameter

 - limit possible CPUs by configured NR_CPUS

 - issue a warning if xtensa PIC is asked to retrigger anything other
   than software IRQ

 - fix masking/unmasking of the first two IRQs on xtensa MX PIC

 - fix typo in Kconfig description for user space unaligned access
   feature

 - fix Kconfig warning for selecting BUILTIN_DTB

* tag 'xtensa-20190201' of git://github.com/jcmvbkbc/linux-xtensa:
  xtensa: SMP: limit number of possible CPUs by NR_CPUS
  xtensa: rename BUILTIN_DTB to BUILTIN_DTB_SOURCE
  xtensa: Fix typo use space=>user space
  drivers/irqchip: xtensa-mx: fix mask and unmask
  drivers/irqchip: xtensa: add warning to irq_retrigger
  xtensa: SMP: mark each possible CPU as present
  xtensa: smp_lx200_defconfig: fix vectors clash
  xtensa: SMP: fix secondary CPU initialization
  xtensa: SMP: fix ccount_timer_shutdown
2019-02-01 16:56:30 -08:00
Linus Torvalds
8b050fe42d arm64 fixes for -rc5
- Fix module loading when KASLR is configured but disabled at runtime
 
 - Fix accidental IPI when mapping user executable pages
 
 - Ensure hyp-stub and KVM world switch code cannot be kprobed
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAlxUWvEACgkQt6xw3ITB
 YzTNGwgAgAk1D4xfe01rasMK8dnogIMNbpxeeAuiwwFuXq/lBqo2UDX8UMt6qW9k
 b0rLpzyVelhPRJ3lQ90XvIv279cFBjHQT42YyVLlAnzkYz89kPKZweE7cWetO3sV
 kB7uJIaCfstGAp13p2ztN1KzHeIFKxk14gUa9jgmVdbHdJt+L5WdNUExs4+0jgnq
 lxBPWwvluzEc6mmpniIpF7em+svIci5M4al2gjxKVO7DI5slphOEZ4+VSGeSDxTI
 jOzTilS8UfldEnwjndl/hKes9r65afeLSDXvkJTTpfXB9jnKS54FJzy1QCibSRLT
 SHY9xkA3VEgJBifmpCEuJoNs2XaBFQ==
 =Eh1r
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
 "Although we're still debugging a few minor arm64-specific issues in
  mainline, I didn't want to hold this lot up in the meantime.

  We've got an additional KASLR fix after the previous one wasn't quite
  complete, a fix for a performance regression when mapping executable
  pages into userspace and some fixes for kprobe blacklisting. All
  candidates for stable.

  Summary:

   - Fix module loading when KASLR is configured but disabled at runtime

   - Fix accidental IPI when mapping user executable pages

   - Ensure hyp-stub and KVM world switch code cannot be kprobed"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: hibernate: Clean the __hyp_text to PoC after resume
  arm64: hyp-stub: Forbid kprobing of the hyp-stub
  arm64: kprobe: Always blacklist the KVM world-switch code
  arm64: kaslr: ensure randomized quantities are clean also when kaslr is off
  arm64: Do not issue IPIs for user executable ptes
2019-02-01 16:54:25 -08:00
Linus Torvalds
33640d718c SMB3 fixes, some from this week's SMB3 test evemt, 5 for stable and a particularly important one for queryxattr (see xfstests 70 and 117)
-----BEGIN PGP SIGNATURE-----
 
 iQGyBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAlxUf8kACgkQiiy9cAdy
 T1Fy8Qv4rlgVRBPcSJpvZKhFjMd5KskVIDnP0tOc5od+Mg+547eAg7UKKnJVDgKS
 OUtiP8uC/UErWvQ214hoXF1sMwoG+rDdTRdIOVDLD2a2gIIJzaIPADhYt4w5kiAX
 nO9NwPwk03KTzNUQmpdFPofbLK4csmJUR4kBNE+XhZTulmw9BkplGXnNkQGjsEdf
 nY6YdfqofE//DmR/KB1BylD0lxDtnfuB5zoELPmM6X4iWtD9W6pKVg23huEv+Dmh
 80LdYt77vI4RKEzOAmsYEpROdAmlCksVQC1nlh5EHOuMN/9TCdpdWRahJWWhbbKr
 /b8TuD+0QUUKEQdGvwM8DM/OCjnwoJphzI8CK1VzRpM4XkZxpQTzng2zVFPNzs4Y
 wwKCsxxTO7ePGg6hE/DTnRn9XJtKbR1nQsIjHcw6rE0ydl+P6YRZDUEehqyRU+jd
 Z09XCPS1+zeUPE6PRE9wlrIt7QdxOynufhaWRKf18b2b4cfKbNGcd+rMMZPagUQe
 x2CMkEU=
 =qX/z
 -----END PGP SIGNATURE-----

Merge tag '5.0-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb3 fixes from Steve French:
 "SMB3 fixes, some from this week's SMB3 test evemt, 5 for stable and a
  particularly important one for queryxattr (see xfstests 70 and 117)"

* tag '5.0-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: update internal module version number
  CIFS: fix use-after-free of the lease keys
  CIFS: Do not consider -ENODATA as stat failure for reads
  CIFS: Do not count -ENODATA as failure for query directory
  CIFS: Fix trace command logging for SMB2 reads and writes
  CIFS: Fix possible oops and memory leaks in async IO
  cifs: limit amount of data we request for xattrs to CIFSMaxBufSize
  cifs: fix computation for MAX_SMB2_HDR_SIZE
2019-02-01 16:53:01 -08:00
Linus Torvalds
b7bd29b530 Bug Fixes
- Fix aa_label_build() error handling for failed merges
 - Fix warning about unused function apparmor_ipv6_postroute
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE7cSDD705q2rFEEf7BS82cBjVw9gFAlxUbasACgkQBS82cBjV
 w9hEyA//XLzkJ6mdreIWLSO0KO8dwCwpWyJ5EFIx3R+2Nrz3BVN2/Ysg+GZ2oEdD
 iF8OEVW/RBjapHbtZeruu8tqfU9i3WYyc/cIpsdZTZ2yIfRikyV/bT21/XggAe4g
 cGhcNVvTpVxwDFHT1YdQFHeDrcJ4nHeGVcErQ0oDV3KJRIUII8uKUmJzxiFqFQjq
 iPZHcaU3b65RrPIaxfEIk6anBW9UwInnXBdOQqjxLCg1efLBfmpT3HpUbfQrTmIq
 zh2xSgTFVS8jRbg8N7HIQDR8GZTew1x6yWx6v84fE+8NfqXEuOwo3+EPGcJWfB4i
 9W+nLlmyrJ5wyFJ4dpWzj7ZMAwuZYKSOSgXIVh2ew9bI13KUthymQ+pWJ6S91Rxf
 Z3zPzjHG3RuyeHuZqiNknM6iXgmfR+PaVk2aq22No40BsXiNjTO8l0kSH36HQKCh
 vpOC1VVXq7B0+Xe6FVQd5R1KydXwO2DrYkO4ygyrYqWXD0F6bFhQyZJlbf2las7B
 DtGjC6+nhh9eQPaYDQ5IIJEhFmNSqE+q+yRqNgcN1vj+gPEayw2KqG67Nt50UjmW
 6Lz3tTZJIw8NRSgUYqRzsJfocCADRamHgysWD+etd3mPnQbFXU5Nx6qwztcpzISP
 fHZY75etD+fkg7pR4FvzqO7EpACtZnk/Yx2BTEjUOnTuUY9hjc0=
 =dMPT
 -----END PGP SIGNATURE-----

Merge tag 'apparmor-pr-2019-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor

Pull apparmor bug fixes from John Johansen:
 "Two bug fixes for apparmor:

   - Fix aa_label_build() error handling for failed merges

   - Fix warning about unused function apparmor_ipv6_postroute"

* tag 'apparmor-pr-2019-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
  apparmor: Fix aa_label_build() error handling for failed merges
  apparmor: Fix warning about unused function apparmor_ipv6_postroute
2019-02-01 16:18:38 -08:00
Ian Kent
f585b283e3 autofs: fix error return in autofs_fill_super()
In autofs_fill_super() on error of get inode/make root dentry the return
should be ENOMEM as this is the only failure case of the called
functions.

Link: http://lkml.kernel.org/r/154725123240.11260.796773942606871359.stgit@pluto-themaw-net
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01 15:46:24 -08:00
Pan Bian
63ce5f552b autofs: drop dentry reference only when it is never used
autofs_expire_run() calls dput(dentry) to drop the reference count of
dentry.  However, dentry is read via autofs_dentry_ino(dentry) after
that.  This may result in a use-free-bug.  The patch drops the reference
count of dentry only when it is never used.

Link: http://lkml.kernel.org/r/154725122396.11260.16053424107144453867.stgit@pluto-themaw-net
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01 15:46:24 -08:00
Jan Kara
c27d82f52f fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
When superblock has lots of inodes without any pagecache (like is the
case for /proc), drop_pagecache_sb() will iterate through all of them
without dropping sb->s_inode_list_lock which can lead to softlockups
(one of our customers hit this).

Fix the problem by going to the slow path and doing cond_resched() in
case the process needs rescheduling.

Link: http://lkml.kernel.org/r/20190114085343.15011-1-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01 15:46:24 -08:00
David Hildenbrand
e0a352fabc mm: migrate: don't rely on __PageMovable() of newpage after unlocking it
We had a race in the old balloon compaction code before b1123ea6d3
("mm: balloon: use general non-lru movable page feature") refactored it
that became visible after backporting 195a8c43e9 ("virtio-balloon:
deflate via a page list") without the refactoring.

The bug existed from commit d6d86c0a7f ("mm/balloon_compaction:
redesign ballooned pages management") till b1123ea6d3 ("mm: balloon:
use general non-lru movable page feature").  d6d86c0a7f
("mm/balloon_compaction: redesign ballooned pages management") was
backported to 3.12, so the broken kernels are stable kernels [3.12 -
4.7].

There was a subtle race between dropping the page lock of the newpage in
__unmap_and_move() and checking for __is_movable_balloon_page(newpage).

Just after dropping this page lock, virtio-balloon could go ahead and
deflate the newpage, effectively dequeueing it and clearing PageBalloon,
in turn making __is_movable_balloon_page(newpage) fail.

This resulted in dropping the reference of the newpage via
putback_lru_page(newpage) instead of put_page(newpage), leading to
page->lru getting modified and a !LRU page ending up in the LRU lists.
With 195a8c43e9 ("virtio-balloon: deflate via a page list")
backported, one would suddenly get corrupted lists in
release_pages_balloon():

- WARNING: CPU: 13 PID: 6586 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0
- list_del corruption. prev->next should be ffffe253961090a0, but was dead000000000100

Nowadays this race is no longer possible, but it is hidden behind very
ugly handling of __ClearPageMovable() and __PageMovable().

__ClearPageMovable() will not make __PageMovable() fail, only
PageMovable().  So the new check (__PageMovable(newpage)) will still
hold even after newpage was dequeued by virtio-balloon.

If anybody would ever change that special handling, the BUG would be
introduced again.  So instead, make it explicit and use the information
of the original isolated page before migration.

This patch can be backported fairly easy to stable kernels (in contrast
to the refactoring).

Link: http://lkml.kernel.org/r/20190129233217.10747-1-david@redhat.com
Fixes: d6d86c0a7f ("mm/balloon_compaction: redesign ballooned pages management")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Vratislav Bendel <vbendel@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Rafael Aquini <aquini@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Vratislav Bendel <vbendel@redhat.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>	[3.12 - 4.7]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01 15:46:24 -08:00
Johannes Weiner
7b2489d37e psi: clarify the Kconfig text for the default-disable option
The current help text caused some confusion in online forums about
whether or not to default-enable or default-disable psi in vendor
kernels.  This is because it doesn't communicate the reason for why we
made this setting configurable in the first place: that the overhead is
non-zero in an artificial scheduler stress test.

Since this isn't representative of real workloads, and the effect was
not measurable in scheduler-heavy real world applications such as the
webservers and memcache installations at Facebook, it's fair to point
out that this is a pretty cautious option to select.

Link: http://lkml.kernel.org/r/20190129233617.16767-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01 15:46:24 -08:00
Michal Hocko
e3df4c6e48 mm, memory_hotplug: __offline_pages fix wrong locking
Jan has noticed that we do double unlock on some failure paths when
offlining a page range.  This is indeed the case when
test_pages_in_a_zone respp.  start_isolate_page_range fail.  This was an
omission when forward porting the debugging patch from an older kernel.

Fix the issue by dropping mem_hotplug_done from the failure condition
and keeping the single unlock in the catch all failure path.

Link: http://lkml.kernel.org/r/20190115120307.22768-1-mhocko@kernel.org
Fixes: 7960509329 ("mm, memory_hotplug: print reason for the offlining failure")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Tested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01 15:46:23 -08:00
Naoya Horiguchi
6376360ecb mm: hwpoison: use do_send_sig_info() instead of force_sig()
Currently memory_failure() is racy against process's exiting, which
results in kernel crash by null pointer dereference.

The root cause is that memory_failure() uses force_sig() to forcibly
kill asynchronous (meaning not in the current context) processes.  As
discussed in thread https://lkml.org/lkml/2010/6/8/236 years ago for OOM
fixes, this is not a right thing to do.  OOM solves this issue by using
do_send_sig_info() as done in commit d2d393099d ("signal:
oom_kill_task: use SEND_SIG_FORCED instead of force_sig()"), so this
patch is suggesting to do the same for hwpoison.  do_send_sig_info()
properly accesses to siglock with lock_task_sighand(), so is free from
the reported race.

I confirmed that the reported bug reproduces with inserting some delay
in kill_procs(), and it never reproduces with this patch.

Note that memory_failure() can send another type of signal using
force_sig_mceerr(), and the reported race shouldn't happen on it because
force_sig_mceerr() is called only for synchronous processes (i.e.
BUS_MCEERR_AR happens only when some process accesses to the corrupted
memory.)

Link: http://lkml.kernel.org/r/20190116093046.GA29835@hori1.linux.bs1.fc.nec.co.jp
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reported-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01 15:46:23 -08:00
Anders Roxell
0d0c8de878 kasan: mark file common so ftrace doesn't trace it
When option CONFIG_KASAN is enabled toghether with ftrace, function
ftrace_graph_caller() gets in to a recursion, via functions
kasan_check_read() and kasan_check_write().

 Breakpoint 2, ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:179
 179             mcount_get_pc             x0    //     function's pc
 (gdb) bt
 #0  ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:179
 #1  0xffffff90101406c8 in ftrace_caller () at ../arch/arm64/kernel/entry-ftrace.S:151
 #2  0xffffff90106fd084 in kasan_check_write (p=0xffffffc06c170878, size=4) at ../mm/kasan/common.c:105
 #3  0xffffff90104a2464 in atomic_add_return (v=<optimized out>, i=<optimized out>) at ./include/generated/atomic-instrumented.h:71
 #4  atomic_inc_return (v=<optimized out>) at ./include/generated/atomic-fallback.h:284
 #5  trace_graph_entry (trace=0xffffffc03f5ff380) at ../kernel/trace/trace_functions_graph.c:441
 #6  0xffffff9010481774 in trace_graph_entry_watchdog (trace=<optimized out>) at ../kernel/trace/trace_selftest.c:741
 #7  0xffffff90104a185c in function_graph_enter (ret=<optimized out>, func=<optimized out>, frame_pointer=18446743799894897728, retp=<optimized out>) at ../kernel/trace/trace_functions_graph.c:196
 #8  0xffffff9010140628 in prepare_ftrace_return (self_addr=18446743592948977792, parent=0xffffffc03f5ff418, frame_pointer=18446743799894897728) at ../arch/arm64/kernel/ftrace.c:231
 #9  0xffffff90101406f4 in ftrace_graph_caller () at ../arch/arm64/kernel/entry-ftrace.S:182
 Backtrace stopped: previous frame identical to this frame (corrupt stack?)
 (gdb)

Rework so that the kasan implementation isn't traced.

Link: http://lkml.kernel.org/r/20181212183447.15890-1-anders.roxell@linaro.org
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01 15:46:23 -08:00
Jonathan Neuschäfer
9807683384 init/Kconfig: fix grammar by moving a closing parenthesis
Link: http://lkml.kernel.org/r/20190129150813.15785-1-j.neuschaefer@gmx.net
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01 15:46:23 -08:00
Dan Carpenter
db7ddeab3c lib/test_kmod.c: potential double free in error handling
There is a copy and paste bug so we set "config->test_driver" to NULL
twice instead of setting "config->test_fs".  Smatch complains that it
leads to a double free:

  lib/test_kmod.c:840 __kmod_config_init() warn: 'config->test_fs' double freed

Link: http://lkml.kernel.org/r/20190121140011.GA14283@kadam
Fixes: d9c6a72d6f ("kmod: add test driver to stress test the module loader")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01 15:46:23 -08:00
Shakeel Butt
cefc7ef3c8 mm, oom: fix use-after-free in oom_kill_process
Syzbot instance running on upstream kernel found a use-after-free bug in
oom_kill_process.  On further inspection it seems like the process
selected to be oom-killed has exited even before reaching
read_lock(&tasklist_lock) in oom_kill_process().  More specifically the
tsk->usage is 1 which is due to get_task_struct() in oom_evaluate_task()
and the put_task_struct within for_each_thread() frees the tsk and
for_each_thread() tries to access the tsk.  The easiest fix is to do
get/put across the for_each_thread() on the selected task.

Now the next question is should we continue with the oom-kill as the
previously selected task has exited? However before adding more
complexity and heuristics, let's answer why we even look at the children
of oom-kill selected task? The select_bad_process() has already selected
the worst process in the system/memcg.  Due to race, the selected
process might not be the worst at the kill time but does that matter?
The userspace can use the oom_score_adj interface to prefer children to
be killed before the parent.  I looked at the history but it seems like
this is there before git history.

Link: http://lkml.kernel.org/r/20190121215850.221745-1-shakeelb@google.com
Reported-by: syzbot+7fbbfa368521945f0e3d@syzkaller.appspotmail.com
Fixes: 6b0c81b3be ("mm, oom: reduce dependency on tasklist_lock")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01 15:46:23 -08:00
Qian Cai
b13bc35193 mm/hotplug: invalid PFNs from pfn_to_online_page()
On an arm64 ThunderX2 server, the first kmemleak scan would crash [1]
with CONFIG_DEBUG_VM_PGFLAGS=y due to page_to_nid() found a pfn that is
not directly mapped (MEMBLOCK_NOMAP).  Hence, the page->flags is
uninitialized.

This is due to the commit 9f1eb38e0e ("mm, kmemleak: little
optimization while scanning") starts to use pfn_to_online_page() instead
of pfn_valid().  However, in the CONFIG_MEMORY_HOTPLUG=y case,
pfn_to_online_page() does not call memblock_is_map_memory() while
pfn_valid() does.

Historically, the commit 68709f4538 ("arm64: only consider memblocks
with NOMAP cleared for linear mapping") causes pages marked as nomap
being no long reassigned to the new zone in memmap_init_zone() by
calling __init_single_page().

Since the commit 2d070eab2e ("mm: consider zone which is not fully
populated to have holes") introduced pfn_to_online_page() and was
designed to return a valid pfn only, but it is clearly broken on arm64.

Therefore, let pfn_to_online_page() call pfn_valid_within(), so it can
handle nomap thanks to the commit f52bb98f5a ("arm64: mm: always
enable CONFIG_HOLES_IN_ZONE"), while it will be optimized away on
architectures where have no HOLES_IN_ZONE.

[1]
  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000006
  Mem abort info:
    ESR = 0x96000005
    Exception class = DABT (current EL), IL = 32 bits
    SET = 0, FnV = 0
    EA = 0, S1PTW = 0
  Data abort info:
    ISV = 0, ISS = 0x00000005
    CM = 0, WnR = 0
  Internal error: Oops: 96000005 [#1] SMP
  CPU: 60 PID: 1408 Comm: kmemleak Not tainted 5.0.0-rc2+ #8
  pstate: 60400009 (nZCv daif +PAN -UAO)
  pc : page_mapping+0x24/0x144
  lr : __dump_page+0x34/0x3dc
  sp : ffff00003a5cfd10
  x29: ffff00003a5cfd10 x28: 000000000000802f
  x27: 0000000000000000 x26: 0000000000277d00
  x25: ffff000010791f56 x24: ffff7fe000000000
  x23: ffff000010772f8b x22: ffff00001125f670
  x21: ffff000011311000 x20: ffff000010772f8b
  x19: fffffffffffffffe x18: 0000000000000000
  x17: 0000000000000000 x16: 0000000000000000
  x15: 0000000000000000 x14: ffff802698b19600
  x13: ffff802698b1a200 x12: ffff802698b16f00
  x11: ffff802698b1a400 x10: 0000000000001400
  x9 : 0000000000000001 x8 : ffff00001121a000
  x7 : 0000000000000000 x6 : ffff0000102c53b8
  x5 : 0000000000000000 x4 : 0000000000000003
  x3 : 0000000000000100 x2 : 0000000000000000
  x1 : ffff000010772f8b x0 : ffffffffffffffff
  Process kmemleak (pid: 1408, stack limit = 0x(____ptrval____))
  Call trace:
   page_mapping+0x24/0x144
   __dump_page+0x34/0x3dc
   dump_page+0x28/0x4c
   kmemleak_scan+0x4ac/0x680
   kmemleak_scan_thread+0xb4/0xdc
   kthread+0x12c/0x13c
   ret_from_fork+0x10/0x18
  Code: d503201f f9400660 36000040 d1000413 (f9400661)
  ---[ end trace 4d4bd7f573490c8e ]---
  Kernel panic - not syncing: Fatal exception
  SMP: stopping secondary CPUs
  Kernel Offset: disabled
  CPU features: 0x002,20000c38
  Memory Limit: none
  ---[ end Kernel panic - not syncing: Fatal exception ]---

Link: http://lkml.kernel.org/r/20190122132916.28360-1-cai@lca.pw
Fixes: 9f1eb38e0e ("mm, kmemleak: little optimization while scanning")
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01 15:46:23 -08:00
Oscar Salvador
eeb0efd071 mm,memory_hotplug: fix scan_movable_pages() for gigantic hugepages
This is the same sort of error we saw in commit 17e2e7d7e1 ("mm,
page_alloc: fix has_unmovable_pages for HugePages").

Gigantic hugepages cross several memblocks, so it can be that the page
we get in scan_movable_pages() is a page-tail belonging to a
1G-hugepage.  If that happens, page_hstate()->size_to_hstate() will
return NULL, and we will blow up in hugepage_migration_supported().

The splat is as follows:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
  #PF error: [normal kernel read fault]
  PGD 0 P4D 0
  Oops: 0000 [#1] SMP PTI
  CPU: 1 PID: 1350 Comm: bash Tainted: G            E     5.0.0-rc1-mm1-1-default+ #27
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
  RIP: 0010:__offline_pages+0x6ae/0x900
  Call Trace:
   memory_subsys_offline+0x42/0x60
   device_offline+0x80/0xa0
   state_store+0xab/0xc0
   kernfs_fop_write+0x102/0x180
   __vfs_write+0x26/0x190
   vfs_write+0xad/0x1b0
   ksys_write+0x42/0x90
   do_syscall_64+0x5b/0x180
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  Modules linked in: af_packet(E) xt_tcpudp(E) ipt_REJECT(E) xt_conntrack(E) nf_conntrack(E) nf_defrag_ipv4(E) ip_set(E) nfnetlink(E) ebtable_nat(E) ebtable_broute(E) bridge(E) stp(E) llc(E) iptable_mangle(E) iptable_raw(E) iptable_security(E) ebtable_filter(E) ebtables(E) iptable_filter(E) ip_tables(E) x_tables(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) bochs_drm(E) ttm(E) aesni_intel(E) drm_kms_helper(E) aes_x86_64(E) crypto_simd(E) cryptd(E) glue_helper(E) drm(E) virtio_net(E) syscopyarea(E) sysfillrect(E) net_failover(E) sysimgblt(E) pcspkr(E) failover(E) i2c_piix4(E) fb_sys_fops(E) parport_pc(E) parport(E) button(E) btrfs(E) libcrc32c(E) xor(E) zstd_decompress(E) zstd_compress(E) xxhash(E) raid6_pq(E) sd_mod(E) ata_generic(E) ata_piix(E) ahci(E) libahci(E) libata(E) crc32c_intel(E) serio_raw(E) virtio_pci(E) virtio_ring(E) virtio(E) sg(E) scsi_mod(E) autofs4(E)

[akpm@linux-foundation.org: fix brace layout, per David.  Reduce indentation]
Link: http://lkml.kernel.org/r/20190122154407.18417-1-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01 15:46:23 -08:00
Johannes Weiner
1b69ac6b40 psi: fix aggregation idle shut-off
psi has provisions to shut off the periodic aggregation worker when
there is a period of no task activity - and thus no data that needs
aggregating.  However, while developing psi monitoring, Suren noticed
that the aggregation clock currently won't stay shut off for good.

Debugging this revealed a flaw in the idle design: an aggregation run
will see no task activity and decide to go to sleep; shortly thereafter,
the kworker thread that executed the aggregation will go idle and cause
a scheduling change, during which the psi callback will kick the
!pending worker again.  This will ping-pong forever, and is equivalent
to having no shut-off logic at all (but with more code!)

Fix this by exempting aggregation workers from psi's clock waking logic
when the state change is them going to sleep.  To do this, tag workers
with the last work function they executed, and if in psi we see a worker
going to sleep after aggregating psi data, we will not reschedule the
aggregation work item.

What if the worker is also executing other items before or after?

Any psi state times that were incurred by work items preceding the
aggregation work will have been collected from the per-cpu buckets
during the aggregation itself.  If there are work items following the
aggregation work, the worker's last_func tag will be overwritten and the
aggregator will be kept alive to process this genuine new activity.

If the aggregation work is the last thing the worker does, and we decide
to go idle, the brief period of non-idle time incurred between the
aggregation run and the kworker's dequeue will be stranded in the
per-cpu buckets until the clock is woken by later activity.  But that
should not be a problem.  The buckets can hold 4s worth of time, and
future activity will wake the clock with a 2s delay, giving us 2s worth
of data we can leave behind when disabling aggregation.  If it takes a
worker more than two seconds to go idle after it finishes its last work
item, we likely have bigger problems in the system, and won't notice one
sample that was averaged with a bogus per-CPU weight.

Link: http://lkml.kernel.org/r/20190116193501.1910-1-hannes@cmpxchg.org
Fixes: eb414681d5 ("psi: pressure stall information for CPU, memory, and IO")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01 15:46:23 -08:00
Mikhail Zaslonko
24feb47c5f mm, memory_hotplug: test_pages_in_a_zone do not pass the end of zone
If memory end is not aligned with the sparse memory section boundary,
the mapping of such a section is only partly initialized.  This may lead
to VM_BUG_ON due to uninitialized struct pages access from
test_pages_in_a_zone() function triggered by memory_hotplug sysfs
handlers.

Here are the the panic examples:
 CONFIG_DEBUG_VM_PGFLAGS=y
 kernel parameter mem=2050M
 --------------------------
 page:000003d082008000 is uninitialized and poisoned
 page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
 Call Trace:
   test_pages_in_a_zone+0xde/0x160
   show_valid_zones+0x5c/0x190
   dev_attr_show+0x34/0x70
   sysfs_kf_seq_show+0xc8/0x148
   seq_read+0x204/0x480
   __vfs_read+0x32/0x178
   vfs_read+0x82/0x138
   ksys_read+0x5a/0xb0
   system_call+0xdc/0x2d8
 Last Breaking-Event-Address:
   test_pages_in_a_zone+0xde/0x160
 Kernel panic - not syncing: Fatal exception: panic_on_oops

Fix this by checking whether the pfn to check is within the zone.

[mhocko@suse.com: separated this change from http://lkml.kernel.org/r/20181105150401.97287-2-zaslonko@linux.ibm.com]
Link: http://lkml.kernel.org/r/20190128144506.15603-3-mhocko@kernel.org

[mhocko@suse.com: separated this change from
http://lkml.kernel.org/r/20181105150401.97287-2-zaslonko@linux.ibm.com]
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Tested-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01 15:46:23 -08:00
Michal Hocko
efad4e475c mm, memory_hotplug: is_mem_section_removable do not pass the end of a zone
Patch series "mm, memory_hotplug: fix uninitialized pages fallouts", v2.

Mikhail Zaslonko has posted fixes for the two bugs quite some time ago
[1].  I have pushed back on those fixes because I believed that it is
much better to plug the problem at the initialization time rather than
play whack-a-mole all over the hotplug code and find all the places
which expect the full memory section to be initialized.

We have ended up with commit 2830bf6f05 ("mm, memory_hotplug:
initialize struct pages for the full memory section") merged and cause a
regression [2][3].  The reason is that there might be memory layouts
when two NUMA nodes share the same memory section so the merged fix is
simply incorrect.

In order to plug this hole we really have to be zone range aware in
those handlers.  I have split up the original patch into two.  One is
unchanged (patch 2) and I took a different approach for `removable'
crash.

[1] http://lkml.kernel.org/r/20181105150401.97287-2-zaslonko@linux.ibm.com
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1666948
[3] http://lkml.kernel.org/r/20190125163938.GA20411@dhcp22.suse.cz

This patch (of 2):

Mikhail has reported the following VM_BUG_ON triggered when reading sysfs
removable state of a memory block:

 page:000003d08300c000 is uninitialized and poisoned
 page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
 Call Trace:
   is_mem_section_removable+0xb4/0x190
   show_mem_removable+0x9a/0xd8
   dev_attr_show+0x34/0x70
   sysfs_kf_seq_show+0xc8/0x148
   seq_read+0x204/0x480
   __vfs_read+0x32/0x178
   vfs_read+0x82/0x138
   ksys_read+0x5a/0xb0
   system_call+0xdc/0x2d8
 Last Breaking-Event-Address:
   is_mem_section_removable+0xb4/0x190
 Kernel panic - not syncing: Fatal exception: panic_on_oops

The reason is that the memory block spans the zone boundary and we are
stumbling over an unitialized struct page.  Fix this by enforcing zone
range in is_mem_section_removable so that we never run away from a zone.

Link: http://lkml.kernel.org/r/20190128144506.15603-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Debugged-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Tested-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01 15:46:23 -08:00
Tetsuo Handa
9bcdeb51bd oom, oom_reaper: do not enqueue same task twice
Arkadiusz reported that enabling memcg's group oom killing causes
strange memcg statistics where there is no task in a memcg despite the
number of tasks in that memcg is not 0.  It turned out that there is a
bug in wake_oom_reaper() which allows enqueuing same task twice which
makes impossible to decrease the number of tasks in that memcg due to a
refcount leak.

This bug existed since the OOM reaper became invokable from
task_will_free_mem(current) path in out_of_memory() in Linux 4.7,

  T1@P1     |T2@P1     |T3@P1     |OOM reaper
  ----------+----------+----------+------------
                                   # Processing an OOM victim in a different memcg domain.
                        try_charge()
                          mem_cgroup_out_of_memory()
                            mutex_lock(&oom_lock)
             try_charge()
               mem_cgroup_out_of_memory()
                 mutex_lock(&oom_lock)
  try_charge()
    mem_cgroup_out_of_memory()
      mutex_lock(&oom_lock)
                            out_of_memory()
                              oom_kill_process(P1)
                                do_send_sig_info(SIGKILL, @P1)
                                mark_oom_victim(T1@P1)
                                wake_oom_reaper(T1@P1) # T1@P1 is enqueued.
                            mutex_unlock(&oom_lock)
                 out_of_memory()
                   mark_oom_victim(T2@P1)
                   wake_oom_reaper(T2@P1) # T2@P1 is enqueued.
                 mutex_unlock(&oom_lock)
      out_of_memory()
        mark_oom_victim(T1@P1)
        wake_oom_reaper(T1@P1) # T1@P1 is enqueued again due to oom_reaper_list == T2@P1 && T1@P1->oom_reaper_list == NULL.
      mutex_unlock(&oom_lock)
                                   # Completed processing an OOM victim in a different memcg domain.
                                   spin_lock(&oom_reaper_lock)
                                   # T1P1 is dequeued.
                                   spin_unlock(&oom_reaper_lock)

but memcg's group oom killing made it easier to trigger this bug by
calling wake_oom_reaper() on the same task from one out_of_memory()
request.

Fix this bug using an approach used by commit 855b018325 ("oom,
oom_reaper: disable oom_reaper for oom_kill_allocating_task").  As a
side effect of this patch, this patch also avoids enqueuing multiple
threads sharing memory via task_will_free_mem(current) path.

Link: http://lkml.kernel.org/r/e865a044-2c10-9858-f4ef-254bc71d6cc2@i-love.sakura.ne.jp
Link: http://lkml.kernel.org/r/5ee34fc6-1485-34f8-8790-903ddabaa809@i-love.sakura.ne.jp
Fixes: af8e15cc85 ("oom, oom_reaper: do not enqueue task if it is on the oom_reaper_list head")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: Arkadiusz Miskiewicz <arekm@maven.pl>
Tested-by: Arkadiusz Miskiewicz <arekm@maven.pl>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Roman Gushchin <guro@fb.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Aleksa Sarai <asarai@suse.de>
Cc: Jay Kamat <jgkamat@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01 15:46:23 -08:00
Jan Kara
80409c65e2 mm: migrate: make buffer_migrate_page_norefs() actually succeed
Currently, buffer_migrate_page_norefs() was constantly failing because
buffer_migrate_lock_buffers() grabbed reference on each buffer.  In
fact, there's no reason for buffer_migrate_lock_buffers() to grab any
buffer references as the page is locked during all our operation and
thus nobody can reclaim buffers from the page.

So remove grabbing of buffer references which also makes
buffer_migrate_page_norefs() succeed.

Link: http://lkml.kernel.org/r/20190116131217.7226-1-jack@suse.cz
Fixes: 89cb0888ca "mm: migrate: provide buffer_migrate_page_norefs()"
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01 15:46:23 -08:00
Andrei Vagin
8fb335e078 kernel/exit.c: release ptraced tasks before zap_pid_ns_processes
Currently, exit_ptrace() adds all ptraced tasks in a dead list, then
zap_pid_ns_processes() waits on all tasks in a current pidns, and only
then are tasks from the dead list released.

zap_pid_ns_processes() can get stuck on waiting tasks from the dead
list.  In this case, we will have one unkillable process with one or
more dead children.

Thanks to Oleg for the advice to release tasks in find_child_reaper().

Link: http://lkml.kernel.org/r/20190110175200.12442-1-avagin@gmail.com
Fixes: 7c8bd2322c ("exit: ptrace: shift "reap dead" code from exit_ptrace() to forget_original_parent()")
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01 15:46:23 -08:00
Qian Cai
a8e911d135 x86_64: increase stack size for KASAN_EXTRA
If the kernel is configured with KASAN_EXTRA, the stack size is
increasted significantly because this option sets "-fstack-reuse" to
"none" in GCC [1].  As a result, it triggers stack overrun quite often
with 32k stack size compiled using GCC 8.  For example, this reproducer

  https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise06.c

triggers a "corrupted stack end detected inside scheduler" very reliably
with CONFIG_SCHED_STACK_END_CHECK enabled.

There are just too many functions that could have a large stack with
KASAN_EXTRA due to large local variables that have been called over and
over again without being able to reuse the stacks.  Some noticiable ones
are

  size
  7648 shrink_page_list
  3584 xfs_rmap_convert
  3312 migrate_page_move_mapping
  3312 dev_ethtool
  3200 migrate_misplaced_transhuge_page
  3168 copy_process

There are other 49 functions are over 2k in size while compiling kernel
with "-Wframe-larger-than=" even with a related minimal config on this
machine.  Hence, it is too much work to change Makefiles for each object
to compile without "-fsanitize-address-use-after-scope" individually.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715#c23

Although there is a patch in GCC 9 to help the situation, GCC 9 probably
won't be released in a few months and then it probably take another
6-month to 1-year for all major distros to include it as a default.
Hence, the stack usage with KASAN_EXTRA can be revisited again in 2020
when GCC 9 is everywhere.  Until then, this patch will help users avoid
stack overrun.

This has already been fixed for arm64 for the same reason via
6e8830674e ("arm64: kasan: Increase stack size for KASAN_EXTRA").

Link: http://lkml.kernel.org/r/20190109215209.2903-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01 15:46:23 -08:00
Andrea Arcangeli
1ac25013fb mm/hugetlb.c: teach follow_hugetlb_page() to handle FOLL_NOWAIT
hugetlb needs the same fix as faultin_nopage (which was applied in
commit 96312e6128 ("mm/gup.c: teach get_user_pages_unlocked to handle
FOLL_NOWAIT")) or KVM hangs because it thinks the mmap_sem was already
released by hugetlb_fault() if it returned VM_FAULT_RETRY, but it wasn't
in the FOLL_NOWAIT case.

Link: http://lkml.kernel.org/r/20190109020203.26669-2-aarcange@redhat.com
Fixes: ce53053ce3 ("kvm: switch get_user_page_nowait() to get_user_pages_unlocked()")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Tested-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Reported-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01 15:46:23 -08:00
Masahiro Yamada
36c0f7f0f8 arch: unexport asm/shmparam.h for all architectures
Most architectures do not export shmparam.h to user-space.

  $ find arch -name shmparam.h  | sort
  arch/alpha/include/asm/shmparam.h
  arch/arc/include/asm/shmparam.h
  arch/arm64/include/asm/shmparam.h
  arch/arm/include/asm/shmparam.h
  arch/csky/include/asm/shmparam.h
  arch/ia64/include/asm/shmparam.h
  arch/mips/include/asm/shmparam.h
  arch/nds32/include/asm/shmparam.h
  arch/nios2/include/asm/shmparam.h
  arch/parisc/include/asm/shmparam.h
  arch/powerpc/include/asm/shmparam.h
  arch/s390/include/asm/shmparam.h
  arch/sh/include/asm/shmparam.h
  arch/sparc/include/asm/shmparam.h
  arch/x86/include/asm/shmparam.h
  arch/xtensa/include/asm/shmparam.h

Strangely, some users of the asm-generic wrapper export shmparam.h

  $ git grep 'generic-y += shmparam.h'
  arch/c6x/include/uapi/asm/Kbuild:generic-y += shmparam.h
  arch/h8300/include/uapi/asm/Kbuild:generic-y += shmparam.h
  arch/hexagon/include/uapi/asm/Kbuild:generic-y += shmparam.h
  arch/m68k/include/uapi/asm/Kbuild:generic-y += shmparam.h
  arch/microblaze/include/uapi/asm/Kbuild:generic-y += shmparam.h
  arch/openrisc/include/uapi/asm/Kbuild:generic-y += shmparam.h
  arch/riscv/include/asm/Kbuild:generic-y += shmparam.h
  arch/unicore32/include/uapi/asm/Kbuild:generic-y += shmparam.h

The newly added riscv correctly creates the asm-generic wrapper
in the kernel space, but the others (c6x, h8300, hexagon, m68k,
microblaze, openrisc, unicore32) create the one in the uapi directory.

Digging into the git history, now I guess fcc8487d47 ("uapi:
export all headers under uapi directories") was the misconversion.
Prior to that commit, no architecture exported to shmparam.h
As its commit description said, that commit exported shmparam.h
for c6x, h8300, hexagon, m68k, openrisc, unicore32.

83f0124ad8 ("microblaze: remove asm-generic wrapper headers")
accidentally exported shmparam.h for microblaze.

This commit unexports shmparam.h for those architectures.

There is no more reason to export include/uapi/asm-generic/shmparam.h,
so it has been moved to include/asm-generic/shmparam.h

Link: http://lkml.kernel.org/r/1546904307-11124-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Stafford Horne <shorne@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Aurelien Jacquiot <jacquiot.aurelien@gmail.com>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Mark Salter <msalter@redhat.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Vincent Chen <deanbo422@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01 15:46:22 -08:00
Alexey Dobriyan
1fde6f21d9 proc: fix /proc/net/* after setns(2)
/proc entries under /proc/net/* can't be cached into dcache because
setns(2) can change current net namespace.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: avoid vim miscolorization]
[adobriyan@gmail.com: write test, add dummy ->d_revalidate hook: necessary if /proc/net/* is pinned at setns time]
  Link: http://lkml.kernel.org/r/20190108192350.GA12034@avx2
Link: http://lkml.kernel.org/r/20190107162336.GA9239@avx2
Fixes: 1da4d377f9 ("proc: revalidate misc dentries")
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reported-by: Mateusz Stępień <mateusz.stepien@netrounds.com>
Reported-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01 15:46:22 -08:00
Oscar Salvador
1723058eab mm, memory_hotplug: don't bail out in do_migrate_range() prematurely
do_migrate_range() takes a memory range and tries to isolate the pages
to put them into a list.  This list will be later on used in
migrate_pages() to know the pages we need to migrate.

Currently, if we fail to isolate a single page, we put all already
isolated pages back to their LRU and we bail out from the function.
This is quite suboptimal, as this will force us to start over again
because scan_movable_pages will give us the same range.  If there is no
chance that we can isolate that page, we will loop here forever.

Issue debugged in [1] has proved that.  During the debugging of that
issue, it was noticed that if do_migrate_ranges() fails to isolate a
single page, we will just discard the work we have done so far and bail
out, which means that scan_movable_pages() will find again the same set
of pages.

Instead, we can just skip the error, keep isolating as much pages as
possible and then proceed with the call to migrate_pages().

This will allow us to do as much work as possible at once.

[1] https://lkml.org/lkml/2018/12/6/324

Michal said:

: I still think that this doesn't give us a whole picture.  Looping for
: ever is a bug.  Failing the isolation is quite possible and it should
: be a ephemeral condition (e.g.  a race with freeing the page or
: somebody else isolating the page for whatever reason).  And here comes
: the disadvantage of the current implementation.  We simply throw
: everything on the floor just because of a ephemeral condition.  The
: racy page_count check is quite dubious to prevent from that.

Link: http://lkml.kernel.org/r/20181211135312.27034-1-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dan Williams <dan.j.williams@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-01 15:46:22 -08:00
David S. Miller
e7b816415e Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:

====================
pull-request: bpf 2019-01-31

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) disable preemption in sender side of socket filters, from Alexei.

2) fix two potential deadlocks in syscall bpf lookup and prog_register,
   from Martin and Alexei.

3) fix BTF to allow typedef on func_proto, from Yonghong.

4) two bpftool fixes, from Jiri and Paolo.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 15:28:07 -08:00
James Bottomley
42caa0edab scsi: aic94xx: fix module loading
The aic94xx driver is currently failing to load with errors like

sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:03.0/0000:02:00.3/0000:07:02.0/revision'

Because the PCI code had recently added a file named 'revision' to every
PCI device.  Fix this by renaming the aic94xx revision file to
aic_revision.  This is safe to do for us because as far as I can tell,
there's nothing in userspace relying on the current aic94xx revision file
so it can be renamed without breaking anything.

Fixes: 702ed3be1b (PCI: Create revision file in sysfs)
Cc: stable@vger.kernel.org
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-02-01 18:04:07 -05:00
Eric Dumazet
9b1f19d810 dccp: fool proof ccid_hc_[rt]x_parse_options()
Similarly to commit 276bdb82de ("dccp: check ccid before dereferencing")
it is wise to test for a NULL ccid.

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.0.0-rc3+ #37
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:ccid_hc_tx_parse_options net/dccp/ccid.h:205 [inline]
RIP: 0010:dccp_parse_options+0x8d9/0x12b0 net/dccp/options.c:233
Code: c5 0f b6 75 b3 80 38 00 0f 85 d6 08 00 00 48 b9 00 00 00 00 00 fc ff df 48 8b 45 b8 4c 8b b8 f8 07 00 00 4c 89 f8 48 c1 e8 03 <80> 3c 08 00 0f 85 95 08 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b
kobject: 'loop5' (0000000080f78fc1): kobject_uevent_env
RSP: 0018:ffff8880a94df0b8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8880858ac723 RCX: dffffc0000000000
RDX: 0000000000000100 RSI: 0000000000000007 RDI: 0000000000000001
RBP: ffff8880a94df140 R08: 0000000000000001 R09: ffff888061b83a80
R10: ffffed100c370752 R11: ffff888061b83a97 R12: 0000000000000026
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f0defa33518 CR3: 000000008db5e000 CR4: 00000000001406e0
kobject: 'loop5' (0000000080f78fc1): fill_kobj_path: path = '/devices/virtual/block/loop5'
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 dccp_rcv_state_process+0x2b6/0x1af6 net/dccp/input.c:654
 dccp_v4_do_rcv+0x100/0x190 net/dccp/ipv4.c:688
 sk_backlog_rcv include/net/sock.h:936 [inline]
 __sk_receive_skb+0x3a9/0xea0 net/core/sock.c:473
 dccp_v4_rcv+0x10cb/0x1f80 net/dccp/ipv4.c:880
 ip_protocol_deliver_rcu+0xb6/0xa20 net/ipv4/ip_input.c:208
 ip_local_deliver_finish+0x23b/0x390 net/ipv4/ip_input.c:234
 NF_HOOK include/linux/netfilter.h:289 [inline]
 NF_HOOK include/linux/netfilter.h:283 [inline]
 ip_local_deliver+0x1f0/0x740 net/ipv4/ip_input.c:255
 dst_input include/net/dst.h:450 [inline]
 ip_rcv_finish+0x1f4/0x2f0 net/ipv4/ip_input.c:414
 NF_HOOK include/linux/netfilter.h:289 [inline]
 NF_HOOK include/linux/netfilter.h:283 [inline]
 ip_rcv+0xed/0x620 net/ipv4/ip_input.c:524
 __netif_receive_skb_one_core+0x160/0x210 net/core/dev.c:4973
 __netif_receive_skb+0x2c/0x1c0 net/core/dev.c:5083
 process_backlog+0x206/0x750 net/core/dev.c:5923
 napi_poll net/core/dev.c:6346 [inline]
 net_rx_action+0x76d/0x1930 net/core/dev.c:6412
 __do_softirq+0x30b/0xb11 kernel/softirq.c:292
 run_ksoftirqd kernel/softirq.c:654 [inline]
 run_ksoftirqd+0x8e/0x110 kernel/softirq.c:646
 smpboot_thread_fn+0x6ab/0xa10 kernel/smpboot.c:164
 kthread+0x357/0x430 kernel/kthread.c:246
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
Modules linked in:
---[ end trace 58a0ba03bea2c376 ]---
RIP: 0010:ccid_hc_tx_parse_options net/dccp/ccid.h:205 [inline]
RIP: 0010:dccp_parse_options+0x8d9/0x12b0 net/dccp/options.c:233
Code: c5 0f b6 75 b3 80 38 00 0f 85 d6 08 00 00 48 b9 00 00 00 00 00 fc ff df 48 8b 45 b8 4c 8b b8 f8 07 00 00 4c 89 f8 48 c1 e8 03 <80> 3c 08 00 0f 85 95 08 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b
RSP: 0018:ffff8880a94df0b8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8880858ac723 RCX: dffffc0000000000
RDX: 0000000000000100 RSI: 0000000000000007 RDI: 0000000000000001
RBP: ffff8880a94df140 R08: 0000000000000001 R09: ffff888061b83a80
R10: ffffed100c370752 R11: ffff888061b83a97 R12: 0000000000000026
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f0defa33518 CR3: 0000000009871000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 14:49:10 -08:00
David S. Miller
ec34f792d3 Merge branch 'smc-fixes'
Ursula Braun says:

====================
net/smc: fixes 2019-01-30

here are some fixes in different areas of the smc code for the net
tree.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 14:45:45 -08:00
Karsten Graul
46ad02229d net/smc: fix use of variable in cleared area
Do not use pend->idx as index for the arrays because its value is
located in the cleared area. Use the existing local variable instead.
Without this fix the wrong area might be cleared.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 14:45:45 -08:00
Karsten Graul
e5f3aa04db net/smc: use device link provided in qp_context
The device field of the IB event structure does not always point to the
SMC IB device. Load the pointer from the qp_context which is always
provided to smc_ib_qp_event_handler() in the priv field. And for qp
events the affected port is given in the qp structure of the ibevent,
derive it from there.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 14:45:45 -08:00
Karsten Graul
2dee25af42 net/smc: call smc_cdc_msg_send() under send_lock
Call smc_cdc_msg_send() under the connection send_lock to make sure all
send operations for one connection are serialized.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 14:45:45 -08:00
Karsten Graul
33f3fcc290 net/smc: do not wait under send_lock
smc_cdc_get_free_slot() might wait for free transfer buffers when using
SMC-R. This wait should not be done under the send_lock, which is a
spin_lock. This fixes a cpu loop in parallel threads waiting for the
send_lock.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 14:45:45 -08:00
Karsten Graul
51c5aba3b6 net/smc: recvmsg and splice_read should return 0 after shutdown
When a socket was connected and is now shut down for read, return 0 to
indicate end of data in recvmsg and splice_read (like TCP) and do not
return ENOTCONN. This behavior is required by the socket api.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 14:45:45 -08:00
Karsten Graul
6889b36da7 net/smc: don't wait for send buffer space when data was already sent
When there is no more send buffer space and at least 1 byte was already
sent then return to user space. The wait is only done when no data was
sent by the sendmsg() call.
This fixes smc_tx_sendmsg() which tried to always send all user data and
started to wait for free send buffer space when needed. During this wait
the user space program was blocked in the sendmsg() call and hence not
able to receive incoming data. When both sides were in such a situation
then the connection stalled forever.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 14:45:44 -08:00
Karsten Graul
77f838ace7 net/smc: prevent races between smc_lgr_terminate() and smc_conn_free()
To prevent races between smc_lgr_terminate() and smc_conn_free() add an
extra check of the lgr field before accessing it, and cancel a delayed
free_work when a new smc connection is created.
This fixes the problem that free_work cleared the lgr variable but
smc_lgr_terminate() or smc_conn_free() still access it in parallel.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 14:45:44 -08:00
Hans Wippel
ca8dc1334a net/smc: allow 16 byte pnetids in netlink policy
Currently, users can only send pnetids with a maximum length of 15 bytes
over the SMC netlink interface although the maximum pnetid length is 16
bytes. This patch changes the SMC netlink policy to accept 16 byte
pnetids.

Signed-off-by: Hans Wippel <hwippel@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 14:45:44 -08:00
Ursula Braun
14d22d4d61 net/smc: fix another sizeof to int comparison
Comparing an int to a size, which is unsigned, causes the int to become
unsigned, giving the wrong result. kernel_sendmsg can return a negative
error code.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 14:45:44 -08:00
Govindarajulu Varadarajan
7596175e99 enic: fix checksum validation for IPv6
In case of IPv6 pkts, ipv4_csum_ok is 0. Because of this, driver does
not set skb->ip_summed. So IPv6 rx checksum is not offloaded.

Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 14:41:42 -08:00
Greg Kroah-Hartman
ba59fb0273 sctp: walk the list of asoc safely
In sctp_sendmesg(), when walking the list of endpoint associations, the
association can be dropped from the list, making the list corrupt.
Properly handle this by using list_for_each_entry_safe()

Fixes: 4910280503 ("sctp: add support for snd flag SCTP_SENDALL process in sendmsg")
Reported-by: Secunia Research <vuln@secunia.com>
Tested-by: Secunia Research <vuln@secunia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 10:41:46 -08:00
Linus Torvalds
5eeb63359b Second 5.0 rc pull request
Still not much going on, the usual set of OOP's and driver updates this time.
 
 - Fix two uapi breakage regressions in mlx5 drivers
 
 - Various oops fixes in hfi1, mlx4, umem, uverbs, and ipoib
 
 - A protocol bug fix for hfi1 preventing it from implementing the verbs
   API properly, and a compatability fix for EXEC STACK user programs
 
 - Fix missed refcounting in the 'advise_mr' patches merged this
   cycle.
 
 - Fix wrong use of the uABI in the hns SRQ patches merged this cycle.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAlxUguAACgkQOG33FX4g
 mxq93Q/+N7SyMjClN2/4Wz+vY1dAAOneshVdg+vsoe+S3sZG77xTAps1NX9nJ89o
 VXc4RyTCQXIirN1tiqXtjvbcrrPvxpDhX9OOo1qE/a1LfnUAtNM2om9IrxOQLxjN
 suYCk9WihilePS6WEtbZNeR3cP496Npm5N/uxVibPNCZghdxIZCa+PXAhosf0vCH
 aslOgQnQXG1hRc4XjyxMa9ltT3R967wYHnhIpuPe9psH3HcB+wrLzusyBYUMYu4N
 B2v9dyBNsuHlwTldF4b2ELD7mAJUMSk8X+x54a7RGQe5kgxD1BRjyOol67WwpNnl
 JZTwTwluiVSEZ5V63dj2AkXKMo1695Q1tuMftGoEipaHopUXOTb70MOH9TDR1jbX
 /jUjMUNJtafHhNBAVtTd04nX1fOsKp4GYVImoKPPRXPaQFoF5jXvqyGUuKZ9DAYb
 zbnLKMaJOCu2JcB8sHHUfjgiGtJqtf73V0a4O3fq+rTSkY6UDnEgYF7j7GY1X+5N
 E478aWw1JYVTIyUgiRFguJcBgJsY7hjhIS9fJaRD9aXuwo9Lg7bdpDUogfpcERA3
 iNZFEyx8h2fBAb8DojYIBzRgbEtZkIYxe18jILK3SMjGQ0PfB9GGAl3JAWDOQIGx
 uphV0WM/MKf31SeY+zD9dh9Z6ew4oTUNv3upRAzLDmkRe3J07j8=
 =LxVe
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "Still not much going on, the usual set of oops and driver fixes this
  time:

   - Fix two uapi breakage regressions in mlx5 drivers

   - Various oops fixes in hfi1, mlx4, umem, uverbs, and ipoib

   - A protocol bug fix for hfi1 preventing it from implementing the
     verbs API properly, and a compatability fix for EXEC STACK user
     programs

   - Fix missed refcounting in the 'advise_mr' patches merged this
     cycle.

   - Fix wrong use of the uABI in the hns SRQ patches merged this cycle"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  IB/uverbs: Fix OOPs in uverbs_user_mmap_disassociate
  IB/ipoib: Fix for use-after-free in ipoib_cm_tx_start
  IB/uverbs: Fix ioctl query port to consider device disassociation
  RDMA/mlx5: Fix flow creation on representors
  IB/uverbs: Fix OOPs upon device disassociation
  RDMA/umem: Add missing initialization of owning_mm
  RDMA/hns: Update the kernel header file of hns
  IB/mlx5: Fix how advise_mr() launches async work
  RDMA/device: Expose ib_device_try_get(()
  IB/hfi1: Add limit test for RC/UC send via loopback
  IB/hfi1: Remove overly conservative VM_EXEC flag check
  IB/{hfi1, qib}: Fix WC.byte_len calculation for UD_SEND_WITH_IMM
  IB/mlx4: Fix using wrong function to destroy sqp AHs under SRIOV
  RDMA/mlx5: Fix check for supported user flags when creating a QP
2019-02-01 10:39:24 -08:00
Linus Torvalds
9ace868a17 Changes since last update:
- fix page migration when using iomap for pagecache management
 - fix a use-after-free bug in the directio code
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAlxPK1IACgkQ+H93GTRK
 tOvzeA/+I4bWVmovfV+EGFzHSV6zRsy17v6c4ncMrdia41rhmvxl+sAJgj2+uFCb
 J37cCMpPrmAOz+JGIW7PbCt8uzmwaXOfB9p9N58wM+hSSxtlN+wZFsIaoepOUTDK
 t3e2L7QQxQjN9HXZU0RNUi/zgS3poDfzap7cZ71spBxX5hVd1zVQa0q/o5OXr7OI
 sBlZLsIOhmS8WU2TmfkwzUVi+/FR4dCgyP8eDAGho/KbwvO9sfWzLNxf0U/ORWfA
 JG+2LX42eKKjX6wo39zW1mAXwOBhnLnCqOOgKrDy8XRXPARiNAzLNn0AwhxJSAqD
 z3qE298Oag6gZo0lNJjDXIko5D9y2koWjQe7z4fJ2JYpV5mSyq8/F4XDNO2FKIap
 07p0OBGa1yfwQYmS5TrhJvYwvsHqTNs122jpowqeD3o0Xh64y2TZLEMdhhIsRjga
 +S9OSVQ15JDf/QeI5LPGK6Oc6B3JnRYgrYf7g7DYu4eqEsJ3V3pqtbXzjxGkzUjx
 5xf85ujuRUQoKCPZQ00ewmsfZMfOcaYqfhosx6LvR6ZPvPH3Ex3nLHks5ZV/eXfR
 Auusq6XMiHb5ljukfVCa0WStntUl5gMaRha5QJy1Vg5Zd1ikvTkB5CkJFlODJ8hS
 GaEIa58Gf75dkHOvm4bFEOoy2EcE3I+qdwR+0xgg+6gDlNeZZNQ=
 =KYgc
 -----END PGP SIGNATURE-----

Merge tag 'iomap-5.0-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull iomap fixes from Darrick Wong:
 "A couple of iomap fixes to eliminate some memory corruption and hang
  problems that were reported:

   - fix page migration when using iomap for pagecache management

   - fix a use-after-free bug in the directio code"

* tag 'iomap-5.0-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  iomap: fix a use after free in iomap_dio_rw
  iomap: get/put the page in iomap_page_create/release()
2019-02-01 10:30:18 -08:00
Linus Torvalds
3325254ca1 Power management fixes for 5.0-rc5
- Since ktime_get() turns out to be problematic for device
    autosuspend in the PM-runtime framework, make it use
    ktime_get_mono_fast_ns() instead (Vincent Guittot).
 
  - Fix an initial value of a local variable in the "poll idle state"
    code that makes it behave not exactly as expected when all idle
    states except for the "polling" one are disabled (Doug Smythies).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJcVB7NAAoJEILEb/54YlRxJNsP+gM+tCQ4oUTj5Q1sZnYDF3Ti
 VZCBCaN4JfRdNj+grYNAi/MHRNG2jox+Epn7EFHahtRAtbck9TEmnqfP8uJrV5fY
 mJ0mNgkt09O2PI0U+8IhSjC71Nba7zq5IPkhMr9+Gha3f3UsTGD0HCJz3fHm/50W
 coilIUTH5usdnbGZjiLhbVb9d9koJjBCWOgYZsQp+flAoj4aasw4xVuu7AoQ6ZOy
 hqjhyUPSPlE35ZVi3NaXUYeDmPgx6IJ+Bs/SQvMEcH9fyCKfEIL82S0yBZxys7j2
 ZqE2qBsEPl8elZFFNS7hwwezL0ny2BpzzrODlSrVafSr0kuS6r+sRDgkpn8whwZw
 cMmf/E7HXrgwpdnvjMywJ3vOVNsXu1QwDI21c8E3lBG08wlk1yWzpMB6BK+zusmC
 TvOjfPEEosA6RUC9rOeFU6h3OhWr3AJjCfBzkK0gkTLC3jWlPWwqor0vWR4EaSo5
 5atl8hfZPEVBkPvIix6i1luP1Kvew2Ud7LR60MTAfVJAag6/AfxYAF4IuRCdj4uO
 BLdFdN6PBsT416BJ0w6qBWxsHX0qiNMH9O1B8jNWyrAIMjgKz2cRexdwBWmxVBnn
 lijnyiH7owGi3qbjnOQ86mU7XvnApnyf/+E8tFuj4VveLqxvItLvb7rT/Zn6xDTn
 ZbQXI+5nYrmLbI6AB58u
 =3UG4
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "These fix a PM-runtime framework regression introduced by the recent
  switch-over of device autosuspend to hrtimers and a mistake in the
  "poll idle state" code introduced by a recent change in it.

  Specifics:

   - Since ktime_get() turns out to be problematic for device
     autosuspend in the PM-runtime framework, make it use
     ktime_get_mono_fast_ns() instead (Vincent Guittot).

   - Fix an initial value of a local variable in the "poll idle state"
     code that makes it behave not exactly as expected when all idle
     states except for the "polling" one are disabled (Doug Smythies)"

* tag 'pm-5.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpuidle: poll_state: Fix default time limit
  PM-runtime: Fix deadlock with ktime_get()
2019-02-01 10:23:39 -08:00