Commit Graph

618225 Commits

Author SHA1 Message Date
Linus Torvalds
c6169de730 Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:

 - Four fixes for "flush hint" support.

   Flush hints are addresses advertised by the ACPI 6+ NFIT (NVDIMM
   Firmware Interface Table) that when written and fenced guarantee that
   writes pending in platform write buffers (outside the cpu) have been
   flushed to media.  They might also be used by hypervisors as a
   trigger condition to flush guest-persistent memory ranges to storage.

    Fix a potential data corruption issue, a broken definition of the
    hint array, a wrong allocation size for the unit test implementation
    of the flush hint table, and missing NULL check in an error path.

    The unit test, while it did not prevent these bugs from being
    merged, at least triggered occasional crashes in advance of
    production usages.

 - Fix handling of ACPI DSM error status results.  The DSM mechanism
   allows communication with platform and memory device firmware.  We
   correctly parse known errors, but were silently ignoring others.

   Fix it to consistently fail any command with a non-zero status return
   that we otherwise do not interpret / handle.

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  libnvdimm, region: fix flush hint table thinko
  nfit: fail DSMs that return non-zero status by default
  libnvdimm: fix devm_nvdimm_memremap() error path
  tools/testing/nvdimm: fix allocation range for mock flush hint tables
  nvdimm: fix PHYS_PFN/PFN_PHYS mixup
2016-09-29 14:59:11 -07:00
Andy Lutomirski
e1bfc11c5a x86/init: Fix cr4_init_shadow() on CR4-less machines
cr4_init_shadow() will panic on 486-like machines without CR4.  Fix
it using __read_cr4_safe().

Reported-by: david@saggiorato.net
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: 1e02ce4ccc ("x86: Store a per-cpu shadow copy of CR4")
Link: http://lkml.kernel.org/r/43a20f81fb504013bf613913dc25574b45336a61.1475091074.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-29 19:08:30 +02:00
Paul Burton
058effe7fd MIPS: Fix detection of unsupported highmem with cache aliases
The paging_init() function contains code which detects that highmem is
in use but unsupported due to dcache aliasing. However this code was
ineffective because it was being run before the caches are probed,
meaning that cpu_has_dc_aliases would always evaluate to false (unless a
platform overrides it to a compile-time constant) and the detection of
the unsupported case is never triggered. The kernel would then go on to
attempt to use highmem & either hit coherency issues or trigger the
BUG_ON in flush_kernel_dcache_page().

Fix this by running paging_init() later than cpu_cache_init(), such that
the cpu_has_dc_aliases macro will evaluate correctly & the unsupported
highmem case will be detected successfully.

This then leads to a formerly hidden issue in that
mem_init_free_highmem() will attempt to free all highmem pages, even
though we're avoiding use of them & don't have valid page structs for
them. This leads to an invalid pointer dereference & a TLB exception.
Avoid this by skipping the loop in mem_init_free_highmem() if
cpu_has_dc_aliases evaluates true.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: Rabin Vincent <rabinv@axis.com>
Cc: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Alexander Sverdlin <alexander.sverdlin@gmail.com>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: Jaedon Shin <jaedon.shin@gmail.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Cc: Jonas Gorski <jogo@openwrt.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/14184/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-09-29 18:59:49 +02:00
Paul Burton
305723ab43 MIPS: Malta: Fix IOCU disable switch read for MIPS64
Malta boards used with CPU emulators feature a switch to disable use of
an IOCU. Software has to check this switch & ignore any present IOCU if
the switch is closed. The read used to do this was unsafe for 64 bit
kernels, as it simply casted the address 0xbf403000 to a pointer &
dereferenced it. Whilst in a 32 bit kernel this would access kseg1, in a
64 bit kernel this attempts to access xuseg & results in an address
error exception.

Fix by accessing a correctly formed ckseg1 address generated using the
CKSEG1ADDR macro.

Whilst modifying this code, define the name of the register and the bit
we care about within it, which indicates whether PCI DMA is routed to
the IOCU or straight to DRAM. The code previously checked that bit 0 was
also set, but the least significant 7 bits of the CONFIG_GEN0 register
contain the value of the MReqInfo signal provided to the IOCU OCP bus,
so singling out bit 0 makes little sense & that part of the check is
dropped.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Fixes: b6d92b4a6b ("MIPS: Add option to disable software I/O coherency.")
Cc: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/14187/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-09-29 18:59:49 +02:00
Paul Burton
1eefcbc89c MIPS: Fix BUILD_ROLLBACK_PROLOGUE for microMIPS
When the kernel is built for microMIPS, branches targets need to be
known to be microMIPS code in order to result in bit 0 of the PC being
set. The branch target in the BUILD_ROLLBACK_PROLOGUE macro was simply
the end of the macro, which may be pointing at padding rather than at
code. This results in recent enough GNU linkers complaining like so:

    mips-img-linux-gnu-ld: arch/mips/built-in.o: .text+0x3e3c: Unsupported branch between ISA modes.
    mips-img-linux-gnu-ld: final link failed: Bad value
    Makefile:936: recipe for target 'vmlinux' failed
    make: *** [vmlinux] Error 1

Fix this by changing the branch target to be the start of the
appropriate handler, skipping over any padding.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/14019/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-09-29 18:59:49 +02:00
Paul Burton
67acd8d5c6 MIPS: clear execution hazard after changing FTLB enable
On current P-series cores from Imagination the FTLB can be enabled or
disabled via a bit in the Config6 register, and an execution hazard is
created by changing the value of bit. The ftlb_disable function already
cleared that hazard but that does no good for other callers. Clear the
hazard in the set_ftlb_enable function that creates it, and only for the
cores where it applies.

This has the effect of reverting c982c6d6c4 ("MIPS: cpu-probe: Remove
cp0 hazard barrier when enabling the FTLB") which was incorrect.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Fixes: c982c6d6c4 ("MIPS: cpu-probe: Remove cp0 hazard barrier when enabling the FTLB")
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/14023/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-09-29 18:59:49 +02:00
Paul Burton
ebd0e0f503 MIPS: Configure FTLB after probing TLB sizes from config4
On some cores (proAptiv, P5600) we make use of the sizes of the TLBs
to determine the desired FTLB:VTLB write ratio. However set_ftlb_enable
& thus calculate_ftlb_probability is called before decode_config4. This
results in us calculating a probability based on zero sizes, and we end
up setting FTLBP=3 for a 3:1 FTLB:VTLB write ratio in all cases. This
will make abysmal use of the available FTLB resources in the affected
cores.

Fix this by configuring the FTLB probability after having decoded
config4. However we do need to have enabled the FTLB before that point
such that fields in config4 actually reflect that an FTLB is present. So
set_ftlb_enable is now called twice, with flags indicating that it
should configure the write probability only the second time.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Fixes: cf0a8aa022 ("MIPS: cpu-probe: Set the FTLB probability bit on supported cores")
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/14022/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-09-29 18:59:49 +02:00
Paul Burton
72c70f010d MIPS: Stop setting I6400 FTLBP
The FTLBP field in Config7 for the I6400 is intended as chicken bits for
debugging rather than as a field that software actually makes use of.
For best performance, FTLBP should be left at its default value of 0
with all TLB writes hitting the FTLB by default.

Additionally, since set_ftlb_enable is called from decode_configs before
decode_config4 which determines the size of the TLBs, this was
previously always setting FTLBP=3 for a 3:1 FTLB:VTLB write ratio which
makes abysmal use of the available FTLB resources.

This effectively reverts b0c4e1b79d8a ("MIPS: Set up FTLB probability
for I6400").

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Fixes: b0c4e1b79d8a ("MIPS: Set up FTLB probability for I6400")
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/14021/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-09-29 18:59:49 +02:00
Ralf Baechle
3021773c7c MIPS: DEC: Avoid la pseudo-instruction in delay slots
When expanding the la or dla pseudo-instruction in a delay slot the GNU
assembler will complain should the pseudo-instruction expand to multiple
actual instructions, since only the first of them will be in the delay
slot leading to the pseudo-instruction being only partially executed if
the branch is taken. Use of PTR_LA in the dec int-handler.S leads to
such warnings:

  arch/mips/dec/int-handler.S: Assembler messages:
  arch/mips/dec/int-handler.S:149: Warning: macro instruction expanded into multiple instructions in a branch delay slot
  arch/mips/dec/int-handler.S:198: Warning: macro instruction expanded into multiple instructions in a branch delay slot

Avoid this by open coding the PTR_LA macros.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-09-29 18:59:49 +02:00
Steven J. Hill
0a90055371 MIPS: Octeon: mark GPIO controller node not populated after IRQ init.
We clear the OF_POPULATED flag for the GPIO controller node on Octeon
processors. Otherwise, none of the devices hanging on the GPIO lines
are probed. The 'gpio-leds' driver on OCTEON failed to probe in addition
to other devices on Cavium 71xx and 78xx development boards.

Fixes: 15cc2ed6dc ("of/irq: Mark initialised interrupt controllers as populated")
Signed-off-by: Steven J. Hill <steven.hill@cavium.com>
Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Cc: David Daney <david.daney@cavium.com>
Cc: Rob Herring <robh@kernel.org>
Cc: linux-mips@linux-mips.org
Cc: devicetree@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/14091/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-09-29 18:59:49 +02:00
Marcin Nowakowski
ca86c9ef2b MIPS: uprobes: fix use of uninitialised variable
arch_uprobe_pre_xol needs to emulate a branch if a branch instruction
has been replaced with a breakpoint, but in fact an uninitialised local
variable was passed to the emulator routine instead of the original
instruction

Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com>
Fixes: 40e084a506 ('MIPS: Add uprobes support.')
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/14300/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-09-29 18:59:49 +02:00
Marcin Nowakowski
ddabfa5c2e MIPS: uprobes: remove incorrect set_orig_insn
Generic kernel code implements a weak version of set_orig_insn that
moves cached 'insn' from arch_uprobe to the original code location when
the trap is removed.
MIPS variant used arch_uprobe->orig_inst which was never initialised
properly, so this code only inserted a nop instead of the original
instruction. With that change orig_inst can also be safely removed.

Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com>
Fixes: 40e084a506 ('MIPS: Add uprobes support.')
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/14299/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-09-29 18:59:49 +02:00
Marcin Nowakowski
db06068a4f MIPS: fix uretprobe implementation
arch_uretprobe_hijack_return_addr should replace the return address for
a call with a trampoline address.

Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com>
Fixes: 40e084a506 ('MIPS: Add uprobes support.')
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/14298/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-09-29 18:59:49 +02:00
Matt Redfearn
6ca8ac773e MIPS: smp-cps: Avoid BUG() when offlining pre-r6 CPUs
Commit 0d2808f338 ("MIPS: smp-cps: Add support for CPU hotplug of
MIPSr6 processors") added a call to mips_cm_lock_other in order to lock
the CPC in CPUs containing a version 3 or higher Coherence Manager,
which use the general CM core other register, where previous CMs had a
dedicated core other register for the CPC.

A kernel BUG() is triggered, however, if mips_cm_lock_other is called
with a VP other than 0 on a CPU with CM < 3, a condition introduced by
0d2808f338.

Avoid the BUG() by always locking VP0 when locking the CPC, since the
required register, cpc_stat_conf, is shared by all vps in a core.

Fixes: 0d2808f338 ("MIPS: smp-cps: Add support for CPU hotplug...)

Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: Qais Yousef <qsyousef@gmail.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/14297/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-09-29 18:59:49 +02:00
Roger Quadros
d248220f04 ARM: 8617/1: dma: fix dma_max_pfn()
Since commit 6ce0d20016 ("ARM: dma: Use dma_pfn_offset for dma address translation"),
dma_to_pfn() already returns the PFN with the physical memory start offset
so we don't need to add it again.

This fixes USB mass storage lock-up problem on systems that can't do DMA
over the entire physical memory range (e.g.) Keystone 2 systems with 4GB RAM
can only do DMA over the first 2GB. [K2E-EVM].

What happens there is that without this patch SCSI layer sets a wrong
bounce buffer limit in scsi_calculate_bounce_limit() for the USB mass
storage device. dma_max_pfn() evaluates to 0x8fffff and bounce_limit
is set to 0x8fffff000 whereas maximum DMA'ble physical memory on Keystone 2
is 0x87fffffff. This results in non DMA'ble pages being given to the
USB controller and hence the lock-up.

NOTE: in the above case, USB-SCSI-device's dma_pfn_offset was showing as 0.
This should have really been 0x780000 as on K2e, LOWMEM_START is 0x80000000
and HIGHMEM_START is 0x800000000. DMA zone is 2GB so dma_max_pfn should be
0x87ffff. The incorrect dma_pfn_offset for the USB storage device is because
USB devices are not correctly inheriting the dma_pfn_offset from the
USB host controller. This will be fixed by a separate patch.

Fixes: 6ce0d20016 ("ARM: dma: Use dma_pfn_offset for dma address translation")
Cc: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Olof Johansson <olof@lixom.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Reported-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2016-09-29 16:57:44 +01:00
Robin Murphy
ba6dea4f7c ARM: 8616/1: dt: Respect property size when parsing CPUs
Whilst MPIDR values themselves are less than 32 bits, it is still
perfectly valid for a DT to have #address-cells > 1 in the CPUs node,
resulting in the "reg" property having leading zero cell(s). In that
situation, the big-endian nature of the data conspires with the current
behaviour of only reading the first cell to cause the kernel to think
all CPUs have ID 0, and become resoundingly unhappy as a consequence.

Take the full property length into account when parsing CPUs so as to
be correct under any circumstances.

Cc: Russell King <linux@armlinux.org.uk>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2016-09-29 16:57:43 +01:00
David S. Miller
2a0100d7be sparc64: Fix non-SMP build.
Need to provide a dummy smp_fill_in_cpu_possible_map.

Fixes: 9b2f753ec2 ("sparc64: Fix cpu_possible_mask if nr_cpus is set")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-28 20:40:52 -04:00
Linus Torvalds
53061afee4 Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
 "4 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mem-hotplug: use nodes that contain memory as mask in new_node_page()
  scripts/recordmcount.c: account for .softirqentry.text
  dma-mapping.h: preserve unmap info for CONFIG_DMA_API_DEBUG
  mm,ksm: fix endless looping in allocating memory when ksm enable
2016-09-28 16:20:24 -07:00
Li Zhong
231e97e2b8 mem-hotplug: use nodes that contain memory as mask in new_node_page()
9bb627be47 ("mem-hotplug: don't clear the only node in new_node_page()")
prevents allocating from an empty nodemask, but as David points out, it is
still wrong.  As node_online_map may include memoryless nodes, only
allocating from these nodes is meaningless.

This patch uses node_states[N_MEMORY] mask to prevent the above case.

Fixes: 9bb627be47 ("mem-hotplug: don't clear the only node in new_node_page()")
Fixes: 394e31d2ce ("mem-hotplug: alloc new page from a nearest neighbor node when mem-offline")
Link: http://lkml.kernel.org/r/1474447117.28370.6.camel@TP420
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Suggested-by: David Rientjes <rientjes@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: John Allen <jallen@linux.vnet.ibm.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-28 16:19:02 -07:00
Dmitry Vyukov
e436fd61a8 scripts/recordmcount.c: account for .softirqentry.text
be7635e728 ("arch, ftrace: for KASAN put hard/soft IRQ entries into
separate sections") added .softirqentry.text section, but it was not added
to recordmcount.  So functions in the section are untracable.  Add the
section to scripts/recordmcount.c and scripts/recordmcount.pl.

Fixes: be7635e728 ("arch, ftrace: for KASAN put hard/soft IRQ entries into separate sections")
Link: http://lkml.kernel.org/r/1474902626-73468-1-git-send-email-dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Steve Rostedt <rostedt@goodmis.org>
Cc: <stable@vger.kernel.org>	[4.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-28 16:19:02 -07:00
Andrey Smirnov
2481366afd dma-mapping.h: preserve unmap info for CONFIG_DMA_API_DEBUG
When CONFIG_DMA_API_DEBUG is enabled we need to preserve unmapping address
even if "unmap" is a no-op for our architecutre because we need
debug_dma_unmap_page() to correctly cleanup all of the debug bookkeeping.
Failing to do so results in a false positive warnings about previously
mapped areas never being unmapped.

Link: http://lkml.kernel.org/r/1474387125-3713-1-git-send-email-andrew.smirnov@gmail.com
Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Zhen Lei <thunder.leizhen@huawei.com>
Cc: "Luis R. Rodriguez" <mcgrof@suse.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Geliang Tang <geliangtang@163.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-28 16:19:01 -07:00
zhong jiang
5b398e416e mm,ksm: fix endless looping in allocating memory when ksm enable
I hit the following hung task when runing a OOM LTP test case with 4.1
kernel.

Call trace:
[<ffffffc000086a88>] __switch_to+0x74/0x8c
[<ffffffc000a1bae0>] __schedule+0x23c/0x7bc
[<ffffffc000a1c09c>] schedule+0x3c/0x94
[<ffffffc000a1eb84>] rwsem_down_write_failed+0x214/0x350
[<ffffffc000a1e32c>] down_write+0x64/0x80
[<ffffffc00021f794>] __ksm_exit+0x90/0x19c
[<ffffffc0000be650>] mmput+0x118/0x11c
[<ffffffc0000c3ec4>] do_exit+0x2dc/0xa74
[<ffffffc0000c46f8>] do_group_exit+0x4c/0xe4
[<ffffffc0000d0f34>] get_signal+0x444/0x5e0
[<ffffffc000089fcc>] do_signal+0x1d8/0x450
[<ffffffc00008a35c>] do_notify_resume+0x70/0x78

The oom victim cannot terminate because it needs to take mmap_sem for
write while the lock is held by ksmd for read which loops in the page
allocator

ksm_do_scan
	scan_get_next_rmap_item
		down_read
		get_next_rmap_item
			alloc_rmap_item   #ksmd will loop permanently.

There is no way forward because the oom victim cannot release any memory
in 4.1 based kernel.  Since 4.6 we have the oom reaper which would solve
this problem because it would release the memory asynchronously.
Nevertheless we can relax alloc_rmap_item requirements and use
__GFP_NORETRY because the allocation failure is acceptable as ksm_do_scan
would just retry later after the lock got dropped.

Such a patch would be also easy to backport to older stable kernels which
do not have oom_reaper.

While we are at it add GFP_NOWARN so the admin doesn't have to be alarmed
by the allocation failure.

Link: http://lkml.kernel.org/r/1474165570-44398-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Suggested-by: Hugh Dickins <hughd@google.com>
Suggested-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-28 16:19:01 -07:00
Linus Torvalds
ae6dd8d619 Another round of MTD fixes for v4.8
Davinci NAND: fix a long-standing bug in how we clear/prep 4-bit ECC
 
 OMAP NAND: an error-handling fix that made it into v4.8-rc1 caused
 error-handling cases in other configurations/code-paths; this fixes the fix
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJX7Bv2AAoJEFySrpd9RFgtgn4P/0kmLaMl9/mnU8yV7o09uifR
 pwDW1sd6TCG9htjsILkZ2s8gAHEeXqH/5aN0rRpIng4OFaVGyihCsBOAnV9YojXC
 SlUEKLVqLNQxwp9rfCv4fe6amoBWEzw5BZRkSn5SPYP30JTyxCT1vZIYu+Nw22c2
 H0YBRZKI8HnNapeScy9ccI+6Hvm9hUl33kFKuh/pNtHq16ocQvGeNNic8avW6ALr
 hmmRBR0lmzBgRmJiykEe/xmjRjsFhB2Tb6GoxanTUOgcPIje5J4ly7YP8dfeXXIW
 WDXG8wKaHrP5Igh0gLZlPTmjbdd6FAh/qQQkByPxhjMlu+OLZWwDzTw/F9ZvVmZa
 ekcnH4UzYIKez4DFvZ2c0y5z4S64Qy2ajFJ28/3KNTATyBVtvUriGO2hafDUoGOu
 6P0ZeAKr3iIHddFVFZaHb2lbNAhi+3JBv93LXOxMHyu4xroSIw1Sr2NdkBRs2t5J
 +SRmdAjW612oVO7h6L2jEjAoWZMX2rn/ovNSp7AoQA+72BHTkOpGAv9KVI5cZezE
 +08DhatIQdyP4Y0r5w6aMJLBiia33ofe+4hPiWlbZfyt0HkPuXYXDlxNXXyQKEcy
 Vy2Jq0kB+hMMzYis+1lhasrvMBe67ykCKud8wSSE4L+kM8SMvibh9Hw4aCQVKKkb
 fIpTwk2WzsRyMB+8+0d8
 =coFg
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20160928' of git://git.infradead.org/linux-mtd

Pull late MTD fixes from Brian Norris:
 "Another round of MTD fixes for v4.8

  My apologies for sending this so late.  I've been fairly absent as a
  maintainer this cycle, but I did queue these up weeks ago.  In the
  meantime, Richard was able to handle some other fixes (thanks!) but
  didn't pick these up.

  On the bright side, these are very simple changes that should carry
  little risk.

  Summary:

   - Davinci NAND: fix a long-standing bug in how we clear/prep 4-bit ECC

   - OMAP NAND: an error-handling fix that made it into v4.8-rc1 caused
     error-handling cases in other configurations/code-paths; this fixes
     the fix"

* tag 'for-linus-20160928' of git://git.infradead.org/linux-mtd:
  mtd: nand: davinci: Reinitialize the HW ECC engine in 4bit hwctl
  mtd: nand: omap2: Don't call dma_release_channel() if dma_request_chan() failed
2016-09-28 12:53:08 -07:00
Mark Fasheh
0a966fa891 MAINTAINERS: Update my e-mail
I will be starting employment at Versity next week and would like to update
my MAINTAINERS e-mail to reflect that change. My versity e-mail is already
activated so I shouldn't get any bounces on the new one. My ability to help
with Ocfs2 kernel maintenance won't change as a result of the new job.

Signed-off-by: Mark Fasheh <mfasheh@versity.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-28 12:52:05 -07:00
Atish Patra
ebb99a4c12 sparc64: Fix irq stack bootmem allocation.
Currently, irq stack bootmem is allocated for all possible cpus
before nr_cpus value changes the list of possible cpus. As a result,
there is unnecessary wastage of bootmemory.

Move the irq stack bootmem allocation so that it happens after
possible cpu list is modified based on nr_cpus value.

Signed-off-by: Atish Patra <atish.patra@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-28 08:24:03 -07:00
Atish Patra
9b2f753ec2 sparc64: Fix cpu_possible_mask if nr_cpus is set
If kernel boot parameter nr_cpus is set, it should define the number
of CPUs that can ever be available in the system i.e.
cpu_possible_mask. setup_nr_cpu_ids() overrides the nr_cpu_ids based
on the cpu_possible_mask during kernel initialization. If
cpu_possible_mask is not set based on the nr_cpus value, earlier part
of the kernel would be initialized using nr_cpus value leading to a
kernel crash.

Set cpu_possible_mask based on nr_cpus value. Thus setup_nr_cpu_ids()
becomes redundant and does not corrupt nr_cpu_ids value.

Signed-off-by: Atish Patra <atish.patra@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-28 08:24:02 -07:00
Mike Kravetz
1e953d846a sparc64 mm: Fix more TSB sizing issues
Commit af1b1a9b36 ("sparc64 mm: Fix base TSB sizing when hugetlb
pages are used") addressed the difference between hugetlb and THP
pages when computing TSB sizes.  The following additional issues
were also discovered while working with the code.

In order to save memory, THP makes use of a huge zero page.  This huge
zero page does not count against a task's RSS, but it does consume TSB
entries.  This is similar to hugetlb pages.  Therefore, count huge
zero page entries in hugetlb_pte_count.

Accounting of THP pages is done in the routine set_pmd_at().
Unfortunately, this does not catch the case where a THP page is split.
To handle this case, decrement the count in pmdp_invalidate().
pmdp_invalidate is only called when splitting a THP.  However, 'sanity
checks' are added in case it is ever called for other purposes.

A more general issue exists with HPAGE_SIZE accounting.
hugetlb_pte_count tracks the number of HPAGE_SIZE (8M) pages.  This
value is used to size the TSB for HPAGE_SIZE pages.  However,
each HPAGE_SIZE page consists of two REAL_HPAGE_SIZE (4M) pages.
The TSB contains an entry for each REAL_HPAGE_SIZE page.  Therefore,
the number of REAL_HPAGE_SIZE pages should be used to size the huge
page TSB.  A new compile time constant REAL_HPAGE_PER_HPAGE is used
to multiply hugetlb_pte_count before sizing the TSB.

Changes from V1
- Fixed build issue if hugetlb or THP not configured

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-28 08:24:02 -07:00
Paul Gortmaker
bdf2f59e64 sparc64: fix section mismatch in find_numa_latencies_for_group
To fix:

  WARNING: vmlinux.o(.text.unlikely+0x580): Section mismatch in
  reference from the function find_numa_latencies_for_group() to the
  function .init.text:find_mlgroup()

  The function find_numa_latencies_for_group() references the
  function __init find_mlgroup().  This is often because
  find_numa_latencies_for_group lacks a __init annotation or the
  annotation of find_mlgroup is wrong.

It turns out find_numa_latencies_for_group is only called from:
    static int __init numa_parse_mdesc(void)
and hence we can tag find_numa_latencies_for_group with __init.

In doing so we see that find_best_numa_node_for_mlgroup is only
called from within __init and hence can also be marked with __init.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Nitin Gupta <nitin.m.gupta@oracle.com>
Cc: Chris Hyser <chris.hyser@oracle.com>
Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Cc: sparclinux@vger.kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-28 08:24:02 -07:00
James Bottomley
539294b76a Merge remote-tracking branch 'mkp-scsi/4.8/scsi-fixes' into fixes 2016-09-27 22:30:51 -07:00
David Herrmann
90fd68dcf9 drm/udl: fix line iterator in damage handling
The udl damage handler is supposed to render 'height' lines, but its
iterator has an obvious typo that makes it miss most lines if the
rectangle does not cover 0/0.

Fix the damage handler to correctly render all lines.

This is a fallout from:

    commit e375882406
    Author: Noralf Trønnes <noralf@tronnes.org>
    Date:   Thu Apr 28 17:18:37 2016 +0200

        drm/udl: Use drm_fb_helper deferred_io support

Tested-by: poma <poma@gmail.com>
Cc: stable@vger.kernel.org # 4.7+
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-09-28 13:29:18 +10:00
Dave Airlie
aaee1d1e2d Merge branch 'linux-4.8' of git://github.com/skeggsb/linux into drm-fixes
nouveau: couple of fixes.

* 'linux-4.8' of git://github.com/skeggsb/linux:
  drm/nouveau: Revert "bus: remove cpu_coherent flag"
  drm/nouveau/fifo/nv04: avoid ramht race against cookie insertion
2016-09-28 10:23:50 +10:00
Dave Airlie
b86f9faa34 Merge branch 'drm-fixes-4.8' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
two amd fixes.

* 'drm-fixes-4.8' of git://people.freedesktop.org/~agd5f/linux:
  drm/radeon/si/dpm: add workaround for for Jet parts
  drm/amdgpu: disable CRTCs before teardown
2016-09-28 10:19:35 +10:00
Linus Torvalds
8ab293e3a1 Merge branch 'for-4.8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
 "Three late fixes for cgroup: Two cpuset ones, one trivial and the
  other pretty obscure, and a cgroup core fix for a bug which impacts
  cgroup v2 namespace users"

* 'for-4.8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: fix invalid controller enable rejections with cgroup namespace
  cpuset: fix non static symbol warning
  cpuset: handle race between CPU hotplug and cpuset_hotplug_work
2016-09-27 16:43:11 -07:00
Alex Deucher
670bb4fd21 drm/radeon/si/dpm: add workaround for for Jet parts
Add clock quirks for Jet parts.

Reviewed-by: Sonny Jiang <sonny.jiang@amd.com>
Tested-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2016-09-27 12:22:29 -04:00
Grazvydas Ignotas
a951ed85ab drm/amdgpu: disable CRTCs before teardown
Some code called by drm_crtc_force_disable_all() wants to wait for all
fences, so only do fence teardown after CRTCs are disabled.

Fixes: 84b89bdced ("drm/amdgpu: Turn off CRTCs on driver unload")
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2016-09-27 12:22:21 -04:00
David S. Miller
7b8147aae7 Merge branch 'act_ife-fixes'
Yotam Gigi says:

====================
Fix tc-ife bugs

This patch-set contains two bugfixes in the tc-ife action, one fixing some
random behaviour in encode side, and one fixing the decode side packet
parsing logic.

v2->v3
 - Fix the encode side instead of the decode side
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-27 09:53:30 -04:00
Yotam Gigi
c006da0be0 act_ife: Fix false encoding
On ife encode side, the action stores the different tlvs inside the ife
header, where each tlv length field should refer to the length of the
whole tlv (without additional padding) and not just the data length.

On ife decode side, the action iterates over the tlvs in the ife header
and parses them one by one, where in each iteration the current pointer is
advanced according to the tlv size.

Before, the encoding encoded only the data length inside the tlv, which led
to false parsing of ife the header. In addition, due to the fact that the
loop counter was unsigned, it could lead to infinite parsing loop.

This fix changes the loop counter to be signed and fixes the encoding to
take into account the tlv type and size.

Fixes: 28a10c426e ("net sched: fix encoding to use real length")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-27 09:53:17 -04:00
Yotam Gigi
4b1d488a28 act_ife: Fix external mac header on encode
On ife encode side, external mac header is copied from the original packet
and may be overridden if the user requests. Before, the mac header copy
was done from memory region that might not be accessible anymore, as
skb_cow_head might free it and copy the packet. This led to random values
in the external mac header once the values were not set by user.

This fix takes the internal mac header from the packet, after the call to
skb_cow_head.

Fixes: ef6980b6be ("net sched: introduce IFE action")
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-27 09:53:16 -04:00
Jorgen Hansen
1190cfdb1a VSOCK: Don't dec ack backlog twice for rejected connections
If a pending socket is marked as rejected, we will decrease the
sk_ack_backlog twice. So don't decrement it for rejected sockets
in vsock_pending_work().

Testing of the rejected socket path was done through code
modifications.

Reported-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Jorgen Hansen <jhansen@vmware.com>
Reviewed-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Aditya Sarwade <asarwade@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-27 07:59:25 -04:00
Florian Fainelli
bf1a85a838 Revert "net: ethernet: bcmgenet: use phydev from struct net_device"
This reverts commit 62469c7600 ("net: ethernet: bcmgenet: use phydev
from struct net_device") because it causes GENETv1/2/3 adapters to
expose the following behavior after an ifconfig down/up sequence:

PING fainelli-linux (10.112.156.244): 56 data bytes
64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.352 ms
64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.472 ms (DUP!)
64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.496 ms (DUP!)
64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.517 ms (DUP!)
64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.536 ms (DUP!)
64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.557 ms (DUP!)
64 bytes from 10.112.156.244: seq=1 ttl=61 time=752.448 ms (DUP!)

This was previously fixed by commit 5dbebbb44a ("net: bcmgenet:
Software reset EPHY after power on") but the commit we are reverting was
essentially making this previous commit void, here is why.

Without commit 62469c7600 we would have the following scenario after
an ifconfig down then up sequence:

- bcmgenet_open() calls bcmgenet_power_up() to make sure the PHY is
  initialized *before* we get to initialize the UniMAC, this is
  critical to ensure the PHY is in a correct state, priv->phydev is
  valid, this code executes fine

- second time from bcmgenet_mii_probe(), through the normal
  phy_init_hw() call (which arguably could be optimized out)

Everything is fine in that case. With commit 62469c7600, we would have
the following scenario to happen after an ifconfig down then up
sequence:

- bcmgenet_close() calls phy_disonnect() which makes dev->phydev become
  NULL

- when bcmgenet_open() executes again and calls bcmgenet_mii_reset() from
  bcmgenet_power_up() to initialize the internal PHY, the NULL check
  becomes true, so we do not reset the PHY, yet we keep going on and
  initialize the UniMAC, causing MAC activity to occur

- we call bcmgenet_mii_reset() from bcmgenet_mii_probe(), but this is
  too late, the PHY is botched, and causes the above bogus pings/packets
  transmission/reception to occur

Reported-by: Jaedon Shin <jaedon.shin@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-27 07:41:12 -04:00
David S. Miller
6c1394f30b Merge branch 'fec-align'
Eric Nelson says:

====================
net: fec: updates to align IP header

This patch series is the outcome of investigation into very high
numbers of alignment faults on kernel 4.1.33 from the linux-fslc
tree:
    https://github.com/freescale/linux-fslc/tree/4.1-1.0.x-imx

The first two patches remove support for the receive accelerator (RACC) from
the i.MX25 and i.MX27 SoCs which don't support the function.

The third patch enables hardware alignment of the ethernet packet payload
(and especially the IP header) to prevent alignment faults in the IP stack.

Testing on i.MX6UL on the 4.1.33 kernel showed that this patch removed
on the order of 70k alignment faults during a 100MiB transfer using
wget.

Testing on an i.MX6Q (SABRE Lite) board on net-next (4.8.0-rc7) showed
a much more modest improvement from 10's of faults, and it's not clear
why that's the case.
====================

Acked-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-27 07:40:19 -04:00
Eric Nelson
3ac72b7b63 net: fec: align IP header in hardware
The FEC receive accelerator (RACC) supports shifting the data payload of
received packets by 16-bits, which aligns the payload (IP header) on a
4-byte boundary, which is, if not required, at least strongly suggested
by the Linux networking layer.

Without this patch, a huge number of alignment faults will be taken by the
IP stack, as seen in /proc/cpu/alignment:

	~/$ cat /proc/cpu/alignment
	User:		0
	System:		72645 (inet_gro_receive+0x104/0x27c)
	Skipped:	0
	Half:		0
	Word:		0
	DWord:		0
	Multi:		72645
	User faults:	3 (fixup+warn)

This patch was suggested by Andrew Lunn in this message to linux-netdev:
	http://marc.info/?l=linux-arm-kernel&m=147465452108384&w=2

and adapted from a patch by Russell King from 2014:
	http://git.arm.linux.org.uk/cgit/linux-arm.git/commit/?id=70d8a8a

Signed-off-by: Eric Nelson <eric@nelint.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-27 07:39:34 -04:00
Eric Nelson
97dc499c1a net: fec: remove QUIRK_HAS_RACC from i.mx27
According to the i.MX27 reference manual, this SoC does not have support
for the receive accelerator (RACC) register at offset 0x1C4.

	http://cache.nxp.com/files/32bit/doc/ref_manual/MCIMX27RM.pdf

Signed-off-by: Eric Nelson <eric@nelint.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-27 07:39:34 -04:00
Eric Nelson
653d37d8bc net: fec: remove QUIRK_HAS_RACC from i.mx25
According to the i.MX25 reference manual, this SoC does not have support
for the receive accelerator (RACC) register at offset 0x1C4.

http://www.nxp.com/files/dsp/doc/ref_manual/IMX25RM.pdf

Signed-off-by: Eric Nelson <eric@nelint.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-27 07:39:33 -04:00
Ville Ranki
9fb6de1b0b Input: joydev - recognize devices with Z axis as joysticks
Current implementation of joydev's input_device_id table recognizes only
devices with ABS_X, ABS_WHEEL or ABS_THROTTLE axes as joysticks.

There are joystick devices that do not have those axes, for example TRC
Rudder device. The device in question has ABS_Z, ABS_RX and ABS_RY axes
causing it not being detected as joystick.

This patch adds ABS_Z to the input_device_id list allowing devices with
ABS_Z axis to be detected correctly.

Signed-off-by: Ville Ranki <ville.ranki@iki.fi>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2016-09-26 20:03:06 -07:00
Bart Van Assche
8d58881b99 scsi: Avoid that toggling use_blk_mq triggers a memory leak
This patch avoids that the following memory leak is triggered if
use_blk_mq is disabled after a SCSI host has been allocated by the
ib_srp driver and before the same SCSI host is freed:

unreferenced object 0xffff8803a168c568 (size 256):
  backtrace:
    [<ffffffff81620c95>] kmemleak_alloc+0x45/0xa0
    [<ffffffff811bb104>] __kmalloc_node+0x1e4/0x400
    [<ffffffff81309fe4>] blk_mq_alloc_tag_set+0xb4/0x230
    [<ffffffff814731b7>] scsi_mq_setup_tags+0xc7/0xd0
    [<ffffffff81469c26>] scsi_add_host_with_dma+0x216/0x2d0
    [<ffffffffa064bef5>] srp_create_target+0xe55/0x13d0 [ib_srp]
    [<ffffffff8143ce23>] dev_attr_store+0x13/0x20
    [<ffffffff8125f030>] sysfs_kf_write+0x40/0x50
    [<ffffffff8125e397>] kernfs_fop_write+0x137/0x1c0
    [<ffffffff811d8c13>] __vfs_write+0x23/0x140
    [<ffffffff811d92e0>] vfs_write+0xb0/0x190
    [<ffffffff811da5b4>] SyS_write+0x44/0xa0
    [<ffffffff8162c8a5>] entry_SYSCALL_64_fastpath+0x18/0xa8

Fixes: 9aa9cc4221 ("scsi: remove the disable_blk_mq host flag")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2016-09-26 20:58:42 -04:00
Nikolay Aleksandrov
2cf750704b ipmr, ip6mr: fix scheduling while atomic and a deadlock with ipmr_get_route
Since the commit below the ipmr/ip6mr rtnl_unicast() code uses the portid
instead of the previous dst_pid which was copied from in_skb's portid.
Since the skb is new the portid is 0 at that point so the packets are sent
to the kernel and we get scheduling while atomic or a deadlock (depending
on where it happens) by trying to acquire rtnl two times.
Also since this is RTM_GETROUTE, it can be triggered by a normal user.

Here's the sleeping while atomic trace:
[ 7858.212557] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
[ 7858.212748] in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0
[ 7858.212881] 2 locks held by swapper/0/0:
[ 7858.213013]  #0:  (((&mrt->ipmr_expire_timer))){+.-...}, at: [<ffffffff810fbbf5>] call_timer_fn+0x5/0x350
[ 7858.213422]  #1:  (mfc_unres_lock){+.....}, at: [<ffffffff8161e005>] ipmr_expire_process+0x25/0x130
[ 7858.213807] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc7+ #179
[ 7858.213934] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 7858.214108]  0000000000000000 ffff88005b403c50 ffffffff813a7804 0000000000000000
[ 7858.214412]  ffffffff81a1338e ffff88005b403c78 ffffffff810a4a72 ffffffff81a1338e
[ 7858.214716]  000000000000026c 0000000000000000 ffff88005b403ca8 ffffffff810a4b9f
[ 7858.215251] Call Trace:
[ 7858.215412]  <IRQ>  [<ffffffff813a7804>] dump_stack+0x85/0xc1
[ 7858.215662]  [<ffffffff810a4a72>] ___might_sleep+0x192/0x250
[ 7858.215868]  [<ffffffff810a4b9f>] __might_sleep+0x6f/0x100
[ 7858.216072]  [<ffffffff8165bea3>] mutex_lock_nested+0x33/0x4d0
[ 7858.216279]  [<ffffffff815a7a5f>] ? netlink_lookup+0x25f/0x460
[ 7858.216487]  [<ffffffff8157474b>] rtnetlink_rcv+0x1b/0x40
[ 7858.216687]  [<ffffffff815a9a0c>] netlink_unicast+0x19c/0x260
[ 7858.216900]  [<ffffffff81573c70>] rtnl_unicast+0x20/0x30
[ 7858.217128]  [<ffffffff8161cd39>] ipmr_destroy_unres+0xa9/0xf0
[ 7858.217351]  [<ffffffff8161e06f>] ipmr_expire_process+0x8f/0x130
[ 7858.217581]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
[ 7858.217785]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
[ 7858.217990]  [<ffffffff810fbc95>] call_timer_fn+0xa5/0x350
[ 7858.218192]  [<ffffffff810fbbf5>] ? call_timer_fn+0x5/0x350
[ 7858.218415]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
[ 7858.218656]  [<ffffffff810fde10>] run_timer_softirq+0x260/0x640
[ 7858.218865]  [<ffffffff8166379b>] ? __do_softirq+0xbb/0x54f
[ 7858.219068]  [<ffffffff816637c8>] __do_softirq+0xe8/0x54f
[ 7858.219269]  [<ffffffff8107a948>] irq_exit+0xb8/0xc0
[ 7858.219463]  [<ffffffff81663452>] smp_apic_timer_interrupt+0x42/0x50
[ 7858.219678]  [<ffffffff816625bc>] apic_timer_interrupt+0x8c/0xa0
[ 7858.219897]  <EOI>  [<ffffffff81055f16>] ? native_safe_halt+0x6/0x10
[ 7858.220165]  [<ffffffff810d64dd>] ? trace_hardirqs_on+0xd/0x10
[ 7858.220373]  [<ffffffff810298e3>] default_idle+0x23/0x190
[ 7858.220574]  [<ffffffff8102a20f>] arch_cpu_idle+0xf/0x20
[ 7858.220790]  [<ffffffff810c9f8c>] default_idle_call+0x4c/0x60
[ 7858.221016]  [<ffffffff810ca33b>] cpu_startup_entry+0x39b/0x4d0
[ 7858.221257]  [<ffffffff8164f995>] rest_init+0x135/0x140
[ 7858.221469]  [<ffffffff81f83014>] start_kernel+0x50e/0x51b
[ 7858.221670]  [<ffffffff81f82120>] ? early_idt_handler_array+0x120/0x120
[ 7858.221894]  [<ffffffff81f8243f>] x86_64_start_reservations+0x2a/0x2c
[ 7858.222113]  [<ffffffff81f8257c>] x86_64_start_kernel+0x13b/0x14a

Fixes: 2942e90050 ("[RTNETLINK]: Use rtnl_unicast() for rtnetlink unicasts")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-25 23:41:39 -04:00
Linus Torvalds
08895a8b6b Linux 4.8-rc8 2016-09-25 18:47:13 -07:00
Linus Torvalds
4c04b4b534 Al Viro has been looking at the tracefs code, and has pointed out
some issues. This contains one fix by me and one by Al. I'm sure that
 he'll come up with more but for now I tested these patches and they
 don't appear to have any negative impact on tracing.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJX6FvrAAoJEKKk/i67LK/8EuIH/Arf6vJidYsmbe57WQp8PU3I
 bldem6ePj6zgZ2ZqPlSGCs1J2DcK4Bh3lPVxdx7rRKVWSd/Zoj+i83hvObusR8M7
 Qs1G92bJTvvVO3aPfiN0GvKGdKfGn45L+j0BcBauiTRKqnj3PkhOhIP2/ks0ewSk
 qeq7R3xxo/FDs26AHS69Hm0PIIw7btyhXNX4GB3Il7IIA5/nUknw3C+bjVj86tYX
 R4iElcHEhplgoSjKuLgNIRZGUnEFtsm/fnohYXpHacLTUKNXnTDY230x/OKc1yyB
 1vOfHS/y5s3XSJ1lcgSjYeNc51lK8NiDASaptZSUnOookKSAooUTFELNzpbc0sg=
 =+Fr3
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracefs fixes from Steven Rostedt:
 "Al Viro has been looking at the tracefs code, and has pointed out some
  issues.  This contains one fix by me and one by Al.  I'm sure that
  he'll come up with more but for now I tested these patches and they
  don't appear to have any negative impact on tracing"

* tag 'trace-v4.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  fix memory leaks in tracing_buffers_splice_read()
  tracing: Move mutex to protect against resetting of seq data
2016-09-25 18:40:13 -07:00
Dave Chinner
90b75db649 fault_in_multipages_readable() throws set-but-unused error
When building XFS with -Werror, it now fails with:

  include/linux/pagemap.h: In function 'fault_in_multipages_readable':
  include/linux/pagemap.h:602:16: error: variable 'c' set but not used [-Werror=unused-but-set-variable]
    volatile char c;
                  ^

This is a regression caused by commit e23d4159b1 ("fix
fault_in_multipages_...() on architectures with no-op access_ok()").
Fix it by re-adding the "(void)c" trick taht was previously used to make
the compiler think the variable is used.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-25 18:16:44 -07:00