Commit 4c3b89effc ("powerpc/powernv: Add sanity checks to
pnv_pci_get_{gpu|npu}_dev") introduced explicit warnings in
pnv_pci_get_npu_dev() when a PCIe device has no associated device-tree
node. However not all PCIe devices have an of_node and
pnv_pci_get_npu_dev() gets indirectly called at least once for every
PCIe device in the system. This results in spurious WARN_ON()'s so
remove it.
The same situation should not exist for pnv_pci_get_gpu_dev() as any
NPU based PCIe device requires a device-tree node.
Fixes: 4c3b89effc ("powerpc/powernv: Add sanity checks to pnv_pci_get_{gpu|npu}_dev")
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The PPC_DT_CPU_FTRs is a bit misplaced in menuconfig, it shows up with
other general kernel options. It's really more at home in the "Platform
Support" section, so move it there.
Also enable it by default, for Book3s 64. It does mostly nothing unless
the device tree properties are found, and we will want it enabled
eventually in distro kernels, so turn it on to start getting more
testing.
Fixes: 5a61ef74f2 ("powerpc/64s: Support new device tree binding for discovering CPU features")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Supporting 512TB requires us to do a order 3 allocation for level 1 page
table (pgd). This results in page allocation failures with certain workloads.
For now limit 4k linux page size config to 64TB.
Fixes: f6eedbba7a ("powerpc/mm/hash: Increase VA range to 128TB")
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Fix error path if we can't copy user structure on CXL_IOCTL_START_WORK
ioctl. We shouldn't unlock the context status mutex as it was not
locked (yet).
Fixes: 0712dc7e73 ("cxl: Fix issues when unmapping contexts")
Cc: stable@vger.kernel.org # v3.19+
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Commit 8d911904f3 ('powerpc/perf: Add restrictions to PMC5 in power9 DD1')
was added to restrict the use of PMC5 in Power9 DD1. Intention was to disable
the use of PMC5 using raw event code. But instead of updating the
power9_isa207_pmu structure (used on DD1), the commit incorrectly updated the
power9_pmu structure. Fix it.
Fixes: 8d911904f3 ("powerpc/perf: Add restrictions to PMC5 in power9 DD1")
Reported-by: Shriya <shriyak@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Tested-by: Shriya <shriyak@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In commit 8c27226119 ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID"), we
switched to the generic implementation of cpu_to_node(), which uses a percpu
variable to hold the NUMA node for each CPU.
Unfortunately we neglected to notice that we use cpu_to_node() in the allocation
of our percpu areas, leading to a chicken and egg problem. In practice what
happens is when we are setting up the percpu areas, cpu_to_node() reports that
all CPUs are on node 0, so we allocate all percpu areas on node 0.
This is visible in the dmesg output, as all pcpu allocs being in group 0:
pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
pcpu-alloc: [0] 24 25 26 27 [0] 28 29 30 31
pcpu-alloc: [0] 32 33 34 35 [0] 36 37 38 39
pcpu-alloc: [0] 40 41 42 43 [0] 44 45 46 47
To fix it we need an early_cpu_to_node() which can run prior to percpu being
setup. We already have the numa_cpu_lookup_table we can use, so just plumb it
in. With the patch dmesg output shows two groups, 0 and 1:
pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
pcpu-alloc: [1] 24 25 26 27 [1] 28 29 30 31
pcpu-alloc: [1] 32 33 34 35 [1] 36 37 38 39
pcpu-alloc: [1] 40 41 42 43 [1] 44 45 46 47
We can also check the data_offset in the paca of various CPUs, with the fix we
see:
CPU 0: data_offset = 0x0ffe8b0000
CPU 24: data_offset = 0x1ffe5b0000
And we can see from dmesg that CPU 24 has an allocation on node 1:
node 0: [mem 0x0000000000000000-0x0000000fffffffff]
node 1: [mem 0x0000001000000000-0x0000001fffffffff]
Cc: stable@vger.kernel.org # v3.16+
Fixes: 8c27226119 ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
During an eeh call to cxl_remove can result in double free_irq of
psl,slice interrupts. This can happen if perst_reloads_same_image == 1
and call to cxl_configure_adapter() fails during slot_reset
callback. In such a case we see a kernel oops with following back-trace:
Oops: Kernel access of bad area, sig: 11 [#1]
Call Trace:
free_irq+0x88/0xd0 (unreliable)
cxl_unmap_irq+0x20/0x40 [cxl]
cxl_native_release_psl_irq+0x78/0xd8 [cxl]
pci_deconfigure_afu+0xac/0x110 [cxl]
cxl_remove+0x104/0x210 [cxl]
pci_device_remove+0x6c/0x110
device_release_driver_internal+0x204/0x2e0
pci_stop_bus_device+0xa0/0xd0
pci_stop_and_remove_bus_device+0x28/0x40
pci_hp_remove_devices+0xb0/0x150
pci_hp_remove_devices+0x68/0x150
eeh_handle_normal_event+0x140/0x580
eeh_handle_event+0x174/0x360
eeh_event_handler+0x1e8/0x1f0
This patch fixes the issue of double free_irq by checking that
variables that hold the virqs (err_hwirq, serr_hwirq, psl_virq) are
not '0' before un-mapping and resetting these variables to '0' when
they are un-mapped.
Cc: stable@vger.kernel.org
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently tsk->thread.load_tm is not initialized in the task creation
and can contain garbage on a new task.
This is an undesired behaviour, since it affects the timing to enable
and disable the transactional memory laziness (disabling and enabling
the MSR TM bit, which affects TM reclaim and recheckpoint in the
scheduling process).
Fixes: 5d176f751e ("powerpc: tm: Enable transactional memory (TM) lazily for userspace")
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently tsk->thread->load_vec and load_fp are not initialized during
task creation, which can lead to garbage values in these variables (non-zero
values).
These variables will be checked later in restore_math() to validate if the
FP and vector registers are being utilized. Since these values might be
non-zero, the restore_math() will continue to save the FP and vectors even if
they were never utilized by the userspace application. load_fp and load_vec
counters will then overflow (they wrap at 255) and the FP and Altivec will be
finally disabled, but before that condition is reached (counter overflow)
several context switches will have restored FP and vector registers without
need, causing a performance degradation.
Fixes: 70fe3d980f ("powerpc: Restore FPU/VEC/VSX if previously used")
Cc: stable@vger.kernel.org # v4.6+
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Gustavo Romero <gusbromero@gmail.com>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We are running low on CPU feature bits, so we only want to use them when
it's really necessary.
CPU_FTR_SUBCORE is only used in one place, and only in C, so we don't
need it in order to make asm patching work. It can only be set on
"Power8" CPUs, which in practice means POWER8, POWER8E and POWER8NVL.
There are no plans to implement it on future CPUs, but if there ever
were we could retrofit it then.
Although KVM uses subcores, it never looks at the CPU feature, it either
looks at the ISA level or the threads_per_subcore value.
So drop the CPU feature and do a PVR check instead. Drop the device tree
"subcore" feature as we no longer support doing anything with it, and we
will drop it from skiboot too.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When adding or removing memory, the aa_index (affinity value) for the
memblock must also be converted to match the endianness of the rest
of the 'ibm,dynamic-memory' property. Otherwise, subsequent retrieval
of the attribute will likely lead to non-existent nodes, followed by
using the default node in the code inappropriately.
Fixes: 5f97b2a0d1 ("powerpc/pseries: Implement memory hotplug add in the kernel")
Cc: stable@vger.kernel.org # v4.1+
Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
If a process dumps core while it has SPU contexts active then we have
code to also dump information about the SPU contexts.
Unfortunately it's been broken for 3 1/2 years, and we didn't notice. In
commit 7b1f4020d0 ("spufs: get rid of dump_emit() wrappers") the nread
variable was removed and rc used instead. That means when the loop exits
successfully, rc has the number of bytes read, but it's then used as the
return value for the function, which should return 0 on success.
So fix it by setting rc = 0 before returning in the success case.
Fixes: 7b1f4020d0 ("spufs: get rid of dump_emit() wrappers")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Provide a dt_cpu_ftrs= cmdline option to disable the dt_cpu_ftrs CPU
feature discovery, and fall back to the "cputable" based version.
Also allow control of advertising unknown features to userspace and
with this parameter, and remove the clunky CONFIG option.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Add explicit early check of bootargs in dt_cpu_ftrs_init()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Providing "scv" support to userspace requires kernel support, so it
must be advertised as independently to the base ISA 3 instruction set.
The darn instruction relies on firmware enablement, so it has been
decided to split this out from the core ISA 3 feature as well.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Commit ac29c64089 ("powerpc/mm: Replace _PAGE_USER with
_PAGE_PRIVILEGED") swapped _PAGE_USER for _PAGE_PRIVILEGED, and
introduced check_pte_access() which denied kernel access to
non-_PAGE_PRIVILEGED pages.
However, it didn't add _PAGE_PRIVILEGED to the hash fault handler
for spufs' kernel accesses, so the DMAs required to establish SPE
memory no longer work.
This change adds _PAGE_PRIVILEGED to the hash fault handler for
kernel accesses.
Fixes: ac29c64089 ("powerpc/mm: Replace _PAGE_USER with _PAGE_PRIVILEGED")
Cc: stable@vger.kernel.org # v4.7+
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Reported-by: Sombat Tragolgosol <sombat3960@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently if you disable CONFIG_PPC_RADIX_MMU you'll crash on boot on
a P9. This is because we still set MMU_FTR_TYPE_RADIX via
ibm,pa-features and MMU_FTR_TYPE_RADIX is what's used for code patching
in much of the asm code (ie. slb_miss_realmode)
This patch fixes the problem by stopping MMU_FTR_TYPE_RADIX from being
set from ibm.pa-features.
We may eventually end up removing the CONFIG_PPC_RADIX_MMU option
completely but until then this fixes the issue.
Fixes: 17a3dd2f5f ("powerpc/mm/radix: Use firmware feature to enable Radix MMU")
Cc: stable@vger.kernel.org # v4.7+
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
opal_npu_destroy_context() should be called with the NPU PHB, not the
PCIe PHB.
Fixes: 1ab66d1fba ("powerpc/powernv: Introduce address translation services for Nvlink2")
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The tm-resched-dscr test has started failing sometimes, depending on
what compiler it's built with, eg:
test: tm_resched_dscr
Check DSCR TM context switch: tm-resched-dscr: tm-resched-dscr.c:76: test_body: Assertion `rv' failed.
!! child died by signal 6
When it fails we see that the compiler doesn't initialise rv to 1 before
entering the inline asm block. Although that's counter intuitive, it
is allowed because we tell the compiler that the inline asm will write
to rv (using "=r"), meaning the original value is irrelevant.
Marking it as a read/write parameter would presumably work, but it seems
simpler to fix it by setting the initial value of rv in the inline asm.
Fixes: 96d0161086 ("powerpc: Correct DSCR during TM context switch")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Michael Neuling <mikey@neuling.org>
virt_addr_valid() is supposed to tell you if it's OK to call virt_to_page() on
an address. What this means in practice is that it should only return true for
addresses in the linear mapping which are backed by a valid PFN.
We are failing to properly check that the address is in the linear mapping,
because virt_to_pfn() will return a valid looking PFN for more or less any
address. That bug is actually caused by __pa(), used in virt_to_pfn().
eg: __pa(0xc000000000010000) = 0x10000 # Good
__pa(0xd000000000010000) = 0x10000 # Bad!
__pa(0x0000000000010000) = 0x10000 # Bad!
This started happening after commit bdbc29c19b ("powerpc: Work around gcc
miscompilation of __pa() on 64-bit") (Aug 2013), where we changed the definition
of __pa() to work around a GCC bug. Prior to that we subtracted PAGE_OFFSET from
the value passed to __pa(), meaning __pa() of a 0xd or 0x0 address would give
you something bogus back.
Until we can verify if that GCC bug is no longer an issue, or come up with
another solution, this commit does the minimal fix to make virt_addr_valid()
work, by explicitly checking that the address is in the linear mapping region.
Fixes: bdbc29c19b ("powerpc: Work around gcc miscompilation of __pa() on 64-bit")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Tested-by: Breno Leitao <breno.leitao@gmail.com>
The page table dump code doesn't know about huge pages, so currently
it crashes (or walks random memory, usually leading to a crash), if it
finds a huge page. On Book3S we only see huge pages in the Linux page
tables when we're using the P9 Radix MMU.
Teaching the code to properly handle huge pages is a bit more involved,
so for now just prevent the crash.
Cc: stable@vger.kernel.org # v4.10+
Fixes: 8eb07b1870 ("powerpc/mm: Dump linux pagetables")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Commit 22d8b3dec2 ("powerpc/kprobes: Emulate instructions on kprobe
handler re-entry") enabled emulating instructions on kprobe re-entry,
rather than single-stepping always. However, we didn't update the single
stepping code to only be run if the emulation fails. Also, we missed
re-enabling preemption if the instruction emulation was successful. Fix
those issues.
Fixes: 22d8b3dec2 ("powerpc/kprobes: Emulate instructions on kprobe handler re-entry")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Commit 17ed4c8f81 ("powerpc/powernv: Recover correct PACA on wakeup
from a stop on P9 DD1") promises to set the NAPSTATELOST bit in paca
after recovering the correct paca for the thread waking up from stop1
on DD1, so that the GPRs can be correctly restored on the stop exit
path. However, it loads the value 1 into r3, but stores the value in
r0 into NAPSTATELOST(r13).
Fix this by correctly set the NAPSTATELOST bit in paca after
recovering the paca on POWER9 DD1.
Fixes: 17ed4c8f81 ("powerpc/powernv: Recover correct PACA on wakeup from a stop on P9 DD1")
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Test that the VMX checkpointed register state is maintained when a VMX
unavailable exception is taken during a transaction.
Thanks to Breno Leitao <brenohl@br.ibm.com> and
Gustavo Bueno Romero <gromero@br.ibm.com> for the original test this
is based heavily on.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Cyril Bur <cyrilbur@gmail.com>
[mpe: Add to .gitignore, always build it 64-bit to fix build errors]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In commit dc3106690b ("powerpc: tm: Always use fp_state and vr_state
to store live registers"), a section of code was removed that copied
the current state to checkpointed state. That code should not have been
removed.
When an FP (Floating Point) unavailable is taken inside a transaction,
we need to abort the transaction. This is because at the time of the
tbegin, the FP state is bogus so the state stored in the checkpointed
registers is incorrect. To fix this, we treclaim (to get the
checkpointed GPRs) and then copy the thread_struct FP live state into
the checkpointed state. We then trecheckpoint so that the FP state is
correctly restored into the CPU.
The copying of the FP registers from live to checkpointed is what was
missing.
This simplifies the logic slightly from the original patch.
tm_reclaim_thread() will now always write the checkpointed FP
state. Either the checkpointed FP state will be written as part of
the actual treclaim (in tm.S), or it'll be a copy of the live
state. Which one we use is based on MSR[FP] from userspace.
Similarly for VMX.
Fixes: dc3106690b ("powerpc: tm: Always use fp_state and vr_state to store live registers")
Cc: stable@vger.kernel.org # 4.9+
Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: cyrilbur@gmail.com
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
On powerpc we can build the kernel with two different ABIs for mcount(), which
is used by ftrace. Kernels built with one ABI do not know how to load modules
built with the other ABI. The new style ABI is called "mprofile-kernel", for
want of a better name.
Currently if we build a module using the old style ABI, and the kernel with
mprofile-kernel, when we load the module we'll oops something like:
# insmod autofs4-no-mprofile-kernel.ko
ftrace-powerpc: Unexpected instruction f8810028 around bl _mcount
------------[ cut here ]------------
WARNING: CPU: 6 PID: 3759 at ../kernel/trace/ftrace.c:2024 ftrace_bug+0x2b8/0x3c0
CPU: 6 PID: 3759 Comm: insmod Not tainted 4.11.0-rc3-gcc-5.4.1-00017-g5a61ef74f269 #11
...
NIP [c0000000001eaa48] ftrace_bug+0x2b8/0x3c0
LR [c0000000001eaff8] ftrace_process_locs+0x4a8/0x590
Call Trace:
alloc_pages_current+0xc4/0x1d0 (unreliable)
ftrace_process_locs+0x4a8/0x590
load_module+0x1c8c/0x28f0
SyS_finit_module+0x110/0x140
system_call+0x38/0xfc
...
ftrace failed to modify
[<d000000002a31024>] 0xd000000002a31024
actual: 35:65:00:48
We can avoid this by including in the vermagic whether the kernel/module was
built with mprofile-kernel. Which results in:
# insmod autofs4-pg.ko
autofs4: version magic
'4.11.0-rc3-gcc-5.4.1-00017-g5a61ef74f269 SMP mod_unload modversions '
should be
'4.11.0-rc3-gcc-5.4.1-00017-g5a61ef74f269-dirty SMP mod_unload modversions mprofile-kernel'
insmod: ERROR: could not insert module autofs4-pg.ko: Invalid module format
Fixes: 8c50b72a3b ("powerpc/ftrace: Add Kconfig & Make glue for mprofile-kernel")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Acked-by: Jessica Yu <jeyu@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pull some more input subsystem updates from Dmitry Torokhov:
"An updated xpad driver with a few more recognized device IDs, and a
new psxpad-spi driver, allowing connecting Playstation 1 and 2 joypads
via SPI bus"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: cros_ec_keyb - remove extraneous 'const'
Input: add support for PlayStation 1/2 joypads connected via SPI
Input: xpad - add USB IDs for Mad Catz Brawlstick and Razer Sabertooth
Input: xpad - sync supported devices with xboxdrv
Input: xpad - sort supported devices by USB ID
Pull UML fixes from Richard Weinberger:
"No new stuff, just fixes"
* 'for-linus-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: Add missing NR_CPUS include
um: Fix to call read_initrd after init_bootmem
um: Include kbuild.h instead of duplicating its macros
um: Fix PTRACE_POKEUSER on x86_64
um: Set number of CPUs
um: Fix _print_addr()
Merge misc fixes from Andrew Morton:
"15 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm, docs: update memory.stat description with workingset* entries
mm: vmscan: scan until it finds eligible pages
mm, thp: copying user pages must schedule on collapse
dax: fix PMD data corruption when fault races with write
dax: fix data corruption when fault races with write
ext4: return to starting transaction in ext4_dax_huge_fault()
mm: fix data corruption due to stale mmap reads
dax: prevent invalidation of mapped DAX entries
Tigran has moved
mm, vmalloc: fix vmalloc users tracking properly
mm/khugepaged: add missed tracepoint for collapse_huge_page_swapin
gcov: support GCC 7.1
mm, vmstat: Remove spurious WARN() during zoneinfo print
time: delete current_fs_time()
hwpoison, memcg: forcibly uncharge LRU pages
Commit 4b4cea91691d ("mm: vmscan: fix IO/refault regression in cache
workingset transition") introduced three new entries in memory stat
file:
- workingset_refault
- workingset_activate
- workingset_nodereclaim
This commit adds a corresponding description to the cgroup v2 docs.
Link: http://lkml.kernel.org/r/1494530293-31236-1-git-send-email-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have encountered need_resched warnings in __collapse_huge_page_copy()
while doing {clear,copy}_user_highpage() over HPAGE_PMD_NR source pages.
mm->mmap_sem is held for write, but the iteration is well bounded.
Reschedule as needed.
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1705101426380.109808@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is based on a patch from Jan Kara that fixed the equivalent race in
the DAX PTE fault path.
Currently DAX PMD read fault can race with write(2) in the following
way:
CPU1 - write(2) CPU2 - read fault
dax_iomap_pmd_fault()
->iomap_begin() - sees hole
dax_iomap_rw()
iomap_apply()
->iomap_begin - allocates blocks
dax_iomap_actor()
invalidate_inode_pages2_range()
- there's nothing to invalidate
grab_mapping_entry()
- we add huge zero page to the radix tree
and map it to page tables
The result is that hole page is mapped into page tables (and thus zeros
are seen in mmap) while file has data written in that place.
Fix the problem by locking exception entry before mapping blocks for the
fault. That way we are sure invalidate_inode_pages2_range() call for
racing write will either block on entry lock waiting for the fault to
finish (and unmap stale page tables after that) or read fault will see
already allocated blocks by write(2).
Fixes: 9f141d6ef6 ("dax: Call ->iomap_begin without entry lock during dax fault")
Link: http://lkml.kernel.org/r/20170510172700.18991-1-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently DAX read fault can race with write(2) in the following way:
CPU1 - write(2) CPU2 - read fault
dax_iomap_pte_fault()
->iomap_begin() - sees hole
dax_iomap_rw()
iomap_apply()
->iomap_begin - allocates blocks
dax_iomap_actor()
invalidate_inode_pages2_range()
- there's nothing to invalidate
grab_mapping_entry()
- we add zero page in the radix tree
and map it to page tables
The result is that hole page is mapped into page tables (and thus zeros
are seen in mmap) while file has data written in that place.
Fix the problem by locking exception entry before mapping blocks for the
fault. That way we are sure invalidate_inode_pages2_range() call for
racing write will either block on entry lock waiting for the fault to
finish (and unmap stale page tables after that) or read fault will see
already allocated blocks by write(2).
Fixes: 9f141d6ef6
Link: http://lkml.kernel.org/r/20170510085419.27601-5-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
DAX will return to locking exceptional entry before mapping blocks for a
page fault to fix possible races with concurrent writes. To avoid lock
inversion between exceptional entry lock and transaction start, start
the transaction already in ext4_dax_huge_fault().
Fixes: 9f141d6ef6
Link: http://lkml.kernel.org/r/20170510085419.27601-4-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, we didn't invalidate page tables during invalidate_inode_pages2()
for DAX. That could result in e.g. 2MiB zero page being mapped into
page tables while there were already underlying blocks allocated and
thus data seen through mmap were different from data seen by read(2).
The following sequence reproduces the problem:
- open an mmap over a 2MiB hole
- read from a 2MiB hole, faulting in a 2MiB zero page
- write to the hole with write(3p). The write succeeds but we
incorrectly leave the 2MiB zero page mapping intact.
- via the mmap, read the data that was just written. Since the zero
page mapping is still intact we read back zeroes instead of the new
data.
Fix the problem by unconditionally calling invalidate_inode_pages2_range()
in dax_iomap_actor() for new block allocations and by properly
invalidating page tables in invalidate_inode_pages2_range() for DAX
mappings.
Fixes: c6dcf52c23
Link: http://lkml.kernel.org/r/20170510085419.27601-3-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "mm,dax: Fix data corruption due to mmap inconsistency",
v4.
This series fixes data corruption that can happen for DAX mounts when
page faults race with write(2) and as a result page tables get out of
sync with block mappings in the filesystem and thus data seen through
mmap is different from data seen through read(2).
The series passes testing with t_mmap_stale test program from Ross and
also other mmap related tests on DAX filesystem.
This patch (of 4):
dax_invalidate_mapping_entry() currently removes DAX exceptional entries
only if they are clean and unlocked. This is done via:
invalidate_mapping_pages()
invalidate_exceptional_entry()
dax_invalidate_mapping_entry()
However, for page cache pages removed in invalidate_mapping_pages()
there is an additional criteria which is that the page must not be
mapped. This is noted in the comments above invalidate_mapping_pages()
and is checked in invalidate_inode_page().
For DAX entries this means that we can can end up in a situation where a
DAX exceptional entry, either a huge zero page or a regular DAX entry,
could end up mapped but without an associated radix tree entry. This is
inconsistent with the rest of the DAX code and with what happens in the
page cache case.
We aren't able to unmap the DAX exceptional entry because according to
its comments invalidate_mapping_pages() isn't allowed to block, and
unmap_mapping_range() takes a write lock on the mapping->i_mmap_rwsem.
Since we essentially never have unmapped DAX entries to evict from the
radix tree, just remove dax_invalidate_mapping_entry().
Fixes: c6dcf52c23 ("mm: Invalidate DAX radix tree entries only if appropriate")
Link: http://lkml.kernel.org/r/20170510085419.27601-2-jack@suse.cz
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org> [4.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 1f5307b1e0 ("mm, vmalloc: properly track vmalloc users") has
pulled asm/pgtable.h include dependency to linux/vmalloc.h and that
turned out to be a bad idea for some architectures. E.g. m68k fails
with
In file included from arch/m68k/include/asm/pgtable_mm.h:145:0,
from arch/m68k/include/asm/pgtable.h:4,
from include/linux/vmalloc.h:9,
from arch/m68k/kernel/module.c:9:
arch/m68k/include/asm/mcf_pgtable.h: In function 'nocache_page':
>> arch/m68k/include/asm/mcf_pgtable.h:339:43: error: 'init_mm' undeclared (first use in this function)
#define pgd_offset_k(address) pgd_offset(&init_mm, address)
as spotted by kernel build bot. nios2 fails for other reason
In file included from include/asm-generic/io.h:767:0,
from arch/nios2/include/asm/io.h:61,
from include/linux/io.h:25,
from arch/nios2/include/asm/pgtable.h:18,
from include/linux/mm.h:70,
from include/linux/pid_namespace.h:6,
from include/linux/ptrace.h:9,
from arch/nios2/include/uapi/asm/elf.h:23,
from arch/nios2/include/asm/elf.h:22,
from include/linux/elf.h:4,
from include/linux/module.h:15,
from init/main.c:16:
include/linux/vmalloc.h: In function '__vmalloc_node_flags':
include/linux/vmalloc.h:99:40: error: 'PAGE_KERNEL' undeclared (first use in this function); did you mean 'GFP_KERNEL'?
which is due to the newly added #include <asm/pgtable.h>, which on nios2
includes <linux/io.h> and thus <asm/io.h> and <asm-generic/io.h> which
again includes <linux/vmalloc.h>.
Tweaking that around just turns out a bigger headache than necessary.
This patch reverts 1f5307b1e0 and reimplements the original fix in a
different way. __vmalloc_node_flags can stay static inline which will
cover vmalloc* functions. We only have one external user
(kvmalloc_node) and we can export __vmalloc_node_flags_caller and
provide the caller directly. This is much simpler and it doesn't really
need any games with header files.
[akpm@linux-foundation.org: coding-style fixes]
[mhocko@kernel.org: revert old comment]
Link: http://lkml.kernel.org/r/20170509211054.GB16325@dhcp22.suse.cz
Fixes: 1f5307b1e0 ("mm, vmalloc: properly track vmalloc users")
Link: http://lkml.kernel.org/r/20170509153702.GR6481@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Tobias Klauser <tklauser@distanz.ch>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
One return case of `__collapse_huge_page_swapin()` does not invoke
tracepoint while every other return case does. This commit adds a
tracepoint invocation for the case.
Link: http://lkml.kernel.org/r/20170507101813.30187-1-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After commit e2ecc8a79e ("mm, vmstat: print non-populated zones in
zoneinfo"), /proc/zoneinfo will show unpopulated zones.
A memoryless node, having no populated zones at all, was previously
ignored, but will now trigger the WARN() in is_zone_first_populated().
Remove this warning, as its only purpose was to warn of a situation that
has since been enabled.
Aside: The "per-node stats" are still printed under the first populated
zone, but that's not necessarily the first stanza any more. I'm not
sure which criteria is more important with regard to not breaking
parsers, but it looks a little weird to the eye.
Fixes: e2ecc8a79e ("mm, vmstat: print node-based stats in zoneinfo file")
Link: http://lkml.kernel.org/r/1493854905-10918-1-git-send-email-arbab@linux.vnet.ibm.com
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All uses of the current_fs_time() function have been replaced by other
time interfaces.
And, its use cases can be fulfilled by current_time() or ktime_get_*
variants.
Link: http://lkml.kernel.org/r/1491613030-11599-13-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Laurent Dufour has noticed that hwpoinsoned pages are kept charged. In
his particular case he has hit a bad_page("page still charged to
cgroup") when onlining a hwpoison page. While this looks like something
that shouldn't happen in the first place because onlining hwpages and
returning them to the page allocator makes only little sense it shows a
real problem.
hwpoison pages do not get freed usually so we do not uncharge them (at
least not since commit 0a31bc97c8 ("mm: memcontrol: rewrite uncharge
API")). Each charge pins memcg (since e8ea14cc6e ("mm: memcontrol:
take a css reference for each charged page")) as well and so the
mem_cgroup and the associated state will never go away. Fix this leak
by forcibly uncharging a LRU hwpoisoned page in delete_from_lru_cache().
We also have to tweak uncharge_list because it cannot rely on zero ref
count for these pages.
[akpm@linux-foundation.org: coding-style fixes]
Fixes: 0a31bc97c8 ("mm: memcontrol: rewrite uncharge API")
Link: http://lkml.kernel.org/r/20170502185507.GB19165@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull libnvdimm fixes from Dan Williams:
"Incremental fixes and a small feature addition on top of the main
libnvdimm 4.12 pull request:
- Geert noticed that tinyconfig was bloated by BLOCK selecting DAX.
The size regression is fixed by moving all dax helpers into the
dax-core and only specifying "select DAX" for FS_DAX and
dax-capable drivers. He also asked for clarification of the
NR_DEV_DAX config option which, on closer look, does not need to be
a config option at all. Mike also throws in a DEV_DAX_PMEM fixup
for good measure.
- Ben's attention to detail on -stable patch submissions caught a
case where the recent fixes to arch_copy_from_iter_pmem() missed a
condition where we strand dirty data in the cache. This is tagged
for -stable and will also be included in the rework of the pmem api
to a proposed {memcpy,copy_user}_flushcache() interface for 4.13.
- Vishal adds a feature that missed the initial pull due to pending
review feedback. It allows the kernel to clear media errors when
initializing a BTT (atomic sector update driver) instance on a pmem
namespace.
- Ross noticed that the dax_device + dax_operations conversion broke
__dax_zero_page_range(). The nvdimm unit tests fail to check this
path, but xfstests immediately trips over it. No excuse for missing
this before submitting the 4.12 pull request.
These all pass the nvdimm unit tests and an xfstests spot check. The
set has received a build success notification from the kbuild robot"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
filesystem-dax: fix broken __dax_zero_page_range() conversion
libnvdimm, btt: ensure that initializing metadata clears poison
libnvdimm: add an atomic vs process context flag to rw_bytes
x86, pmem: Fix cache flushing for iovec write < 8 bytes
device-dax: kill NR_DEV_DAX
block, dax: move "select DAX" from BLOCK to FS_DAX
device-dax: Tell kbuild DEV_DAX_PMEM depends on DEV_DAX
This contains a one-liner change that has a significant impact:
disabling the build of OSS. It's been unmaintained for long time,
and we'd like to drop the stuff. Finally, as the first step, stop
the build. Let's see whether it works without much complaints.
Other than that, there are two small fixes for HD-audio.
-----BEGIN PGP SIGNATURE-----
iQJCBAABCAAsFiEECxfAB4MH3rD5mfB6bDGAVD0pKaQFAlkVr2YOHHRpd2FpQHN1
c2UuZGUACgkQbDGAVD0pKaQPqRAAoTbIYoB30Wzrg/CraNwTC0ks0pVF1Pjk6mQF
2phnpwbUZ/Hj2bD+4ydlaeNO93pXJHlyvoQwfxlb/tJEg8pVZ2PwyLZ/nfKAGDHg
BJHpGuv3CUvrU/OzZ2xXGfKeiMibhE7Bfjnl/hriztiCNdeEVE6tQU5hRczSUzk8
zgo3fmDsmP4NZqBR9DnrfTsti+HWcclAYKbxaHPxOfVOkKpd5waVQu4AAU5JgBhE
mifvey0QaRUhpDs8ETlDZhvLJICbB6Ybb2l2hK+HyQBwSOSUjHNBXlnsBXpeXUVJ
Ay/hJ4/u0PfeNafxZm66fTEHHTvyXCiCfjStKvIqGkjkAgI73r2H5qrdk4FhyFDI
ovOGMR8j5YKH4smcU13Yx8qoQqmpjNJiL1eplFsRv+BImiIAADH8AMhdKySOQa9/
wmw6qWcwo+LHkxu+l+FCjaQlsZbRhaBmYRnSG4A0IU/lTWhcFk0BYa98Hj/TpjoB
Yo1JmUhYW80h424p0eqV/vl/VDCiUpMct4YcvK3nNDi/1uuLC8qC8jr+qH8Bssg8
9xZjaXDKf3Hf2x7r++kU2xjb1NEWH5WMBFzpXEuN6I+nY/BJG5i/JrXlvWBAG+qZ
Cn3rOAE2ACM9Vs6wtIhaDU7aMoaCb/GlkkmcF7y3XRSmUF53yVICoCNLZ1zFuJrh
Pr2CjJ4=
=Na0H
-----END PGP SIGNATURE-----
Merge tag 'sound-fix-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"This contains a one-liner change that has a significant impact:
disabling the build of OSS. It's been unmaintained for long time, and
we'd like to drop the stuff. Finally, as the first step, stop the
build. Let's see whether it works without much complaints.
Other than that, there are two small fixes for HD-audio"
* tag 'sound-fix-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
sound: Disable the build of OSS drivers
ALSA: hda: Fix cpu lockup when stopping the cmd dmas
ALSA: hda - Add mute led support for HP EliteBook 840 G3
* New battery driver for AXP20X and AXP22X PMICs
* Improve max17042_battery for usage on x86
* Misc small cleanups & fixes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE72YNB0Y/i3JqeVQT2O7X88g7+poFAlkU2QYACgkQ2O7X88g7
+poudxAAjAzrD6WGMtH+++53j8sKTel1eyuKImsv/4TgkWjyBeXzeeEtHX4T8qCJ
HtfGf/JB7sYCWdSFXCxYmOy051r6InGd3VxC2mMvkuz6p8ya1YAR41oHwXEBdsa5
TcYhIe9GNHEhdENTLsKywXOA3gh1REjU9ljCUJJwAxUpqDfU6pRSoFqwB0SAu6zC
/FjGAcooyOyKT9tsZuJ68Xw01thtsowMLNKD6gkXSiMfH2wVhlEijoQ3OwJzW423
mvxukptrDaMiXWjjwY46/ExfEmoEmDo8le5vnB6m3OUOkTmsj2A7RI0GwYPjUM6S
ZCrGn4yNeq3c277rAixY8twhS9zLVM3vuFWOKeT9zSIleJ7eFu9er6U2jJZvA17C
BDoZg4RWFQrYv6YcMsADS8Wk4jYwhahdApNbG2RcUrC3qH+QTqGw6/7LCJ9X3dXs
tSWszwSw9pGIPNnzzhwI7nNREFTfy8W+0qRRyUw2LWEVTJBxzUYmF8L3hFnhr/94
JqFdDWqSCEZpi0sgjp3IxDAlZJ9p/lTFN7iA8UdK912kBA+1Mo4JQqd6piXxYvJM
yCjv4zDclpDa7QbV286y/FOLy/qnUrQ+8qIGvSgAJh5v32mJc8uJqP+4g8401D5T
6YfzboSUQ1s4hofFh1wIO+swq5iTcLJrhZyaCWC9qgKQVWx8tSw=
=lF9I
-----END PGP SIGNATURE-----
Merge tag 'for-v4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply
Pull more power-supply updates from Sebastian Reichel:
"The power-supply subsystem has a few more changes for the v4.12 merge
window:
- New battery driver for AXP20X and AXP22X PMICs
- Improve max17042_battery for usage on x86
- Misc small cleanups & fixes"
* tag 'for-v4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: (34 commits)
power: supply: cpcap-charger: Keep trickle charger bits disabled
power: supply: cpcap-charger: Fix enable for 3.8V charge setting
power: supply: cpcap-charger: Fix charge voltage configuration
power: supply: cpcap-charger: Fix charger name
power: supply: twl4030-charger: make twl4030_bci_property_is_writeable static
power: supply: sbs-battery: Add alert callback
mailmap: add Sebastian Reichel
power: supply: avoid unused twl4030-madc.h
power: supply: sbs-battery: Correct supply status with current draw
power: supply: sbs-battery: Don't ignore the first external power change
power: supply: pda_power: move from timer to delayed_work
power: supply: max17042_battery: Add support for the SCOPE property
power: supply: max17042_battery: Add support for the CHARGE_NOW property
power: supply: max17042_battery: Add support for the CHARGE_FULL_DESIGN property
power: supply: max17042_battery: mAh readings depend on r_sns value
power: supply: max17042_battery: Add support for the VOLT_MIN property
power: supply: max17042_battery: Add support for the TECHNOLOGY attribute
power: supply: max17042_battery: Add external_power_changed callback
power: supply: max17042_battery: Add support for the STATUS property
power: supply: max17042_battery: Add default platform_data fallback data
...
Pull thermal management updates from Zhang Rui:
- Fix a problem where orderly_shutdown() is called for multiple times
due to multiple critical overheating events raised in a short period
by platform thermal driver. (Keerthy)
- Introduce a backup thermal shutdown mechanism, which invokes
kernel_power_off()/emergency_restart() directly, after
orderly_shutdown() being issued for certain amount of time(specified
via Kconfig). This is useful in certain conditions that userspace may
be unable to power off the system in a clean manner and leaves the
system in a critical state, like in the middle of driver probing
phase. (Keerthy)
- Introduce a new interface in thermal devfreq_cooling code so that the
driver can provide more precise data regarding actual power to the
thermal governor every time the power budget is calculated. (Lukasz
Luba)
- Introduce BCM 2835 soc thermal driver and northstar thermal driver,
within a new sub-folder. (Rafał Miłecki)
- Introduce DA9062/61 thermal driver. (Steve Twiss)
- Remove non-DT booting on TI-SoC driver. Also add support to fetching
coefficients from DT. (Keerthy)
- Refactorf RCAR Gen3 thermal driver. (Niklas Söderlund)
- Small fix on MTK and intel-soc-dts thermal driver. (Dawei Chien,
Brian Bian)
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (25 commits)
thermal: core: Add a back up thermal shutdown mechanism
thermal: core: Allow orderly_poweroff to be called only once
Thermal: Intel SoC DTS: Change interrupt request behavior
trace: thermal: add another parameter 'power' to the tracing function
thermal: devfreq_cooling: add new interface for direct power read
thermal: devfreq_cooling: refactor code and add get_voltage function
thermal: mt8173: minor mtk_thermal.c cleanups
thermal: bcm2835: move to the broadcom subdirectory
thermal: broadcom: ns: specify myself as MODULE_AUTHOR
thermal: da9062/61: Thermal junction temperature monitoring driver
Documentation: devicetree: thermal: da9062/61 TJUNC temperature binding
thermal: broadcom: add Northstar thermal driver
dt-bindings: thermal: add support for Broadcom's Northstar thermal
thermal: bcm2835: add thermal driver for bcm2835 SoC
dt-bindings: Add thermal zone to bcm2835-thermal example
thermal: rcar_gen3_thermal: add suspend and resume support
thermal: rcar_gen3_thermal: store device match data in private structure
thermal: rcar_gen3_thermal: enable hardware interrupts for trip points
thermal: rcar_gen3_thermal: record and check number of TSCs found
thermal: rcar_gen3_thermal: check that TSC exists before memory allocation
...