Commit Graph

4560 Commits

Author SHA1 Message Date
Mahesh Salgaonkar
94675cceac powerpc/pseries: Defer the logging of rtas error to irq work queue.
rtas_log_buf is a buffer to hold RTAS event data that are communicated
to kernel by hypervisor. This buffer is then used to pass RTAS event
data to user through proc fs. This buffer is allocated from
vmalloc (non-linear mapping) area.

On Machine check interrupt, register r3 points to RTAS extended event
log passed by hypervisor that contains the MCE event. The pseries
machine check handler then logs this error into rtas_log_buf. The
rtas_log_buf is a vmalloc-ed (non-linear) buffer we end up taking up a
page fault (vector 0x300) while accessing it. Since machine check
interrupt handler runs in NMI context we can not afford to take any
page fault. Page faults are not honored in NMI context and causes
kernel panic. Apart from that, as Nick pointed out,
pSeries_log_error() also takes a spin_lock while logging error which
is not safe in NMI context. It may endup in deadlock if we get another
MCE before releasing the lock. Fix this by deferring the logging of
rtas error to irq work queue.

Current implementation uses two different buffers to hold rtas error
log depending on whether extended log is provided or not. This makes
bit difficult to identify which buffer has valid data that needs to
logged later in irq work. Simplify this using single buffer, one per
paca, and copy rtas log to it irrespective of whether extended log is
provided or not. Allocate this buffer below RMA region so that it can
be accessed in real mode mce handler.

Fixes: b96672dd84 ("powerpc: Machine check interrupt is a non-maskable interrupt")
Cc: stable@vger.kernel.org # v4.14+
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07 21:49:29 +10:00
Benjamin Herrenschmidt
e27e0a9465 powerpc/xive: Remove xive_kexec_teardown_cpu()
It's identical to xive_teardown_cpu() so just use the latter

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07 21:49:28 +10:00
Nicholas Piggin
5a6099346c powerpc/64s/radix: tlb do not flush on page size when fullmm
When the mm is being torn down there will be a full PID flush so
there is no need to flush the TLB on page size changes.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-08-07 21:49:26 +10:00
Shilpasri G Bhat
04baaf28f4 powerpc/powernv: Add support to enable sensor groups
Adds support to enable/disable a sensor group at runtime. This
can be used to select the sensor groups that needs to be copied to
main memory by OCC. Sensor groups like power, temperature, current,
voltage, frequency, utilization can be enabled/disabled at runtime.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-31 19:56:45 +10:00
Akshay Adiga
1961acad2f powernv/cpuidle: Use parsed device tree values for cpuidle_init
Export pnv_idle_states and nr_pnv_idle_states so that its accessible to
cpuidle driver. Use properties from pnv_idle_states structure for powernv
cpuidle_init.

Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-31 19:56:44 +10:00
Akshay Adiga
9c7b185ab2 powernv/cpuidle: Parse dt idle properties into global structure
Device-tree parsing happens twice, once while deciding idle state to be
used for hotplug and once during cpuidle init. Hence, parsing the device
tree and caching it will reduce code duplication. Parsing code has been
moved to pnv_parse_cpuidle_dt() from pnv_probe_idle_states(). In addition
to the properties in the device tree the number of available states is
also required.

Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-31 19:56:44 +10:00
Michael Ellerman
a984506c54 powerpc/mm: Don't report PUDs as memory leaks when using kmemleak
Paul Menzel reported that kmemleak was producing reports such as:

  unreferenced object 0xc0000000f8b80000 (size 16384):
    comm "init", pid 1, jiffies 4294937416 (age 312.240s)
    hex dump (first 32 bytes):
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    backtrace:
      [<00000000d997deb7>] __pud_alloc+0x80/0x190
      [<0000000087f2e8a3>] move_page_tables+0xbac/0xdc0
      [<00000000091e51c2>] shift_arg_pages+0xc0/0x210
      [<00000000ab88670c>] setup_arg_pages+0x22c/0x2a0
      [<0000000060871529>] load_elf_binary+0x41c/0x1648
      [<00000000ecd9d2d4>] search_binary_handler.part.11+0xbc/0x280
      [<0000000034e0cdd7>] __do_execve_file.isra.13+0x73c/0x940
      [<000000005f953a6e>] sys_execve+0x58/0x70
      [<000000009700a858>] system_call+0x5c/0x70

Indicating that a PUD was being leaked.

However what's really happening is that kmemleak is not able to
recognise the references from the PGD to the PUD, because they are not
fully qualified pointers.

We can confirm that in xmon, eg:

Find the task struct for pid 1 "init":
  0:mon> P
       task_struct     ->thread.ksp    PID   PPID S  P CMD
  c0000001fe7c0000 c0000001fe803960      1      0 S 13 systemd

Dump virtual address 0 to find the PGD:
  0:mon> dv 0 c0000001fe7c0000
  pgd  @ 0xc0000000f8b01000

Dump the memory of the PGD:
  0:mon> d c0000000f8b01000
  c0000000f8b01000 00000000f8b90000 0000000000000000  |................|
  c0000000f8b01010 0000000000000000 0000000000000000  |................|
  c0000000f8b01020 0000000000000000 0000000000000000  |................|
  c0000000f8b01030 0000000000000000 00000000f8b80000  |................|
                                    ^^^^^^^^^^^^^^^^

There we can see the reference to our supposedly leaked PUD. But
because it's missing the leading 0xc, kmemleak won't recognise it.

We can confirm it's still in use by translating an address that is
mapped via it:
  0:mon> dv 7fff94000000 c0000001fe7c0000
  pgd  @ 0xc0000000f8b01000
  pgdp @ 0xc0000000f8b01038 = 0x00000000f8b80000 <--
  pudp @ 0xc0000000f8b81ff8 = 0x00000000037c4000
  pmdp @ 0xc0000000037c5ca0 = 0x00000000fbd89000
  ptep @ 0xc0000000fbd89000 = 0xc0800001d5ce0386
  Maps physical address = 0x00000001d5ce0000
  Flags = Accessed Dirty Read Write

The fix is fairly simple. We need to tell kmemleak to ignore PUD
allocations and never report them as leaks. We can also tell it not to
scan the PGD, because it will never find pointers in there. However it
will still notice if we allocate a PGD and then leak it.

Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:21 +10:00
Christophe Leroy
405cb4024e powerpc: split asm/tlbflush.h
Split asm/tlbflush.h into:
asm/nohash/tlbflush.h
asm/book3s/32/tlbflush.h
asm/book3s/64/tlbflush.h (already existing)

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:21 +10:00
Christophe Leroy
45ef5992e0 powerpc: remove unnecessary inclusion of asm/tlbflush.h
asm/tlbflush.h is only needed for:
- using functions xxx_flush_tlb_xxx()
- using MMU_NO_CONTEXT
- including asm-generic/pgtable.h

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:20 +10:00
Christophe Leroy
7bc396958c powerpc/44x: remove page.h from mmu-44x.h
mmu-44x.h doesn't need asm/page.h if PAGE_SHIFT are replaced by CONFIG_PPC_XX_PAGES

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:20 +10:00
Christophe Leroy
0c295d0e9c powerpc/nohash: fix hash related comments in pgtable.h
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:19 +10:00
Christophe Leroy
62b8426578 powerpc: fix includes in asm/processor.h
Remove superflous includes and add missing ones

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:19 +10:00
Christophe Leroy
6b62266911 powerpc/book3s: Remove PPC_PIN_SIZE
PPC_PIN_SIZE is specific to the 44x and is defined in mmu.h

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:19 +10:00
Christophe Leroy
b5ac51d747 powerpc: declare set_breakpoint() static
set_breakpoint() is only used in process.c so make it static

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:18 +10:00
Christophe Leroy
e8cb7a55eb powerpc: remove superflous inclusions of asm/fixmap.h
Files not using fixmap consts or functions don't need asm/fixmap.h

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:18 +10:00
Christophe Leroy
2c86cd188f powerpc: clean inclusions of asm/feature-fixups.h
files not using feature fixup don't need asm/feature-fixups.h
files using feature fixup need asm/feature-fixups.h

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:17 +10:00
Christophe Leroy
5c35a02c54 powerpc: clean the inclusion of stringify.h
Only include linux/stringify.h is files using __stringify()

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:17 +10:00
Christophe Leroy
ec0c464cdb powerpc: move ASM_CONST and stringify_in_c() into asm-const.h
This patch moves ASM_CONST() and stringify_in_c() into
dedicated asm-const.h, then cleans all related inclusions.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: asm-compat.h should include asm-const.h]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:16 +10:00
Christophe Leroy
36a7eeaff7 powerpc/405: move PPC405_ERR77 in asm-405.h
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:48:13 +10:00
Christophe Leroy
8c58259bba powerpc: remove unneeded inclusions of cpu_has_feature.h
Files not using cpu_has_feature() don't need cpu_has_feature.h

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:47:54 +10:00
Christophe Leroy
db0a2b633d powerpc: remove kdump.h from page.h
page.h doesn't need kdump.h

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-30 22:47:53 +10:00
Nicholas Piggin
17cc1dd492 powerpc/powernv: implement opal_put_chars_atomic
The RAW console does not need writes to be atomic, so relax
opal_put_chars to be able to do partial writes, and implement an
_atomic variant which does not take a spinlock. This API is used
in xmon, so the less locking that is used, the better chance there
is that a crash can be debugged.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:09:57 +10:00
Nicholas Piggin
d2a2262e68 powerpc/powernv: Implement and use opal_flush_console
A new console flushing firmware API was introduced to replace event
polling loops, and implemented in opal-kmsg with affddff69c
("powerpc/powernv: Add a kmsg_dumper that flushes console output on
panic"), to flush the console in the panic path.

The OPAL console driver has other situations where interrupts are off
and it needs to flush the console synchronously. These still use a
polling loop.

So move the opal-kmsg flush code to opal_flush_console, and use the
new function in opal-kmsg and opal_put_chars.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:09:56 +10:00
Simon Guo
d58badfb7c powerpc/64: enhance memcmp() with VMX instruction for long bytes comparision
This patch add VMX primitives to do memcmp() in case the compare size
is equal or greater than 4K bytes. KSM feature can benefit from this.

Test result with following test program(replace the "^>" with ""):
------
># cat tools/testing/selftests/powerpc/stringloops/memcmp.c
>#include <malloc.h>
>#include <stdlib.h>
>#include <string.h>
>#include <time.h>
>#include "utils.h"
>#define SIZE (1024 * 1024 * 900)
>#define ITERATIONS 40

int test_memcmp(const void *s1, const void *s2, size_t n);

static int testcase(void)
{
        char *s1;
        char *s2;
        unsigned long i;

        s1 = memalign(128, SIZE);
        if (!s1) {
                perror("memalign");
                exit(1);
        }

        s2 = memalign(128, SIZE);
        if (!s2) {
                perror("memalign");
                exit(1);
        }

        for (i = 0; i < SIZE; i++)  {
                s1[i] = i & 0xff;
                s2[i] = i & 0xff;
        }
        for (i = 0; i < ITERATIONS; i++) {
		int ret = test_memcmp(s1, s2, SIZE);

		if (ret) {
			printf("return %d at[%ld]! should have returned zero\n", ret, i);
			abort();
		}
	}

        return 0;
}

int main(void)
{
        return test_harness(testcase, "memcmp");
}
------
Without this patch (but with the first patch "powerpc/64: Align bytes
before fall back to .Lshort in powerpc64 memcmp()." in the series):
	4.726728762 seconds time elapsed                                          ( +-  3.54%)
With VMX patch:
	4.234335473 seconds time elapsed                                          ( +-  2.63%)
		There is ~+10% improvement.

Testing with unaligned and different offset version (make s1 and s2 shift
random offset within 16 bytes) can archieve higher improvement than 10%..

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:03:21 +10:00
Simon Guo
f1ecbaf466 powerpc: add vcmpequd/vcmpequb ppc instruction macro
Some old tool chains don't know about instructions like vcmpequd.

This patch adds .long macro for vcmpequd and vcmpequb, which is
a preparation to optimize ppc64 memcmp with VMX instructions.

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:03:20 +10:00
Aneesh Kumar K.V
a833280b4a powerpc/mm/hash: Add hpte_get_old_v and use that instead of opencoding
No functional change

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:03:18 +10:00
Aneesh Kumar K.V
7d4340bb92 powerpc/mm: Increase MAX_PHYSMEM_BITS to 128TB with SPARSEMEM_VMEMMAP config
We do this only with VMEMMAP config so that our page_to_[nid/section] etc are not
impacted.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:03:17 +10:00
Nicholas Piggin
5b73151fff powerpc: NMI IPI make NMI IPIs fully sychronous
There is an asynchronous aspect to smp_send_nmi_ipi. The caller waits
for all CPUs to call in to the handler, but it does not wait for
completion of the handler. This is a needless complication, so remove
it and always wait synchronously.

The synchronous wait allows the caller to easily time out and clear
the wait for completion (zero nmi_ipi_busy_count) in the case of badly
behaved handlers. This would have prevented the recent smp_send_stop
NMI IPI bug from causing the system to hang.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:03:14 +10:00
Nicholas Piggin
9b81c0211c powerpc/64s: make PACA_IRQ_HARD_DIS track MSR[EE] closely
When the masked interrupt handler clears MSR[EE] for an interrupt in
the PACA_IRQ_MUST_HARD_MASK set, it does not set PACA_IRQ_HARD_DIS.
This makes them get out of synch.

With that taken into account, it's only low level irq manipulation
(and interrupt entry before reconcile) where they can be out of synch.
This makes the code less surprising.

It also allows the IRQ replay code to rely on the IRQ_HARD_DIS value
and not have to mtmsrd again in this case (e.g., for an external
interrupt that has been masked). The bigger benefit might just be
that there is not such an element of surprise in these two bits of
state.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:03:14 +10:00
Ram Pai
07f522d203 powerpc/pkeys: make protection key 0 less special
Applications need the ability to associate an address-range with some
key and latter revert to its initial default key. Pkey-0 comes close to
providing this function but falls short, because the current
implementation disallows applications to explicitly associate pkey-0 to
the address range.

Lets make pkey-0 less special and treat it almost like any other key.
Thus it can be explicitly associated with any address range, and can be
freed. This gives the application more flexibility and power.  The
ability to free pkey-0 must be used responsibily, since pkey-0 is
associated with almost all address-range by default.

Even with this change pkey-0 continues to be slightly more special
from the following point of view.
(a) it is implicitly allocated.
(b) it is the default key assigned to any address-range.
(c) its permissions cannot be modified by userspace.

NOTE: (c) is specific to powerpc only. pkey-0 is associated by default
with all pages including kernel pages, and pkeys are also active in
kernel mode. If any permission is denied on pkey-0, the kernel running
in the context of the application will be unable to operate.

Tested on powerpc.

Signed-off-by: Ram Pai <linuxram@us.ibm.com>
[mpe: Drop #define PKEY_0 0 in favour of plain old 0]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 21:43:24 +10:00
Ram Pai
4a4a5e5d2a powerpc/pkeys: key allocation/deallocation must not change pkey registers
Key allocation and deallocation has the side effect of programming the
UAMOR/AMR/IAMR registers. This is wrong, since its the responsibility of
the application and not that of the kernel, to modify the permission on
the key.

Do not modify the pkey registers at key allocation/deallocation.

This patch also fixes a bug where a sys_pkey_free() resets the UAMOR
bits of the key, thus making its permissions unmodifiable from user
space. Later if the same key gets reallocated from a different thread
this thread will no longer be able to change the permissions on the key.

Fixes: cf43d3b264 ("powerpc: Enable pkey subsystem")
Cc: stable@vger.kernel.org # v4.16+
Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 21:34:08 +10:00
Michael Ellerman
ce57c6610c Merge branch 'topic/ppc-kvm' into next
Merge in some commits we're sharing with the KVM tree.

I manually propagated the change from commit d3d4ffaae4
("powerpc/powernv/ioda2: Reduce upper limit for DMA window size") into
pci-ioda-tce.c.

Conflicts:
        arch/powerpc/include/asm/cputable.h
        arch/powerpc/platforms/powernv/pci-ioda.c
        arch/powerpc/platforms/powernv/pci.h
2018-07-19 14:37:57 +10:00
Alexey Kardashevskiy
a68bd1267b powerpc/powernv/ioda: Allocate indirect TCE levels on demand
At the moment we allocate the entire TCE table, twice (hardware part and
userspace translation cache). This normally works as we normally have
contigous memory and the guest will map entire RAM for 64bit DMA.

However if we have sparse RAM (one example is a memory device), then
we will allocate TCEs which will never be used as the guest only maps
actual memory for DMA. If it is a single level TCE table, there is nothing
we can really do but if it a multilevel table, we can skip allocating
TCEs we know we won't need.

This adds ability to allocate only first level, saving memory.

This changes iommu_table::free() to avoid allocating of an extra level;
iommu_table::set() will do this when needed.

This adds @alloc parameter to iommu_table::exchange() to tell the callback
if it can allocate an extra level; the flag is set to "false" for
the realmode KVM handlers of H_PUT_TCE hcalls and the callback returns
H_TOO_HARD.

This still requires the entire table to be counted in mm::locked_vm.

To be conservative, this only does on-demand allocation when
the usespace cache table is requested which is the case of VFIO.

The example math for a system replicating a powernv setup with NVLink2
in a guest:
16GB RAM mapped at 0x0
128GB GPU RAM window (16GB of actual RAM) mapped at 0x244000000000

the table to cover that all with 64K pages takes:
(((0x244000000000 + 0x2000000000) >> 16)*8)>>20 = 4556MB

If we allocate only necessary TCE levels, we will only need:
(((0x400000000 + 0x400000000) >> 16)*8)>>20 = 4MB (plus some for indirect
levels).

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-16 22:53:11 +10:00
Alexey Kardashevskiy
090bad39b2 powerpc/powernv: Add indirect levels to it_userspace
We want to support sparse memory and therefore huge chunks of DMA windows
do not need to be mapped. If a DMA window big enough to require 2 or more
indirect levels, and a DMA window is used to map all RAM (which is
a default case for 64bit window), we can actually save some memory by
not allocation TCE for regions which we are not going to map anyway.

The hardware tables alreary support indirect levels but we also keep
host-physical-to-userspace translation array which is allocated by
vmalloc() and is a flat array which might use quite some memory.

This converts it_userspace from vmalloc'ed array to a multi level table.

As the format becomes platform dependend, this replaces the direct access
to it_usespace with a iommu_table_ops::useraddrptr hook which returns
a pointer to the userspace copy of a TCE; future extension will return
NULL if the level was not allocated.

This should not change non-KVM handling of TCE tables and it_userspace
will not be allocated for non-KVM tables.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-16 22:53:10 +10:00
Alexey Kardashevskiy
00a5c58d94 KVM: PPC: Make iommu_table::it_userspace big endian
We are going to reuse multilevel TCE code for the userspace copy of
the TCE table and since it is big endian, let's make the copy big endian
too.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-16 22:53:09 +10:00
Nicholas Piggin
2bf1071a8d powerpc/64s: Remove POWER9 DD1 support
POWER9 DD1 was never a product. It is no longer supported by upstream
firmware, and it is not effectively supported in Linux due to lack of
testing.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Michael Ellerman <mpe@ellerman.id.au>
[mpe: Remove arch_make_huge_pte() entirely]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-16 11:37:21 +10:00
Joel Stanley
e11b64b1ef powerpc: Remove Power8 DD1 from cputable
This was added to support an early version of Power8 that did not have
working doorbells. These machines were not publicly available, and all of
the internal users have long since upgraded.

Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-12 21:08:09 +10:00
Alastair D'Silva
8bf6b91a51 Revert "powerpc/powernv: Add support for the cxl kernel api on the real phb"
Remove abandonned capi support for the Mellanox CX4.

This reverts commit 4361b03430.

Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-02 23:54:32 +10:00
Mauro S. M. Rodrigues
ee8c446fed powerpc/eeh: Avoid misleading message "EEH: no capable adapters found"
Due to recent refactoring in EEH in:
commit b9fde58db7 ("powerpc/powernv: Rework EEH initialization on
powernv")
a misleading message was seen in the kernel message buffer:

[    0.108431] EEH: PowerNV platform initialized
[    0.589979] EEH: No capable adapters found

This happened due to the removal of the initialization delay for powernv
platform.

Even though the EEH infrastructure for the devices is eventually
initialized and still works just fine the eeh device probe step is
postponed in order to assure the PEs are created. Later
pnv_eeh_post_init does the probe devices job but at that point the
message was already shown right after eeh_init flow.

This patch introduces a new flag EEH_POSTPONED_PROBE to represent that
temporary state and avoid the message mentioned above and showing the
follow one instead:

[    0.107724] EEH: PowerNV platform initialized
[    4.844825] EEH: PCI Enhanced I/O Error Handling Enabled

Signed-off-by: Mauro S. M. Rodrigues <maurosr@linux.vnet.ibm.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Tested-by:Venkat Rao B <vrbagal1@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-02 23:54:26 +10:00
Aneesh Kumar K.V
941c06d585 powerpc/mm/32: Fix pgtable_page_dtor call
Commit 667416f385 ("powerpc/mm: Fix kernel crash on page table free")
added a call for pgtable_page_dtor in the rcu page table free routine. We missed
the fact that for 32 bit platforms we did call the 'dtor' early. Drop the extra
call for pgtable_page_dtor. We remove the call from __pte_free_tlb so that we
do the page table free and 'dtor' call together. This should help when we
switch these platforms to pte fragments.

Fixes: 667416f385 ("powerpc/mm: Fix kernel crash on page table free")
Reported-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-26 23:43:14 +10:00
Breno Leitao
b2f82565f2 powerpc: Wire up io_pgetevents
Wire up io_pgetevents system call on powerpc.

io_pgetevents is a new syscall to read asynchronous I/O events from the
completion queue.

Tested with libaio branch aio-poll[1] and the io_pgetevents test (#22) passed
on both ppc64 LE and BE modes.

[1] https://pagure.io/libaio/branch/aio-poll

CC: Christoph Hellwig <hch@lst.de>
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-23 21:43:21 +10:00
Aneesh Kumar K.V
fadd03c615 powerpc/mm/hash/4k: Free hugetlb page table caches correctly.
With 4k page size for hugetlb we allocate hugepage directories from its on slab
cache. With patch 0c4d26802 ("powerpc/book3s64/mm: Simplify the rcu callback for page table free")
we missed to free these allocated hugepd tables.

Update pgtable_free to handle hugetlb hugepd directory table.

Fixes: 0c4d268029 ("powerpc/book3s64/mm: Simplify the rcu callback for page table free")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Add CONFIG_HUGETLB_PAGE guard to fix build break]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-20 09:13:25 +10:00
Michael Ellerman
e08ecba17b powerpc/64s: Fix build failures with CONFIG_NMI_IPI=n
I broke the build when CONFIG_NMI_IPI=n with my recent commit to add
arch_trigger_cpumask_backtrace(), eg:

  stacktrace.c:(.text+0x1b0): undefined reference to `.smp_send_safe_nmi_ipi'

We should rework the CONFIG symbols here in future to avoid these
double barrelled ifdefs but for now they fix the build.

Fixes: 5cc05910f2 ("powerpc/64s: Wire up arch_trigger_cpumask_backtrace()")
Reported-by: Christophe LEROY <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-19 23:03:50 +10:00
Paolo Bonzini
09027ab73b Merge tag 'kvm-ppc-next-4.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD 2018-06-14 17:42:54 +02:00
Linus Torvalds
be779f03d5 Merge tag 'kbuild-v4.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull more Kbuild updates from Masahiro Yamada:

 - fix some bugs introduced by the recent Kconfig syntax extension

 - add some symbols about compiler information in Kconfig, such as
   CC_IS_GCC, CC_IS_CLANG, GCC_VERSION, etc.

 - test compiler capability for the stack protector in Kconfig, and
   clean-up Makefile

 - test compiler capability for GCC-plugins in Kconfig, and clean-up
   Makefile

 - allow to enable GCC-plugins for COMPILE_TEST

 - test compiler capability for KCOV in Kconfig and correct dependency

 - remove auto-detect mode of the GCOV format, which is now more nicely
   handled in Kconfig

 - test compiler capability for mprofile-kernel on PowerPC, and clean-up
   Makefile

 - misc cleanups

* tag 'kbuild-v4.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  linux/linkage.h: replace VMLINUX_SYMBOL_STR() with __stringify()
  kconfig: fix localmodconfig
  sh: remove no-op macro VMLINUX_SYMBOL()
  powerpc/kbuild: move -mprofile-kernel check to Kconfig
  Documentation: kconfig: add recommended way to describe compiler support
  gcc-plugins: disable GCC_PLUGIN_STRUCTLEAK_BYREF_ALL for COMPILE_TEST
  gcc-plugins: allow to enable GCC_PLUGINS for COMPILE_TEST
  gcc-plugins: test plugin support in Kconfig and clean up Makefile
  gcc-plugins: move GCC version check for PowerPC to Kconfig
  kcov: test compiler capability in Kconfig and correct dependency
  gcov: remove CONFIG_GCOV_FORMAT_AUTODETECT
  arm64: move GCC version check for ARCH_SUPPORTS_INT128 to Kconfig
  kconfig: add CC_IS_CLANG and CLANG_VERSION
  kconfig: add CC_IS_GCC and GCC_VERSION
  stack-protector: test compiler capability in Kconfig and drop AUTO mode
  kbuild: fix endless syncconfig in case arch Makefile sets CROSS_COMPILE
2018-06-13 08:40:34 -07:00
Nicholas Piggin
abba759796 powerpc/kbuild: move -mprofile-kernel check to Kconfig
This eliminates the workaround that requires disabling
-mprofile-kernel by default in Kconfig.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-06-11 09:16:29 +09:00
Linus Torvalds
d82991a868 Merge branch 'core-rseq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull restartable sequence support from Thomas Gleixner:
 "The restartable sequences syscall (finally):

  After a lot of back and forth discussion and massive delays caused by
  the speculative distraction of maintainers, the core set of
  restartable sequences has finally reached a consensus.

  It comes with the basic non disputed core implementation along with
  support for arm, powerpc and x86 and a full set of selftests

  It was exposed to linux-next earlier this week, so it does not fully
  comply with the merge window requirements, but there is really no
  point to drag it out for yet another cycle"

* 'core-rseq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rseq/selftests: Provide Makefile, scripts, gitignore
  rseq/selftests: Provide parametrized tests
  rseq/selftests: Provide basic percpu ops test
  rseq/selftests: Provide basic test
  rseq/selftests: Provide rseq library
  selftests/lib.mk: Introduce OVERRIDE_TARGETS
  powerpc: Wire up restartable sequences system call
  powerpc: Add syscall detection for restartable sequences
  powerpc: Add support for restartable sequences
  x86: Wire up restartable sequence system call
  x86: Add support for restartable sequences
  arm: Wire up restartable sequences system call
  arm: Add syscall detection for restartable sequences
  arm: Add restartable sequences support
  rseq: Introduce restartable sequences system call
  uapi/headers: Provide types_32_64.h
2018-06-10 10:17:09 -07:00
Laurent Dufour
3010a5ea66 mm: introduce ARCH_HAS_PTE_SPECIAL
Currently the PTE special supports is turned on in per architecture
header files.  Most of the time, it is defined in
arch/*/include/asm/pgtable.h depending or not on some other per
architecture static definition.

This patch introduce a new configuration variable to manage this
directly in the Kconfig files.  It would later replace
__HAVE_ARCH_PTE_SPECIAL.

Here notes for some architecture where the definition of
__HAVE_ARCH_PTE_SPECIAL is not obvious:

arm
 __HAVE_ARCH_PTE_SPECIAL which is currently defined in
arch/arm/include/asm/pgtable-3level.h which is included by
arch/arm/include/asm/pgtable.h when CONFIG_ARM_LPAE is set.
So select ARCH_HAS_PTE_SPECIAL if ARM_LPAE.

powerpc
__HAVE_ARCH_PTE_SPECIAL is defined in 2 files:
 - arch/powerpc/include/asm/book3s/64/pgtable.h
 - arch/powerpc/include/asm/pte-common.h
The first one is included if (PPC_BOOK3S & PPC64) while the second is
included in all the other cases.
So select ARCH_HAS_PTE_SPECIAL all the time.

sparc:
__HAVE_ARCH_PTE_SPECIAL is defined if defined(__sparc__) &&
defined(__arch64__) which are defined through the compiler in
sparc/Makefile if !SPARC32 which I assume to be if SPARC64.
So select ARCH_HAS_PTE_SPECIAL if SPARC64

There is no functional change introduced by this patch.

Link: http://lkml.kernel.org/r/1523433816-14460-2-git-send-email-ldufour@linux.vnet.ibm.com
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Suggested-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Albert Ou <albert@sifive.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Christophe LEROY <christophe.leroy@c-s.fr>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:35 -07:00
Linus Torvalds
c90fca951e Merge tag 'powerpc-4.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
 "Notable changes:

   - Support for split PMD page table lock on 64-bit Book3S (Power8/9).

   - Add support for HAVE_RELIABLE_STACKTRACE, so we properly support
     live patching again.

   - Add support for patching barrier_nospec in copy_from_user() and
     syscall entry.

   - A couple of fixes for our data breakpoints on Book3S.

   - A series from Nick optimising TLB/mm handling with the Radix MMU.

   - Numerous small cleanups to squash sparse/gcc warnings from Mathieu
     Malaterre.

   - Several series optimising various parts of the 32-bit code from
     Christophe Leroy.

   - Removal of support for two old machines, "SBC834xE" and "C2K"
     ("GEFanuc,C2K"), which is why the diffstat has so many deletions.

  And many other small improvements & fixes.

  There's a few out-of-area changes. Some minor ftrace changes OK'ed by
  Steve, and a fix to our powernv cpuidle driver. Then there's a series
  touching mm, x86 and fs/proc/task_mmu.c, which cleans up some details
  around pkey support. It was ack'ed/reviewed by Ingo & Dave and has
  been in next for several weeks.

  Thanks to: Akshay Adiga, Alastair D'Silva, Alexey Kardashevskiy, Al
  Viro, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Arnd
  Bergmann, Balbir Singh, Cédric Le Goater, Christophe Leroy, Christophe
  Lombard, Colin Ian King, Dave Hansen, Fabio Estevam, Finn Thain,
  Frederic Barrat, Gautham R. Shenoy, Haren Myneni, Hari Bathini, Ingo
  Molnar, Jonathan Neuschäfer, Josh Poimboeuf, Kamalesh Babulal,
  Madhavan Srinivasan, Mahesh Salgaonkar, Mark Greer, Mathieu Malaterre,
  Matthew Wilcox, Michael Neuling, Michal Suchanek, Naveen N. Rao,
  Nicholas Piggin, Nicolai Stange, Olof Johansson, Paul Gortmaker, Paul
  Mackerras, Peter Rosin, Pridhiviraj Paidipeddi, Ram Pai, Rashmica
  Gupta, Ravi Bangoria, Russell Currey, Sam Bobroff, Samuel
  Mendoza-Jonas, Segher Boessenkool, Shilpasri G Bhat, Simon Guo,
  Souptick Joarder, Stewart Smith, Thiago Jung Bauermann, Torsten Duwe,
  Vaibhav Jain, Wei Yongjun, Wolfram Sang, Yisheng Xie, YueHaibing"

* tag 'powerpc-4.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (251 commits)
  powerpc/64s/radix: Fix missing ptesync in flush_cache_vmap
  cpuidle: powernv: Fix promotion from snooze if next state disabled
  powerpc: fix build failure by disabling attribute-alias warning in pci_32
  ocxl: Fix missing unlock on error in afu_ioctl_enable_p9_wait()
  powerpc-opal: fix spelling mistake "Uniterrupted" -> "Uninterrupted"
  powerpc: fix spelling mistake: "Usupported" -> "Unsupported"
  powerpc/pkeys: Detach execute_only key on !PROT_EXEC
  powerpc/powernv: copy/paste - Mask SO bit in CR
  powerpc: Remove core support for Marvell mv64x60 hostbridges
  powerpc/boot: Remove core support for Marvell mv64x60 hostbridges
  powerpc/boot: Remove support for Marvell mv64x60 i2c controller
  powerpc/boot: Remove support for Marvell MPSC serial controller
  powerpc/embedded6xx: Remove C2K board support
  powerpc/lib: optimise PPC32 memcmp
  powerpc/lib: optimise 32 bits __clear_user()
  powerpc/time: inline arch_vtime_task_switch()
  powerpc/Makefile: set -mcpu=860 flag for the 8xx
  powerpc: Implement csum_ipv6_magic in assembly
  powerpc/32: Optimise __csum_partial()
  powerpc/lib: Adjust .balign inside string functions for PPC32
  ...
2018-06-07 10:23:33 -07:00
Boqun Feng
bb862b021d powerpc: Wire up restartable sequences system call
Wire up the rseq system call on powerpc.

This provides an ABI improving the speed of a user-space getcpu
operation on powerpc by skipping the getcpu system call on the fast
path, as well as improving the speed of user-space operations on per-cpu
data compared to using load-reservation/store-conditional atomics.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Watson <davejwatson@fb.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Chris Lameter <cl@linux.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andrew Hunter <ahh@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Maurer <bmaurer@fb.com>
Cc: linux-api@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lkml.kernel.org/r/20180602124408.8430-11-mathieu.desnoyers@efficios.com
2018-06-06 11:58:34 +02:00