Commit Graph

4707 Commits

Author SHA1 Message Date
Christophe Leroy
709cf19c57 powerpc/8xx: Use patch_site for perf counters setup
The 8xx TLB miss routines are patched when (de)activating
perf counters.

This patch uses the new patch_site functionality in order
to get a better code readability and avoid a label mess when
dumping the code with 'objdump -d'

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-26 21:58:58 +11:00
Christophe Leroy
1a210878bf powerpc/8xx: Use patch_site for memory setup patching
The 8xx TLB miss routines are patched at startup at several places.

This patch uses the new patch_site functionality in order
to get a better code readability and avoid a label mess when
dumping the code with 'objdump -d'

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-26 21:58:58 +11:00
Christophe Leroy
082e2869fc powerpc/code-patching: Add a helper to get the address of a patch_site
This patch adds a helper to get the address of a patch_site.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Call it "patch site" addr]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-26 21:58:58 +11:00
Christophe Leroy
cc4ebf5c0a Revert "powerpc/8xx: Use L1 entry APG to handle _PAGE_ACCESSED for CONFIG_SWAP"
This reverts commit 4f94b2c746.

That commit was buggy, as it used rlwinm instead of rlwimi.
Instead of fixing that bug, we revert the previous commit in order to
reduce the dependency between L1 entries and L2 entries

Fixes: 4f94b2c746 ("powerpc/8xx: Use L1 entry APG to handle _PAGE_ACCESSED for CONFIG_SWAP")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-26 21:58:58 +11:00
Christophe Leroy
0f99153def powerpc/msi: Fix compile error on mpc83xx
mpic_get_primary_version() is not defined when not using MPIC.
The compile error log like:

arch/powerpc/sysdev/built-in.o: In function `fsl_of_msi_probe':
fsl_msi.c:(.text+0x150c): undefined reference to `fsl_mpic_primary_get_version'

Signed-off-by: Jia Hongtao <hongtao.jia@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Reported-by: Radu Rendec <radu.rendec@gmail.com>
Fixes: 807d38b73b ("powerpc/mpic: Add get_version API both for internal and external use")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-21 19:32:07 +11:00
Christophe Leroy
abcff86df2 powerpc/time: Only set CONFIG_ARCH_HAS_SCALED_CPUTIME on PPC64
scaled cputime is only meaningfull when the processor has
SPURR and/or PURR, which means only on PPC64.

Removing it on PPC32 significantly reduces the size of
vtime_account_system() and vtime_account_idle() on an 8xx:

Before:
00000000 l     F .text	000000a8 vtime_delta
00000280 g     F .text	0000010c vtime_account_system
0000038c g     F .text	00000048 vtime_account_idle

After:
(vtime_delta gets inlined inside the two functions)
000001d8 g     F .text	000000a0 vtime_account_system
00000278 g     F .text	00000038 vtime_account_idle

In terms of performance, we also get approximatly 7% improvement on
task switch. The following small benchmark app is run with perf stat:

void *thread(void *arg)
{
	int i;

	for (i = 0; i < atoi((char*)arg); i++)
		pthread_yield();
}

int main(int argc, char **argv)
{
	pthread_t th1, th2;

	pthread_create(&th1, NULL, thread, argv[1]);
	pthread_create(&th2, NULL, thread, argv[1]);
	pthread_join(th1, NULL);
	pthread_join(th2, NULL);

	return 0;
}

Before the patch:

 Performance counter stats for 'chrt -f 98 ./sched 100000' (50 runs):

       8228.476465      task-clock (msec)         #    0.954 CPUs utilized            ( +-  0.23% )
            200004      context-switches          #    0.024 M/sec                    ( +-  0.00% )

After the patch:

 Performance counter stats for 'chrt -f 98 ./sched 100000' (50 runs):

       7649.070444      task-clock (msec)         #    0.955 CPUs utilized            ( +-  0.27% )
            200004      context-switches          #    0.026 M/sec                    ( +-  0.00% )

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20 13:26:47 +11:00
Christophe Leroy
fb978ca207 powerpc/kgdb: add kgdb_arch_set/remove_breakpoint()
Generic implementation fails to remove breakpoints after init
when CONFIG_STRICT_KERNEL_RWX is selected:

[   13.251285] KGDB: BP remove failed: c001c338
[   13.259587] kgdbts: ERROR PUT: end of test buffer on 'do_fork_test' line 8 expected OK got $E14#aa
[   13.268969] KGDB: re-enter exception: ALL breakpoints killed
[   13.275099] CPU: 0 PID: 1 Comm: init Not tainted 4.18.0-g82bbb913ffd8 
[   13.282836] Call Trace:
[   13.285313] [c60e1ba0] [c0080ef0] kgdb_handle_exception+0x6f4/0x720 (unreliable)
[   13.292618] [c60e1c30] [c000e97c] kgdb_handle_breakpoint+0x3c/0x98
[   13.298709] [c60e1c40] [c000af54] program_check_exception+0x104/0x700
[   13.305083] [c60e1c60] [c000e45c] ret_from_except_full+0x0/0x4
[   13.310845] [c60e1d20] [c02a22ac] run_simple_test+0x2b4/0x2d4
[   13.316532] [c60e1d30] [c0081698] put_packet+0xb8/0x158
[   13.321694] [c60e1d60] [c00820b4] gdb_serial_stub+0x230/0xc4c
[   13.327374] [c60e1dc0] [c0080af8] kgdb_handle_exception+0x2fc/0x720
[   13.333573] [c60e1e50] [c000e928] kgdb_singlestep+0xb4/0xcc
[   13.339068] [c60e1e70] [c000ae1c] single_step_exception+0x90/0xac
[   13.345100] [c60e1e80] [c000e45c] ret_from_except_full+0x0/0x4
[   13.350865] [c60e1f40] [c000e11c] ret_from_syscall+0x0/0x38
[   13.356346] Kernel panic - not syncing: Recursive entry to debugger

This patch creates powerpc specific version of
kgdb_arch_set_breakpoint() and kgdb_arch_remove_breakpoint()
using patch_instruction()

Fixes: 1e0fc9d1eb ("powerpc/Kconfig: Enable STRICT_KERNEL_RWX for some configs")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20 13:26:47 +11:00
Christophe Leroy
8114c36ea6 powerpc/mm: Trace tlbia instruction
Add a trace point for tlbia (Translation Lookaside Buffer Invalidate
All) instruction.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20 13:26:47 +11:00
Naveen N. Rao
7cd01b08d3 powerpc: Add support for function error injection
We implement regs_set_return_value() and override_function_with_return()
for this purpose.

On powerpc, a return from a function (blr) just branches to the location
contained in the link register. So, we can just update pt_regs rather
than redirecting execution to a dummy function that returns.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20 13:26:43 +11:00
Michael Ellerman
6ce7bff045 powerpc/aout: Fix struct user definition to use user_pt_regs
I'm pretty sure this is dead code, it's only used by the a.out core
dump code, and we don't support a.out. We should remove it.

But while it's in the tree it should be using the ABI version of
pt_regs which is called user_pt_regs in the kernel, because the whole
struct is written to the core dump and so its size shouldn't change.

Note this isn't a uapi header so we don't need an ifdef.

Fixes: 002af9391b ("powerpc: Split user/kernel definitions of struct pt_regs")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-19 15:09:04 +11:00
Michael Ellerman
22a3d03d69 powerpc/uapi: Fix sigcontext definition to use user_pt_regs
My recent patch to split pt_regs between user and kernel missed
the usage in struct sigcontext.

Because this is a user visible struct it should be using the user
visible definition, which when we're building for the kernel is called
struct user_pt_regs.

As far as I can see this hasn't actually caused a bug (yet), because
we don't use the sizeof() the sigcontext->regs anywhere. But we should
still fix it to avoid confusion and future bugs.

Fixes: 002af9391b ("powerpc: Split user/kernel definitions of struct pt_regs")
Reported-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-19 15:09:04 +11:00
Christophe Leroy
a0e102914a powerpc/io: remove old GCC version implementation
GCC 4.6 is the minimum supported now.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-19 00:56:17 +11:00
Oliver O'Halloran
4c5d87db49 powerpc/pseries: PAPR persistent memory support
This patch implements support for discovering storage class memory
devices at boot and for handling hotplug of new regions via RTAS
hotplug events.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
[mpe: Fix CONFIG_MEMORY_HOTPLUG=n build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-19 00:56:17 +11:00
Christophe Leroy
bde1a1335c powerpc/book3e: redefine pte_mkprivileged() and pte_mkuser()
Book3e defines both _PAGE_USER and _PAGE_PRIVILEGED, so the nohash
default pte_mkprivileged() and pte_mkuser() are not usable.

This patch redefines them for book3e.

In theorie, only pte_mkprivileged() needs to be redefined because
_PAGE_USER includes _PAGE_PRIVILEGED, but it is less confusing
to redefine both.

Fixes: a0da4bc166 ("powerpc/mm: Allow platforms to redefine some helpers")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-19 00:56:17 +11:00
Aneesh Kumar K.V
b9fb4480a3 powerpc/mm: Make pte_pgprot return all pte bits
Other archs do the same and instead of adding required pte bits (which
got masked out) in __ioremap_at(), make sure we filter only pfn bits
out.

Fixes: 26973fa5ac ("powerpc/mm: use pte helpers in generic code")
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-19 00:56:17 +11:00
Aneesh Kumar K.V
4ffe713b75 powerpc/mm: Increase the max addressable memory to 2PB
Currently we limit the max addressable memory to 128TB. This patch increase the
limit to 2PB. We can have devices like nvdimm which adds memory above 512TB
limit.

We still don't support regular system ram above 512TB. One of the challenge with
that is the percpu allocator, that allocates per node memory and use the max
distance between them as the percpu offsets. This means with large gap in
address space ( system ram above 1PB) we will run out of vmalloc space to map
the percpu allocation.

In order to support addressable memory above 512TB, kernel should be able to
linear map this range. To do that with hash translation we now add 4 context
to kernel linear map region. Our per context addressable range is 512TB. We
still keep VMALLOC and VMEMMAP region to old size. SLB miss handlers is updated
to validate these limit.

We also limit this update to SPARSEMEM_VMEMMAP and SPARSEMEM_EXTREME

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Aneesh Kumar K.V
c9f80734cd powerpc/mm/hash: Rename get_ea_context to get_user_context
We will be adding get_kernel_context later. Update function name to indicate
this handle context allocation user space address.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Nicholas Piggin
e15a4fea4d powerpc/64s/hash: Add some SLB debugging tests
This adds CONFIG_DEBUG_VM checks to ensure:
  - The kernel stack is in the SLB after it's flushed and bolted.
  - We don't insert an SLB for an address that is aleady in the SLB.
  - The kernel SLB miss handler does not take an SLB miss.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Nicholas Piggin
94ee42727c powerpc/64s/hash: Simplify slb_flush_and_rebolt()
slb_flush_and_rebolt() is misleading, it is called in virtual mode, so
it can not possibly change the stack, so it should not be touching the
shadow area. And since vmalloc is no longer bolted, it should not
change any bolted mappings at all.

Change the name to slb_flush_and_restore_bolted(), and have it just
load the kernel stack from what's currently in the shadow SLB area.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Nicholas Piggin
5434ae7462 powerpc/64s/hash: Add a SLB preload cache
When switching processes, currently all user SLBEs are cleared, and a
few (exec_base, pc, and stack) are preloaded. In trivial testing with
small apps, this tends to miss the heap and low 256MB segments, and it
will also miss commonly accessed segments on large memory workloads.

Add a simple round-robin preload cache that just inserts the last SLB
miss into the head of the cache and preloads those at context switch
time. Every 256 context switches, the oldest entry is removed from the
cache to shrink the cache and require fewer slbmte if they are unused.

Much more could go into this, including into the SLB entry reclaim
side to track some LRU information etc, which would require a study of
large memory workloads. But this is a simple thing we can do now that
is an obvious win for common workloads.

With the full series, process switching speed on the context_switch
benchmark on POWER9/hash (with kernel speculation security masures
disabled) increases from 140K/s to 178K/s (27%).

POWER8 does not change much (within 1%), it's unclear why it does not
see a big gain like POWER9.

Booting to busybox init with 256MB segments has SLB misses go down
from 945 to 69, and with 1T segments 900 to 21. These could almost all
be eliminated by preloading a bit more carefully with ELF binary
loading.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Nicholas Piggin
425d331462 powerpc/64s/hash: Provide arch_setup_exec() hooks for hash slice setup
This will be used by the SLB code in the next patch, but for now this
sets the slb_addr_limit to the correct size for 32-bit tasks.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Nicholas Piggin
126b11b294 powerpc/64s/hash: Add SLB allocation status bitmaps
Add 32-entry bitmaps to track the allocation status of the first 32
SLB entries, and whether they are user or kernel entries. These are
used to allocate free SLB entries first, before resorting to the round
robin allocator.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Nicholas Piggin
48e7b76957 powerpc/64s/hash: Convert SLB miss handlers to C
This patch moves SLB miss handlers completely to C, using the standard
exception handler macros to set up the stack and branch to C.

This can be done because the segment containing the kernel stack is
always bolted, so accessing it with relocation on will not cause an
SLB exception.

Arbitrary kernel memory must not be accessed when handling kernel
space SLB misses, so care should be taken there. However user SLB
misses can access any kernel memory, which can be used to move some
fields out of the paca (in later patches).

User SLB misses could quite easily reconcile IRQs and set up a first
class kernel environment and exit via ret_from_except, however that
doesn't seem to be necessary at the moment, so we only do that if a
bad fault is encountered.

[ Credit to Aneesh for bug fixes, error checks, and improvements to
  bad address handling, etc ]

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Disallow tracing for all of slb.c for now.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Nicholas Piggin
4c2de74cc8 powerpc/64: Interrupts save PPR on stack rather than thread_struct
PPR is the odd register out when it comes to interrupt handling, it is
saved in current->thread.ppr while all others are saved on the stack.

The difficulty with this is that accessing thread.ppr can cause a SLB
fault, but the SLB fault handler implementation in C change had
assumed the normal exception entry handlers would not cause an SLB
fault.

Fix this by allocating room in the interrupt stack to save PPR.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Michael Ellerman
002af9391b powerpc: Split user/kernel definitions of struct pt_regs
We use a shared definition for struct pt_regs in uapi/asm/ptrace.h.
That means the layout of the structure is ABI, ie. we can't change it.

That would be fine if it was only used to describe the user-visible
register state of a process, but it's also the struct we use in the
kernel to describe the registers saved in an interrupt frame.

We'd like more flexibility in the content (and possibly layout) of the
kernel version of the struct, but currently that's not possible.

So split the definition into a user-visible definition which remains
unchanged, and a kernel internal one.

At the moment they're still identical, and we check that at build
time. That's because we have code (in ptrace etc.) that assumes that
they are the same. We will fix that code in future patches, and then
we can break the strict symmetry between the two structs.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy
1b2443a547 powerpc/book3s64: Avoid multiple endian conversion in pte helpers
In the same spirit as already done in pte query helpers,
this patch changes pte setting helpers to perform endian
conversions on the constants rather than on the pte value.

In the meantime, it changes pte_access_permitted() to use
pte helpers for the same reason.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy
ff00552578 powerpc/8xx: change name of a few page flags to avoid confusion
_PAGE_PRIVILEGED corresponds to the SH bit which doesn't protect
against user access but only disables ASID verification on kernel
accesses. User access is controlled with _PMD_USER flag.

Name it _PAGE_SH instead of _PAGE_PRIVILEGED

_PAGE_HUGE corresponds to the SPS bit which doesn't really tells
that's it is a huge page but only that it is not a 4k page.

Name it _PAGE_SPS instead of _PAGE_HUGE

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy
5662315384 powerpc/mm: Get rid of pte-common.h
Do not include pte-common.h in nohash/32/pgtable.h

As that was the last includer, get rid of pte-common.h

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy
cbcbbf4afd powerpc/mm: Define platform default caches related flags
Cache related flags like _PAGE_COHERENT and _PAGE_WRITETHRU
are defined on most platforms. The platforms not defining
them don't define any alternative. So we can give them a NUL
value directly for those platforms directly.

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy
a0da4bc166 powerpc/mm: Allow platforms to redefine some helpers
The 40xx defines _PAGE_HWWRITE while others don't.
The 8xx defines _PAGE_RO instead of _PAGE_RW.
The 8xx defines _PAGE_PRIVILEGED instead of _PAGE_USER.
The 8xx defines _PAGE_HUGE and _PAGE_NA while others don't.

Lets those platforms redefine pte_write(), pte_wrprotect() and
pte_mkwrite() and get _PAGE_RO and _PAGE_HWWRITE off the common
helpers.

Lets the 8xx redefine pte_user(), pte_mkprivileged() and pte_mkuser()
and get rid of _PAGE_PRIVILEGED and _PAGE_USER default values.

Lets the 8xx redefine pte_mkhuge() and get rid of
_PAGE_HUGE default value.

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy
6c5d2d3fd3 powerpc/nohash/64: do not include pte-common.h
nohash/64 only uses book3e PTE flags, so it doesn't need pte-common.h

This also allows to drop PAGE_SAO and H_PAGE_4K_PFN from pte_common.h
as they are only used by PPC64

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy
d82fd29c5a powerpc/mm: Distribute platform specific PAGE and PMD flags and definitions
The base kernel PAGE_XXXX definition sets are more or less platform
specific. Lets distribute them close to platform _PAGE_XXX flags
definition, and customise them to their exact platform flags.

Also defines _PAGE_PSIZE and _PTE_NONE_MASK for each platform
allthough they are defined as 0.

Do the same with _PMD flags like _PMD_USER and _PMD_PRESENT_MASK

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy
e0f57031ca powerpc/mm: Move pte_user() into nohash/pgtable.h
Now the pte-common.h is only for nohash platforms, lets
move pte_user() helper out of pte-common.h to put it
together with other helpers.

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy
b2133bd7a5 powerpc/book3s/32: do not include pte-common.h
As done for book3s/64, add necessary flags/defines in
book3s/32/pgtable.h and do not include pte-common.h

It allows in the meantime to remove all related hash
definitions from pte-common.h and to also remove
_PAGE_EXEC default as _PAGE_EXEC is defined on all
platforms except book3s/32.

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy
f4805785f0 powerpc/mm: move __P and __S tables in the common pgtable.h
__P and __S flags are the same for all platform and should remain
as is in the future, so avoid duplication.

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy
093d7ca229 powerpc/mm: drop unused page flags
The following page flags in pte-common.h can be dropped:

_PAGE_ENDIAN is only used in mm/fsl_booke_mmu.c and is defined in
asm/nohash/32/pte-fsl-booke.h

_PAGE_4K_PFN is nowhere defined nor used

_PAGE_READ, _PAGE_WRITE and _PAGE_PTE are only defined and used
in book3s/64

The following page flags in book3s/64/pgtable.h can be dropped as
they are not used on this platform nor by common code.

_PAGE_NA, _PAGE_RO, _PAGE_USER and _PAGE_PSIZE

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy
26973fa5ac powerpc/mm: use pte helpers in generic code
Get rid of platform specific _PAGE_XXXX in powerpc common code and
use helpers instead.

mm/dump_linuxpagetables.c will be handled separately

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy
daba790242 powerpc/mm: add pte helpers to query and change pte flags
In order to avoid using generic _PAGE_XXX flags in powerpc
core functions, define helpers for all needed flags:
- pte_mkuser() and pte_mkprivileged() to set/unset and/or
unset/set _PAGE_USER and/or _PAGE_PRIVILEGED
- pte_hashpte() to check if _PAGE_HASHPTE is set.
- pte_ci() check if cache is inhibited (already existing on book3s/64)
- pte_exprotect() to protect against execution
- pte_exec() and pte_mkexec() to query and set page execution
- pte_mkpte() to set _PAGE_PTE flag.
- pte_hw_valid() to check _PAGE_PRESENT since pte_present does
something different on book3s/64.

On book3s/32 there is no exec protection, so pte_mkexec() and
pte_exprotect() are nops and pte_exec() returns always true.

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy
aa9cd505e3 powerpc/mm: move some nohash pte helpers in nohash/[32:64]/pgtable.h
In order to allow their use in nohash/32/pgtable.h, we have to move the
following helpers in nohash/[32:64]/pgtable.h:
- pte_mkwrite()
- pte_mkdirty()
- pte_mkyoung()
- pte_wrprotect()

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy
d81e6f8b7c powerpc/mm: don't use _PAGE_EXEC in book3s/32
book3s/32 doesn't define _PAGE_EXEC, so no need to use it.

All other platforms define _PAGE_EXEC so no need to check
it is not NUL when not book3s/32.

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy
c766ee7223 powerpc: handover page flags with a pgprot_t parameter
In order to avoid multiple conversions, handover directly a
pgprot_t to map_kernel_page() as already done for radix.

Do the same for __ioremap_caller() and __ioremap_at().

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy
56f3c1413f powerpc/mm: properly set PAGE_KERNEL flags in ioremap()
Set PAGE_KERNEL directly in the caller and do not rely on a
hack adding PAGE_KERNEL flags when _PAGE_PRESENT is not set.

As already done for PPC64, use pgprot_cache() helpers instead of
_PAGE_XXX flags in PPC32 ioremap() derived functions.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy
86c391bd5f powerpc/32: Add ioremap_wt() and ioremap_coherent()
Other arches have ioremap_wt() to map IO areas write-through.
Implement it on PPC as well in order to avoid drivers using
__ioremap(_PAGE_WRITETHRU)

Also implement ioremap_coherent() to avoid drivers using
__ioremap(_PAGE_COHERENT)

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13 22:21:25 +11:00
Gautham R. Shenoy
425752c63b powerpc: Detect the presence of big-cores via "ibm, thread-groups"
On IBM POWER9, the device tree exposes a property array identifed by
"ibm,thread-groups" which will indicate which groups of threads share
a particular set of resources.

As of today we only have one form of grouping identifying the group of
threads in the core that share the L1 cache, translation cache and
instruction data flow.

This patch adds helper functions to parse the contents of
"ibm,thread-groups" and populate a per-cpu variable to cache
information about siblings of each CPU that share the L1, traslation
cache and instruction data-flow.

It also defines a new global variable named "has_big_cores" which
indicates if the cores on this configuration have multiple groups of
threads that share L1 cache.

For each online CPU, it maintains a cpu_smallcore_mask, which
indicates the online siblings which share the L1-cache with it.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13 22:21:25 +11:00
Sam Bobroff
fef7f90552 powerpc/eeh: Cleanup eeh_ops.wait_state()
The wait_state member of eeh_ops does not need to be platform
dependent; it's just logic around eeh_ops.get_state(). Therefore,
merge the two (slightly different!) platform versions into a new
function, eeh_wait_state() and remove the eeh_ops member.

While doing this, also correct:
* The wait logic, so that it never waits longer than max_wait.
* The wait logic, so that it never waits less than
  EEH_STATE_MIN_WAIT_TIME.
* One call site where the result is treated like a bit field before
  it's checked for negative error values.
* In pseries_eeh_get_state(), rename the "state" parameter to "delay"
  because that's what it is.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13 22:21:25 +11:00
Sam Bobroff
e762bb891a powerpc/eeh: Cleanup eeh_pe_state_mark()
Currently, eeh_pe_state_mark() marks a PE (and it's children) with a
state and then performs additional processing if that state included
EEH_PE_ISOLATED.

The state parameter is always a constant at the call site, so
rearrange eeh_pe_state_mark() into two functions and just call the
appropriate one at each site.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13 22:21:25 +11:00
Sam Bobroff
54644927a0 powerpc/eeh: Cleanup eeh_enabled()
Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13 22:21:25 +11:00
Sam Bobroff
80e65b0094 powerpc/eeh: Cleanup list_head field names
Instances of struct eeh_pe are placed in a tree structure using the
fields "child_list" and "child", so place these next to each other
in the definition.

The field "child" is a list entry, so remove the unnecessary and
misleading use of the list initializer, LIST_HEAD(), on it.

The eeh_dev struct contains two list entry fields, called "list" and
"rmv_list". Rename them to "entry" and "rmv_entry" and, as above, stop
initializing them with LIST_HEAD().

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13 22:21:25 +11:00
Sam Bobroff
b95a46062b powerpc/eeh: Cleanup unused field in eeh_dev
The 'bus' member of struct eeh_dev is assigned to once but never used,
so remove it.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13 22:21:25 +11:00
Sam Bobroff
bffc0176e7 powerpc/eeh: Cleanup EEH_POSTPONED_PROBE
Currently a flag, EEH_POSTPONED_PROBE, is used to prevent an incorrect
message "EEH: No capable adapters found" from being displayed during
the boot of powernv systems.

It is necessary because, on powernv, the call to eeh_probe_devices()
made from eeh_init() is too early and EEH can't yet be enabled. A
second call is made later from eeh_pnv_post_init(), which succeeds.

(On pseries, the first call succeeds because PCI devices are set up
early enough and no second call is made.)

This can be simplified by moving the early call to eeh_probe_devices()
from eeh_init() (where it's seen by both platforms) to
pSeries_final_fixup(), so that each platform only calls
eeh_probe_devices() once, at a point where it can succeed.
This is slightly later in the boot sequence, but but still early
enough and it is now in the same place in the sequence for both
platforms (the pcibios_fixup hook).

The display of the message can be cleaned up as well, by moving it
into eeh_probe_devices().

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13 22:21:25 +11:00