Today, on the 8xx the TLB handlers do SW tablewalk by doing all
the calculation in ASM, in order to match with the Linux page
table structure.
The 8xx offers hardware assistance which allows significant size
reduction of the TLB handlers, hence also reduces the time spent
in the handlers.
However, using this HW assistance implies some constraints on the
page table structure:
- Regardless of the main page size used (4k or 16k), the
level 1 table (PGD) contains 1024 entries and each PGD entry covers
a 4Mbytes area which is managed by a level 2 table (PTE) containing
also 1024 entries each describing a 4k page.
- 16k pages require 4 identifical entries in the L2 table
- 512k pages PTE have to be spread every 128 bytes in the L2 table
- 8M pages PTE are at the address pointed by the L1 entry and each
8M page require 2 identical entries in the PGD.
This patch modifies the TLB handlers to use HW assistance for 4K PAGES.
Before that patch, the mean time spent in TLB miss handlers is:
- ITLB miss: 80 ticks
- DTLB miss: 62 ticks
After that patch, the mean time spent in TLB miss handlers is:
- ITLB miss: 72 ticks
- DTLB miss: 54 ticks
So the improvement is 10% for ITLB and 13% for DTLB misses
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In preparation of making use of hardware assistance in TLB handlers,
this patch temporarily disables 16K pages and hugepages. The reason
is that when using HW assistance in 4K pages mode, the linux model
fit with the HW model for 4K pages and 8M pages.
However for 16K pages and 512K mode some additional work is needed
to get linux model fit with HW model.
For the 8M pages, they will naturaly come back when we switch to
HW assistance, without any additional handling.
In order to keep the following patch smaller, the removal of the
current special handling for 8M pages gets removed here as well.
Therefore the 4K pages mode will be implemented first and without
support for 512k hugepages. Then the 512k hugepages will be brought
back. And the 16K pages will be implemented in the following step.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In order to simplify time critical exceptions handling 8xx
specific SW perf counters, this patch moves the counters into
the beginning of memory. This is possible because .text is readable
and the counters are never modified outside of the handlers.
By doing this, we avoid having to set a second register with
the upper part of the address of the counters.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
pgtable_cache_add() gracefully handles the case when a cache that
size already exists by returning early with the following test:
if (PGT_CACHE(shift))
return; /* Already have a cache of this size */
It is then not needed to test the existence of the cache before.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Instead of opencoding cache handling for the special case
of hugepage tables having a single pte_t element, this
patch makes use of the common pgtable_cache helpers
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
hugepages uses a cache of order 0. Lets allow page tables
of order 0 in the common part in order to avoid open coding
in hugetlb
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In order to allow the 8xx to handle pte_fragments, this patch
extends the use of pte_fragments to PPC32 platforms.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In order to handle pte_fragment functions with single fragment
without adding pte_frag in all mm_context_t, this patch creates
two helpers which do nothing on platforms using a single fragment.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This patch move pgtable_t into platform headers.
It gets rid of the CONFIG_PPC_64K_PAGES case for PPC64
as nohash/64 doesn't support CONFIG_PPC_64K_PAGES.
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The purpose of this patch is to move platform specific
mmu-xxx.h files in platform directories like pte-xxx.h files.
In the meantime this patch creates common nohash and
nohash/32 + nohash/64 mmu.h files for future common parts.
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
There is no point in taking the page table lock as pte_frag or
pmd_frag are always NULL when we have only one fragment.
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In preparation of next patch which generalises the use of
pte_fragment_alloc() for all, this patch moves the related functions
in a place that is common to all subarches.
The 8xx will need that for supporting 16k pages, as in that mode
page tables still have a size of 4k.
Since pte_fragment with only once fragment is not different
from what is done in the general case, we can easily migrate all
subarchs to pte fragments.
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
commit 1bc54c0311 ("powerpc: rework 4xx PTE access and TLB miss")
introduced non atomic PTE updates and started the work of removing
PTE updates in TLB miss handlers, but kept PTE_ATOMIC_UPDATES for the
8xx with the following comment:
/* Until my rework is finished, 8xx still needs atomic PTE updates */
commit fe11dc3f96 ("powerpc/8xx: Update TLB asm so it behaves as
linux mm expects") removed all PTE updates done in TLB miss handlers
Therefore, atomic PTE updates are not needed anymore for the 8xx
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
There is a plan to build the kernel with -Wimplicit-fallthrough and these
places in the code produced warnings, but because we build arch/powerpc
with -Werror, they became errors. Fix them up.
This patch produces no change in behaviour, but should be reviewed in
case these are actually bugs not intentional fallthoughs.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Commit f384796c40 ("powerpc/mm: Add support for handling > 512TB address
in SLB miss") removed function slb_miss_bad_addr(struct pt_regs *regs), but
kept its declaration in the prototype file. This patch simply removes the
function definition.
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When booting a pseries kernel with PREEMPT enabled, it dumps the
following warning:
BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
caller is pseries_processor_idle_init+0x5c/0x22c
CPU: 13 PID: 1 Comm: swapper/0 Not tainted 4.20.0-rc3-00090-g12201a0128bc-dirty #828
Call Trace:
[c000000429437ab0] [c0000000009c8878] dump_stack+0xec/0x164 (unreliable)
[c000000429437b00] [c0000000005f2f24] check_preemption_disabled+0x154/0x160
[c000000429437b90] [c000000000cab8e8] pseries_processor_idle_init+0x5c/0x22c
[c000000429437c10] [c000000000010ed4] do_one_initcall+0x64/0x300
[c000000429437ce0] [c000000000c54500] kernel_init_freeable+0x3f0/0x500
[c000000429437db0] [c0000000000112dc] kernel_init+0x2c/0x160
[c000000429437e20] [c00000000000c1d0] ret_from_kernel_thread+0x5c/0x6c
This happens because the code calls get_lppaca() which calls
get_paca() and it checks if preemption is disabled through
check_preemption_disabled().
Preemption should be disabled because the per CPU variable may make no
sense if there is a preemption (and a CPU switch) after it reads the
per CPU data and when it is used.
In this device driver specifically, it is not a problem, because this
code just needs to have access to one lppaca struct, and it does not
matter if it is the current per CPU lppaca struct or not (i.e. when
there is a preemption and a CPU migration).
That said, the most appropriate fix seems to be related to avoiding
the debug_smp_processor_id() call at get_paca(), instead of calling
preempt_disable() before get_paca().
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently xmon needs to get devtree_lock (through rtas_token()) during its
invocation (at crash time). If there is a crash while devtree_lock is being
held, then xmon tries to get the lock but spins forever and never get into
the interactive debugger, as in the following case:
int *ptr = NULL;
raw_spin_lock_irqsave(&devtree_lock, flags);
*ptr = 0xdeadbeef;
This patch avoids calling rtas_token(), thus trying to get the same lock,
at crash time. This new mechanism proposes getting the token at
initialization time (xmon_init()) and just consuming it at crash time.
This would allow xmon to be possible invoked independent of devtree_lock
being held or not.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
PPC_STD_MMU_32 and PPC_STD_MMU are not used anymore. Remove them.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Today we have:
config PPC_BOOK3S
def_bool y
depends on PPC_BOOK3S_32 || PPC_BOOK3S_64
config PPC_STD_MMU
def_bool y
depends on PPC_BOOK3S
PPC_STD_MMU is therefore redundant with PPC_BOOK3S. Lets remove it.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Today we have:
config PPC_BOOK3S_32
bool "512x/52xx/6xx/7xx/74xx/82xx/83xx/86xx"
[depends on PPC32 within a choice]
config PPC_BOOK3S
def_bool y
depends on PPC_BOOK3S_32 || PPC_BOOK3S_64
config PPC_STD_MMU
def_bool y
depends on PPC_BOOK3S
config PPC_STD_MMU_32
def_bool y
depends on PPC_STD_MMU && PPC32
PPC_STD_MMU_32 is therefore redundant with PPC_BOOK3S_32.
In order to make the code clearer, lets use preferably PPC_BOOK3S_32.
This will allow to remove CONFIG_PPC_STD_MMU_32 in a later patch.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
asm/book3s/32/pgtable.h is only included when CONFIG_PPC_BOOK3S_32 is set.
Whenever CONFIG_PPC_BOOK3S_32 is set, CONFIG_PPC_STD_MMU_32 is set as well.
This patch removes useless CONFIG_PPC_STD_MMU_32 #ifdefs
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
CONFIG_6xx is not used anymore. Remove it.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Today we have:
config PPC_BOOK3S_32
bool "512x/52xx/6xx/7xx/74xx/82xx/83xx/86xx"
[depends on PPC32 within a choice]
config PPC_BOOK3S
def_bool y
depends on PPC_BOOK3S_32 || PPC_BOOK3S_64
config 6xx
def_bool y
depends on PPC32 && PPC_BOOK3S
6xx is therefore redundant with PPC_BOOK3S_32.
In order to make the code clearer, lets use preferably PPC_BOOK3S_32.
This will allow to remove CONFIG_6xx in a later patch.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Today, powerpc has three CONFIG labels which means exactly the same:
- CONFIG_6xx
- CONFIG_PPC_BOOK3S_32
- CONFIG_PPC_STD_MMU_32
By consistency with PPC64, CONFIG_PPC_BOOK3S_32 is the preferred one.
Using a label with includes _PPC_ also makes it clearer that it is
linked to powerpc.
In preparation of the removal of CONFIG_6xx, this patch replaces it
by CONFIG_PPC_BOOK3S_32
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Remove directly accessing device_node.type pointer and use the
accessors instead. This will eventually allow removing the type
pointer.
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Remove directly accessing device_node.type pointer and use the
accessors instead. This will eventually allow removing the type
pointer.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In preparation to remove the node name pointer from struct
device_node, convert printf users to use the %pOFn format specifier.
Convert the open coded iterating thru child nodes to
for_each_child_of_node() while we're here.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Remove directly accessing device_node.type pointer and use the
accessors instead. This will eventually allow removing the type
pointer.
Replace the open coded iterating over child nodes with
for_each_child_of_node() while we're here.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Remove directly accessing device_node.type pointer and use the
accessors instead. This will eventually allow removing the type
pointer.
In the process, the of_stdout pointer can be used instead of finding
the stdout node again.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This patch adds new defconfig options for powerpc KVM guest
and guest.config with additional config symbols enabled,
which is to build kernel to boot without initramfs and can be used
as place holder for guest specific additional config symbols in future.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This patch adds missing config symbols for ppc64_defconfig
to enable cgroups, memhotplug, numa balancing and XFS
in core kernel image.
Signed-off-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
CONFIG_NR_CPUS is not set in ppc64_defconfig, So it gets default of 32
which is quite small for modern powerpc systems. Instead set a default
of 2048 like other powerpc defconfigs.
Signed-off-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Update ppc64_defconfig with savedefconfig. No symbols are added or
removed, this is 100% movement.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In commit 539df7fcb3 ("powerpc/configs: Enable function trace by
default") we added:
CONFIG_FTRACE=y
CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
To ppc64_defconfig, powernv_defconfig and pseries_defconfig.
But only CONFIG_FUNCTION_TRACER=y is required, CONFIG_FTRACE is
default y if DEBUG_KERNEL is enabled, which we have. And then
CONFIG_FUNCTION_GRAPH_TRACER is default y when CONFIG_FUNCTION_TRACER
is enabled.
The extra symbols were already removed from powernv_defconfig in
commit 9a018fb1e1 ("powerpc/config: powernv_defconfig updates").
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
There is no need to have the 'void __iomem *cpld_base' variable static
since new value always be assigned before use it.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
There is no need to have the 'intoffset' variable static since new value
always be assigned before use it.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.
Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.
Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When compiled for 64-bit, the PD_HUGE constant is a 64-bit integer.
Mark it as an unsigned long.
This squashes over a thousand sparse warnings on my minimal T4240RDB
(e6500, ppc64be) config, of the following 2 forms:
arch/powerpc/include/asm/hugetlb.h:52:49: warning: constant 0x8000000000000000 is so big it is unsigned long
arch/powerpc/include/asm/nohash/pgtable.h:269:49: warning: constant 0x8000000000000000 is so big it is unsigned long
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Type qualifier on return type is ignored. Remove warning in W=1:
arch/powerpc/include/asm/book3s/64/pgtable.h:1268:25: error: type qualifiers ignored on function return type [-Werror=ignored-qualifiers]
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When both `CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y` and `CONFIG_UBSAN=y`
are set, link step typically produce numberous warnings about orphan
section:
+ powerpc-linux-gnu-ld -EB -m elf32ppc -Bstatic --orphan-handling=warn --build-id --gc-sections -X -o .tmp_vmlinux1 -T ./arch/powerpc/kernel/vmlinux.lds --who
le-archive built-in.a --no-whole-archive --start-group lib/lib.a --end-group
powerpc-linux-gnu-ld: warning: orphan section `.data..Lubsan_data393' from `init/main.o' being placed in section `.data..Lubsan_data393'.
powerpc-linux-gnu-ld: warning: orphan section `.data..Lubsan_data394' from `init/main.o' being placed in section `.data..Lubsan_data394'.
...
powerpc-linux-gnu-ld: warning: orphan section `.data..Lubsan_type11' from `init/main.o' being placed in section `.data..Lubsan_type11'.
powerpc-linux-gnu-ld: warning: orphan section `.data..Lubsan_type12' from `init/main.o' being placed in section `.data..Lubsan_type12'.
...
This commit remove those warnings produced at W=1.
Link: https://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg135407.html
Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Current core-pkey selftest fails if the test runs without privileges to
write into the core pattern file (/proc/sys/kernel/core_pattern). This
causes the test to fail and give the impression that the subsystem being
tested is broken, when, in fact, the test is being executed without the
proper privileges. This is the current error:
test: core_pkey
tags: git_version:v4.19-3-g9e3363be9bce-dirty
Error writing to core_pattern file: Permission denied
failure: core_pkey
This patch simply skips this test if it runs without the proper privileges,
avoiding this undesired failure.
CC: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
CC: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This patch creates a new macro that skips a test and prints a message to
stderr. This is useful to give an idea why the tests is being skipped,
other than just skipping the test blindly.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Some ptrace selftests are passing input operands using a constraint that
can allocate any register for the operand, and using these registers on
load/store operations.
If the register allocated by the compiler happens to be zero (r0), it might
cause an invalid memory address access, since load and store operations
consider the content of 0x0 address if the base register is r0, instead of
the content of the r0 register. For example:
r1 := 0xdeadbeef
r0 := 0xdeadbeef
ld r2, 0(1) /* will load into r2 the content of r1 address */
ld r2, 0(0) /* will load into r2 the content of 0x0 */
In order to avoid this possible problem, the inline assembly constraint
should be aware that these registers will be used as a base register, thus,
r0 should not be allocated.
Other than that, this patch removes inline assembly operands that are not
used by the tests.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Function huge_ptep_set_access_flags() has the 'extern' keyword in the
function definition and also in the function declaration. This causes a
warning in 'sparse' since the 'extern' storage class should not be used
in the function definition.
arch/powerpc/mm/pgtable.c:232:12: warning: function 'huge_ptep_set_access_flags' with external linkage has definition
This patch removes the keyword from the definition part. It also removes
the extern keyword from the declaration part, since checkpatch --strict
complains about it.
Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>