Commit Graph

53 Commits

Author SHA1 Message Date
Linus Torvalds
192f0f8e9d powerpc updates for 5.3
Notable changes:
 
  - Removal of the NPU DMA code, used by the out-of-tree Nvidia driver, as well
    as some other functions only used by drivers that haven't (yet?) made it
    upstream.
 
  - A fix for a bug in our handling of hardware watchpoints (eg. perf record -e
    mem: ...) which could lead to register corruption and kernel crashes.
 
  - Enable HAVE_ARCH_HUGE_VMAP, which allows us to use large pages for vmalloc
    when using the Radix MMU.
 
  - A large but incremental rewrite of our exception handling code to use gas
    macros rather than multiple levels of nested CPP macros.
 
 And the usual small fixes, cleanups and improvements.
 
 Thanks to:
   Alastair D'Silva, Alexey Kardashevskiy, Andreas Schwab, Aneesh Kumar K.V, Anju
   T Sudhakar, Anton Blanchard, Arnd Bergmann, Athira Rajeev, Cédric Le Goater,
   Christian Lamparter, Christophe Leroy, Christophe Lombard, Christoph Hellwig,
   Daniel Axtens, Denis Efremov, Enrico Weigelt, Frederic Barrat, Gautham R.
   Shenoy, Geert Uytterhoeven, Geliang Tang, Gen Zhang, Greg Kroah-Hartman, Greg
   Kurz, Gustavo Romero, Krzysztof Kozlowski, Madhavan Srinivasan, Masahiro
   Yamada, Mathieu Malaterre, Michael Neuling, Nathan Lynch, Naveen N. Rao,
   Nicholas Piggin, Nishad Kamdar, Oliver O'Halloran, Qian Cai, Ravi Bangoria,
   Sachin Sant, Sam Bobroff, Satheesh Rajendran, Segher Boessenkool, Shaokun
   Zhang, Shawn Anastasio, Stewart Smith, Suraj Jitindar Singh, Thiago Jung
   Bauermann, YueHaibing.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJdKVoLAAoJEFHr6jzI4aWA0kIP/A6shIbbE7H5W2hFrqt/PPPK
 3+VrvPKbOFF+W6hcE/RgSZmEnUo0svdNjHUd/eMfFS1vb/uRt2QDdrsHUNNwURQL
 M2mcLXFwYpnjSjb/XMgDbHpAQxjeGfTdYLonUIejN7Rk8KQUeLyKQ3SBn6kfMc46
 DnUUcPcjuRGaETUmVuZZ4e40ZWbJp8PKDrSJOuUrTPXMaK5ciNbZk5mCWXGbYl6G
 BMQAyv4ld/417rNTjBEP/T2foMJtioAt4W6mtlgdkOTdIEZnFU67nNxDBthNSu2c
 95+I+/sML4KOp1R4yhqLSLIDDbc3bg3c99hLGij0d948z3bkSZ8bwnPaUuy70C4v
 U8rvl/+N6C6H3DgSsPE/Gnkd8DnudqWY8nULc+8p3fXljGwww6/Qgt+6yCUn8BdW
 WgixkSjKgjDmzTw8trIUNEqORrTVle7cM2hIyIK2Q5T4kWzNQxrLZ/x/3wgoYjUa
 1KwIzaRo5JKZ9D3pJnJ5U+knE2/90rJIyfcp0W6ygyJsWKi2GNmq1eN3sKOw0IxH
 Tg86RENIA/rEMErNOfP45sLteMuTR7of7peCG3yumIOZqsDVYAzerpvtSgip2cvK
 aG+9HcYlBFOOOF9Dabi8GXsTBLXLfwiyjjLSpA9eXPwW8KObgiNfTZa7ujjTPvis
 4mk9oukFTFUpfhsMmI3T
 =3dBZ
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
 "Notable changes:

   - Removal of the NPU DMA code, used by the out-of-tree Nvidia driver,
     as well as some other functions only used by drivers that haven't
     (yet?) made it upstream.

   - A fix for a bug in our handling of hardware watchpoints (eg. perf
     record -e mem: ...) which could lead to register corruption and
     kernel crashes.

   - Enable HAVE_ARCH_HUGE_VMAP, which allows us to use large pages for
     vmalloc when using the Radix MMU.

   - A large but incremental rewrite of our exception handling code to
     use gas macros rather than multiple levels of nested CPP macros.

  And the usual small fixes, cleanups and improvements.

  Thanks to: Alastair D'Silva, Alexey Kardashevskiy, Andreas Schwab,
  Aneesh Kumar K.V, Anju T Sudhakar, Anton Blanchard, Arnd Bergmann,
  Athira Rajeev, Cédric Le Goater, Christian Lamparter, Christophe
  Leroy, Christophe Lombard, Christoph Hellwig, Daniel Axtens, Denis
  Efremov, Enrico Weigelt, Frederic Barrat, Gautham R. Shenoy, Geert
  Uytterhoeven, Geliang Tang, Gen Zhang, Greg Kroah-Hartman, Greg Kurz,
  Gustavo Romero, Krzysztof Kozlowski, Madhavan Srinivasan, Masahiro
  Yamada, Mathieu Malaterre, Michael Neuling, Nathan Lynch, Naveen N.
  Rao, Nicholas Piggin, Nishad Kamdar, Oliver O'Halloran, Qian Cai, Ravi
  Bangoria, Sachin Sant, Sam Bobroff, Satheesh Rajendran, Segher
  Boessenkool, Shaokun Zhang, Shawn Anastasio, Stewart Smith, Suraj
  Jitindar Singh, Thiago Jung Bauermann, YueHaibing"

* tag 'powerpc-5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (163 commits)
  powerpc/powernv/idle: Fix restore of SPRN_LDBAR for POWER9 stop state.
  powerpc/eeh: Handle hugepages in ioremap space
  ocxl: Update for AFU descriptor template version 1.1
  powerpc/boot: pass CONFIG options in a simpler and more robust way
  powerpc/boot: add {get, put}_unaligned_be32 to xz_config.h
  powerpc/irq: Don't WARN continuously in arch_local_irq_restore()
  powerpc/module64: Use symbolic instructions names.
  powerpc/module32: Use symbolic instructions names.
  powerpc: Move PPC_HA() PPC_HI() and PPC_LO() to ppc-opcode.h
  powerpc/module64: Fix comment in R_PPC64_ENTRY handling
  powerpc/boot: Add lzo support for uImage
  powerpc/boot: Add lzma support for uImage
  powerpc/boot: don't force gzipped uImage
  powerpc/8xx: Add microcode patch to move SMC parameter RAM.
  powerpc/8xx: Use IO accessors in microcode programming.
  powerpc/8xx: replace #ifdefs by IS_ENABLED() in microcode.c
  powerpc/8xx: refactor programming of microcode CPM params.
  powerpc/8xx: refactor printing of microcode patch name.
  powerpc/8xx: Refactor microcode write
  powerpc/8xx: refactor writing of CPM microcode arrays
  ...
2019-07-13 16:08:36 -07:00
Aneesh Kumar K.V
d6eacedd1f powerpc/book3s: Use config independent helpers for page table walk
Even when we have HugeTLB and THP disabled, kernel linear map can still be
mapped with hugepages. This is only an issue with radix translation because hash
MMU doesn't map kernel linear range in linux page table and other kernel
map areas are not mapped using hugepage.

Add config independent helpers and put WARN_ON() when we don't expect things
to be mapped via hugepages.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-05 00:43:50 +10:00
Linus Torvalds
fa1827d773 powerpc fixes for 5.2 #4
One fix for a regression introduced by our 32-bit KASAN support, which broke
 booting on machines with "bootx" early debugging enabled.
 
 A fix for a bug which broke kexec on 32-bit, introduced by changes to the 32-bit
 STRICT_KERNEL_RWX support in v5.1.
 
 Finally two fixes going to stable for our THP split/collapse handling,
 discovered by Nick. The first fixes random crashes and/or corruption in guests
 under sufficient load.
 
 Thanks to:
   Nicholas Piggin, Christophe Leroy, Aaro Koskinen, Mathieu Malaterre.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJdBOivAAoJEFHr6jzI4aWA/T0P/1pmr4JLWzXOPQOGOwS2mmu5
 HhXBFuDflpGiWS31syOhfKhiE2qlwdIcGaclSo1wAgUnMKp+sxVEagF6DEt484r3
 DXs3eRyrGu5vQT7Q6yReuT3Kw2ZR474a5ob00WGAQosBKyJF4ZHWz16ETVWMdAMQ
 TknEEU3hOUnMWWIEvLnZOKT7eJcmzj5IYy1OtLHBjWiHHizGC8PSxdVhiRcD/O6R
 F6C7XrFb7RRj5ran6gxwMbcTvjgu922TSQPOCw93qnXYWLfvWDUXC4yCqY21oHnr
 b3zgJNgIdSoMYxE8pfOH7Y+eaJrbzgnhlS5OJNEz/4NOfGQnJXYcSF8QO6eeVoKM
 L2SkT1Ov+QMmZQjMC5e9OAe7DFHfM59RYFg11eaUqfiaObRsmwu8rqjngITV5Ede
 Ydq3W39XQkjB3aQ4qb0MnEBbVgyQ/y6/T5hoHRlvnb5byFk5Pd3jpub56sq87UGM
 M4GD3YdmT5eBqzGxApddyIiS839PZpdw7g/Ivtp2GYCtiNNZcJqaXvN7IwfcVF3c
 YJrCfhNTUJPjICIL9k9v2RQovGuGZqtM3BHc8PmyVvDxTqtEbh8B9GEg+QC58xp/
 s+UI2sEd/Vg6NVKluIJHau7aEmJlaxYIshqsIhYvPgJRN6KrFMhdfPNx7D+zMlu8
 nbM6Z1n2VFk9Cxb0vA9K
 =MXJI
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "One fix for a regression introduced by our 32-bit KASAN support, which
  broke booting on machines with "bootx" early debugging enabled.

  A fix for a bug which broke kexec on 32-bit, introduced by changes to
  the 32-bit STRICT_KERNEL_RWX support in v5.1.

  Finally two fixes going to stable for our THP split/collapse handling,
  discovered by Nick. The first fixes random crashes and/or corruption
  in guests under sufficient load.

  Thanks to: Nicholas Piggin, Christophe Leroy, Aaro Koskinen, Mathieu
  Malaterre"

* tag 'powerpc-5.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/32s: fix booting with CONFIG_PPC_EARLY_DEBUG_BOOTX
  powerpc/64s: __find_linux_pte() synchronization vs pmdp_invalidate()
  powerpc/64s: Fix THP PMD collapse serialisation
  powerpc: Fix kexec failure on book3s/32
2019-06-15 07:29:32 -10:00
Nicholas Piggin
a00196a272 powerpc/64s: __find_linux_pte() synchronization vs pmdp_invalidate()
The change to pmdp_invalidate() to mark the pmd with _PAGE_INVALID
broke the synchronisation against lock free lookups,
__find_linux_pte()'s pmd_none() check no longer returns true for such
cases.

Fix this by adding a check for this condition as well.

Fixes: da7ad366b4 ("powerpc/mm/book3s: Update pmd_present to look at _PAGE_PRESENT bit")
Cc: stable@vger.kernel.org # v4.20+
Suggested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-06-07 16:28:28 +10:00
Thomas Gleixner
2874c5fd28 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license as published by
  the free software foundation either version 2 of the license or at
  your option any later version

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30 11:26:32 -07:00
Christophe Leroy
26e66b08c3 powerpc/mm: flatten function __find_linux_pte() step 3
__find_linux_pte() is full of if/else which is hard to
follow allthough the handling is pretty simple.

Previous patches left a { } block. This patch removes it.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-05-03 01:20:24 +10:00
Christophe Leroy
e2fb251188 powerpc/mm: flatten function __find_linux_pte() step 2
__find_linux_pte() is full of if/else which is hard to
follow allthough the handling is pretty simple.

Previous patch left { } blocks. This patch removes the first one
by shifting its content to the left.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-05-03 01:20:24 +10:00
Christophe Leroy
fab9a1165b powerpc/mm: flatten function __find_linux_pte() step 1
__find_linux_pte() is full of if/else which is hard to
follow allthough the handling is pretty simple.

This patch flattens the function by getting rid of as much if/else
as possible. In order to ease the review, this is done in three steps.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-05-03 01:20:24 +10:00
Christophe Leroy
0caed4de50 powerpc/mm: move __find_linux_pte() out of hugetlbpage.c
__find_linux_pte() is the only function in hugetlbpage.c
which is compiled in regardless on CONFIG_HUGETLBPAGE

This patch moves it in pgtable.c.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-05-03 01:20:23 +10:00
Christophe Leroy
385e89d5b2 powerpc/mm: add exec protection on powerpc 603
The 603 doesn't have a HASH table, TLB misses are handled by
software. It is then possible to generate page fault when
_PAGE_EXEC is not set like in nohash/32.

There is one "reserved" PTE bit available, this patch uses
it for _PAGE_EXEC.

In order to support it, set_pte_filter() and
set_access_flags_filter() are made common, and the handling
is made dependent on MMU_FTR_HPTE_TABLE

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-19 18:56:32 +11:00
Breno Leitao
bce85a167b powerpc/mm: Remove extern from function definition
Function huge_ptep_set_access_flags() has the 'extern' keyword in the
function definition and also in the function declaration. This causes a
warning in 'sparse' since the 'extern' storage class should not be used
in the function definition.

	arch/powerpc/mm/pgtable.c:232:12: warning: function 'huge_ptep_set_access_flags' with external linkage has definition

This patch removes the keyword from the definition part. It also removes
the extern keyword from the declaration part, since checkpatch --strict
complains about it.

Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-11-25 17:11:21 +11:00
Aneesh Kumar K.V
dd0e144a63 powerpc/mm: Fix WARN_ON with THP NUMA migration
WARNING: CPU: 12 PID: 4322 at /arch/powerpc/mm/pgtable-book3s64.c:76 set_pmd_at+0x4c/0x2b0
 Modules linked in:
 CPU: 12 PID: 4322 Comm: qemu-system-ppc Tainted: G        W         4.19.0-rc3-00758-g8f0c636b0542 #36
 NIP:  c0000000000872fc LR: c000000000484eec CTR: 0000000000000000
 REGS: c000003fba876fe0 TRAP: 0700   Tainted: G        W          (4.19.0-rc3-00758-g8f0c636b0542)
 MSR:  900000010282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 24282884  XER: 00000000
 CFAR: c000000000484ee8 IRQMASK: 0
 GPR00: c000000000484eec c000003fba877268 c000000001f0ec00 c000003fbd229f80
 GPR04: 00007c8fe8e00000 c000003f864c5a38 860300853e0000c0 0000000000000080
 GPR08: 0000000080000000 0000000000000001 0401000000000080 0000000000000001
 GPR12: 0000000000002000 c000003fffff5400 c000003fce292000 00007c9024570000
 GPR16: 0000000000000000 0000000000ffffff 0000000000000001 c000000001885950
 GPR20: 0000000000000000 001ffffc0004807c 0000000000000008 c000000001f49d05
 GPR24: 00007c8fe8e00000 c0000000020f2468 ffffffffffffffff c000003fcd33b090
 GPR28: 00007c8fe8e00000 c000003fbd229f80 c000003f864c5a38 860300853e0000c0
 NIP [c0000000000872fc] set_pmd_at+0x4c/0x2b0
 LR [c000000000484eec] do_huge_pmd_numa_page+0xb1c/0xc20
 Call Trace:
 [c000003fba877268] [c00000000045931c] mpol_misplaced+0x1bc/0x230 (unreliable)
 [c000003fba8772c8] [c000000000484eec] do_huge_pmd_numa_page+0xb1c/0xc20
 [c000003fba877398] [c00000000040d344] __handle_mm_fault+0x5e4/0x2300
 [c000003fba8774d8] [c00000000040f400] handle_mm_fault+0x3a0/0x420
 [c000003fba877528] [c0000000003ff6f4] __get_user_pages+0x2e4/0x560
 [c000003fba877628] [c000000000400314] get_user_pages_unlocked+0x104/0x2a0
 [c000003fba8776c8] [c000000000118f44] __gfn_to_pfn_memslot+0x284/0x6a0
 [c000003fba877748] [c0000000001463a0] kvmppc_book3s_radix_page_fault+0x360/0x12d0
 [c000003fba877838] [c000000000142228] kvmppc_book3s_hv_page_fault+0x48/0x1300
 [c000003fba877988] [c00000000013dc08] kvmppc_vcpu_run_hv+0x1808/0x1b50
 [c000003fba877af8] [c000000000126b44] kvmppc_vcpu_run+0x34/0x50
 [c000003fba877b18] [c000000000123268] kvm_arch_vcpu_ioctl_run+0x288/0x2d0
 [c000003fba877b98] [c00000000011253c] kvm_vcpu_ioctl+0x1fc/0x8c0
 [c000003fba877d08] [c0000000004e9b24] do_vfs_ioctl+0xa44/0xae0
 [c000003fba877db8] [c0000000004e9c44] ksys_ioctl+0x84/0xf0
 [c000003fba877e08] [c0000000004e9cd8] sys_ioctl+0x28/0x80

We removed the pte_protnone check earlier with the understanding that we
mark the pte invalid before the set_pte/set_pmd usage. But the huge pmd
autonuma still use the set_pmd_at directly. This is ok because a protnone pte
won't have translation cache in TLB.

Fixes: da7ad366b4 ("powerpc/mm/book3s: Update pmd_present to look at _PAGE_PRESENT bit")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-20 13:26:47 +11:00
Christophe Leroy
26973fa5ac powerpc/mm: use pte helpers in generic code
Get rid of platform specific _PAGE_XXXX in powerpc common code and
use helpers instead.

mm/dump_linuxpagetables.c will be handled separately

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Christophe Leroy
d81e6f8b7c powerpc/mm: don't use _PAGE_EXEC in book3s/32
book3s/32 doesn't define _PAGE_EXEC, so no need to use it.

All other platforms define _PAGE_EXEC so no need to check
it is not NUL when not book3s/32.

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14 18:04:09 +11:00
Aneesh Kumar K.V
da7ad366b4 powerpc/mm/book3s: Update pmd_present to look at _PAGE_PRESENT bit
With this patch we use 0x8000000000000000UL (_PAGE_PRESENT) to indicate a valid
pgd/pud/pmd entry. We also switch the p**_present() to look at this bit.

With pmd_present, we have a special case. We need to make sure we consider a
pmd marked invalid during THP split as present. Right now we clear the
_PAGE_PRESENT bit during a pmdp_invalidate. Inorder to consider this special
case we add a new pte bit _PAGE_INVALID (mapped to _RPAGE_SW0). This bit is
only used with _PAGE_PRESENT cleared. Hence we are not really losing a pte bit
for this special case. pmd_present is also updated to look at _PAGE_INVALID.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-03 15:39:58 +10:00
Aneesh Kumar K.V
ed515b6898 powerpc/mm/hugetlb: Update hugetlb related locks
With split pmd page table lock enabled, we don't use mm->page_table_lock when
updating pmd entries. This patch update hugetlb path to use the right lock
when inserting huge page directory entries into page table.

ex: if we are using hugepd and inserting hugepd entry at the pmd level, we
use pmd_lockptr, which based on config can be split pmd lock.

For update huge page directory entries itself we use mm->page_table_lock. We
do have a helper huge_pte_lockptr() for that.

Fixes: 675d99529 ("powerpc/book3s64: Enable split pmd ptlock")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:37 +10:00
Aneesh Kumar K.V
bd5050e38a powerpc/mm/radix: Change pte relax sequence to handle nest MMU hang
When relaxing access (read -> read_write update), pte needs to be marked invalid
to handle a nest MMU bug. We also need to do a tlb flush after the pte is
marked invalid before updating the pte with new access bits.

We also move tlb flush to platform specific __ptep_set_access_flags. This will
help us to gerid of unnecessary tlb flush on BOOK3S 64 later. We don't do that
in this patch. This also helps in avoiding multiple tlbies with coprocessor
attached.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:34 +10:00
Aneesh Kumar K.V
e4c1112c3f powerpc/mm: Change function prototype
In later patch, we use the vma and psize to do tlb flush. Do the prototype
update in separate patch to make the review easy.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:34 +10:00
Aneesh Kumar K.V
f069ff396d powerpc/mm/hugetlb: Update huge_ptep_set_access_flags to call __ptep_set_access_flags directly
In a later patch, we want to update __ptep_set_access_flags take page size
arg. This makes ptep_set_access_flags only work with mmu_virtual_psize.
To simplify the code make huge_ptep_set_access_flags directly call
__ptep_set_access_flags so that we can compute the hugetlb page size in
hugetlb function.

Now that ptep_set_access_flags won't be called for hugetlb remove
the is_vm_hugetlb_page() check and add the assert of pte lock
unconditionally.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:33 +10:00
Christophe Leroy
812fadcb94 powerpc/mm: extend _PAGE_PRIVILEGED to all CPUs
commit ac29c64089 ("powerpc/mm: Replace _PAGE_USER with
_PAGE_PRIVILEGED") introduced _PAGE_PRIVILEGED for BOOK3S/64

This patch generalises _PAGE_PRIVILEGED for all CPUs, allowing
to have either _PAGE_PRIVILEGED or _PAGE_USER or both.

PPC_8xx has a _PAGE_SHARED flag which is set for and only for
all non user pages. Lets rename it _PAGE_PRIVILEGED to remove
confusion as it has nothing to do with Linux shared pages.

On BookE, there's a _PAGE_BAP_SR which has to be set for kernel
pages: defining _PAGE_PRIVILEGED as _PAGE_BAP_SR will make
this generic

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-16 23:47:13 +11:00
Gavin Shan
c618f6b188 powerpc/mm: Fix typo in set_pte_at()
This fixes the typo about the _PAGE_PTE in set_pte_at() by changing
"tryint" to "trying to".

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-17 22:16:25 +11:00
Aneesh Kumar K.V
b3603e174f powerpc/mm: update radix__ptep_set_access_flag to not do full mm tlb flush
When we are updating a pte, we just need to flush the tlb mapping
that pte. Right now we do a full mm flush because we don't track the page
size. Now that we have page size details in pte use that to do the
optimized flush

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-11-28 22:44:33 +11:00
Aneesh Kumar K.V
c6d1a767b9 powerpc/mm/radix: Use different pte update sequence for different POWER9 revs
POWER9 DD1 requires pte to be marked invalid (V=0) before updating
it with the new value. This makes this distinction for the different
revisions.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-13 17:37:10 +10:00
Aneesh Kumar K.V
703b41ad1a powerpc/mm: remove flush_tlb_page_nohash
This should be same as flush_tlb_page except for hash32. For hash32
I guess the existing code is wrong, because we don't seem to be
flushing tlb for Hash != 0 case at all. Fix this by switching to
calling flush_tlb_page() which does the right thing by flushing
tlb for both hash and nohash case with hash32

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:13 +10:00
Aneesh Kumar K.V
4dfb88ca9b powerpc/mm: Update pte filter for radix
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-11 21:53:52 +10:00
Aneesh Kumar K.V
30bda41aba powerpc/mm: Drop WIMG in favour of new constants
PowerISA 3.0 introduces two pte bits with the below meaning for radix:
  00 -> Normal Memory
  01 -> Strong Access Order (SAO)
  10 -> Non idempotent I/O (Cache inhibited and guarded)
  11 -> Tolerant I/O (Cache inhibited)

We drop the existing WIMG bits in the Linux page table in favour of the
above constants. We loose _PAGE_WRITETHRU with this conversion. We only
use writethru via pgprot_cached_wthru() which is used by
fbdev/controlfb.c which is Apple control display and also PPC32.

With respect to _PAGE_COHERENCE, we have been marking hpte always
coherent for some time now. htab_convert_pte_flags() always added
HPTE_R_M.

NOTE: KVM changes need closer review.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-01 18:32:33 +10:00
Aneesh Kumar K.V
ac29c64089 powerpc/mm: Replace _PAGE_USER with _PAGE_PRIVILEGED
_PAGE_PRIVILEGED means the page can be accessed only by the kernel. This
is done to keep pte bits similar to PowerISA 3.0 Radix PTE format. User
pages are now marked by clearing _PAGE_PRIVILEGED bit.

Previously we allowed the kernel to have a privileged page in the lower
address range (USER_REGION). With this patch such access is denied.

We also prevent a kernel access to a non-privileged page in higher
address range (ie, REGION_ID != 0).

Both the above access scenarios should never happen.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jeremy Kerr <jk@ozlabs.org>
Cc: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-01 18:32:26 +10:00
Aneesh Kumar K.V
c7d54842de powerpc/mm: Use _PAGE_READ to indicate Read access
This splits the _PAGE_RW bit into _PAGE_READ and _PAGE_WRITE. It also
removes the dependency on _PAGE_USER for implying read only. Few things
to note here is that, we have read implied with write and execute
permission. Hence we should always find _PAGE_READ set on hash pte
fault.

We still can't switch PROT_NONE to !(_PAGE_RWX). Auto numa depends on
marking a prot none pte _PAGE_WRITE. (For more details look at
b191f9b106 "mm: numa: preserve PTE write permissions across a NUMA
hinting fault")

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jeremy Kerr <jk@ozlabs.org>
Cc: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-01 18:32:21 +10:00
Alexey Kardashevskiy
e9ab1a1caf powerpc: Make vmalloc_to_phys() public
This makes vmalloc_to_phys() public as there will be another user
(KVM in-kernel VFIO acceleration) for it soon. As this new user
can be compiled as a module, this exports the symbol.

As a little optimization, this changes the helper to call
vmalloc_to_pfn() instead of vmalloc_to_page() as the size of the
struct page may not be power-of-two aligned which will make gcc use
multiply instructions instead of shifts.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-02-16 13:44:26 +11:00
Aneesh Kumar K.V
6a119eae94 powerpc/mm: Add a _PAGE_PTE bit
For a pte entry we will have _PAGE_PTE set. Our pte page
address have a minimum alignment requirement of HUGEPD_SHIFT_MASK + 1.
We use the lower 7 bits to indicate hugepd. ie.

For pmd and pgd we can find:
1) _PAGE_PTE set pte -> indicate PTE
2) bits [2..6] non zero -> indicate hugepd.
   They also encode the size. We skip bit 1 (_PAGE_PRESENT).
3) othewise pointer to next table.

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:14 +11:00
Mel Gorman
8a0516ed8b mm: convert p[te|md]_numa users to p[te|md]_protnone_numa
Convert existing users of pte_numa and friends to the new helper.  Note
that the kernel is broken after this patch is applied until the other page
table modifiers are also altered.  This patch layout is to make review
easier.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:08 -08:00
Anton Blanchard
e51df2c170 powerpc: Make a bunch of things static
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-25 23:14:41 +10:00
Paul Gortmaker
c141611fb1 powerpc: Delete non-required instances of include <linux/init.h>
None of these files are actually using any __init type directives
and hence don't need to include <linux/init.h>.  Most are just a
left over from __devinit and __cpuinit removal, or simply due to
code getting copied from one driver to the next.

The one instance where we add an include for init.h covers off
a case where that file was implicitly getting it from another
header which itself didn't need it.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-15 13:46:44 +11:00
Aneesh Kumar K.V
8937ba48dc powerpc/mm: Only check for _PAGE_PRESENT in set_pte/pmd functions
We want to make sure we don't use these function when updating a pte
or pmd entry that have a valid hpte entry, because these functions
don't invalidate them. So limit the check to _PAGE_PRESENT bit.
Numafault core changes use these functions for updating _PAGE_NUMA bits.
That should be ok because when _PAGE_NUMA is set we can be sure that
hpte entries are not present.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-09 11:40:29 +11:00
LEROY Christophe
79df1b374b powerpc/8xx: Revert commit e0908085fc
The commit e0908085fc ("powerpc/8xx: Fix
regression introduced by cache coherency rewrite") is not needed
anymore.  The issue was because dcbst wrongly sets the store bit when
causing a DTLB error, but this is now fixed by commit
0a2ab51ffb ("powerpc/8xx: Fixup DAR from
buggy dcbX instructions.") which handles the buggy dcbx instructions on
data page faults on the 8xx.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[scottwood@freescale.com: fix commit message]
Signed-off-by: Scott Wood <scottwood@freescale.com>
2013-10-28 21:11:18 -05:00
Aneesh Kumar K.V
a00e7bea0d powerpc: disable assert_pte_locked for collapse_huge_page
With THP we set pmd to none, before we do pte_clear. Hence we can't
walk page table to get the pte lock ptr and verify whether it is locked.
THP do take pte lock before calling pte_clear. So we don't change the locking
rules here. It is that we can't use page table walking to check whether
pte locks are held with THP.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21 16:01:57 +10:00
Becky Bruce
41151e77a4 powerpc: Hugetlb for BookE
Enable hugepages on Freescale BookE processors.  This allows the kernel to
use huge TLB entries to map pages, which can greatly reduce the number of
TLB misses and the amount of TLB thrashing experienced by applications with
large memory footprints.  Care should be taken when using this on FSL
processors, as the number of large TLB entries supported by the core is low
(16-64) on current processors.

The supported set of hugepage sizes include 4m, 16m, 64m, 256m, and 1g.
Page sizes larger than the max zone size are called "gigantic" pages and
must be allocated on the command line (and cannot be deallocated).

This is currently only fully implemented for Freescale 32-bit BookE
processors, but there is some infrastructure in the code for
64-bit BooKE.

Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-09-20 09:19:40 +10:00
Peter Zijlstra
2672391169 mm, powerpc: move the RCU page-table freeing into generic code
In case other architectures require RCU freed page-tables to implement
gup_fast() and software filled hashes and similar things, provide the
means to do so by moving the logic into generic code.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Requested-by: David Miller <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Tony Luck <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25 08:39:16 -07:00
Peter Zijlstra
d6bf29b44d powerpc: mmu_gather rework
Fix up powerpc to the new mmu_gather stuff.

PPC has an extra batching queue to RCU free the actual pagetable
allocations, use the ARCH extentions for that for now.

For the ppc64_tlb_batch, which tracks the vaddrs to unhash from the
hardware hash-table, keep using per-cpu arrays but flush on context switch
and use a TLF bit to track the lazy_mmu state.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Miller <davem@davemloft.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Tony Luck <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25 08:39:13 -07:00
Peter Zijlstra
f2e785ed5f powerpc: Use call_rcu_sched() for pagetables
PowerPC relies on IRQ-disable to guard against RCU quiecent states,
use the appropriate RCU call version.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-11-30 10:42:20 +11:00
Paul E. McKenney
a591f6b56d powerpc: Remove all rcu head initializations
Remove all rcu head inits. We don't care about the RCU head state before
passing it to call_rcu() anyway. Only leave the "on_stack" variants so
debugobjects can keep track of objects on stack.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-07-09 11:28:34 +10:00
Tejun Heo
5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
David Gibson
a0668cdc15 powerpc/mm: Cleanup management of kmem_caches for pagetables
Currently we have a fair bit of rather fiddly code to manage the
various kmem_caches used to store page tables of various levels.  We
generally have two caches holding some combination of PGD, PUD and PMD
tables, plus several more for the special hugepage pagetables.

This patch cleans this all up by taking a different approach.  Rather
than the caches being designated as for PUDs or for hugeptes for 16M
pages, the caches are simply allocated to be a specific size.  Thus
sharing of caches between different types/levels of pagetables happens
naturally.  The pagetable size, where needed, is passed around encoded
in the same way as {PGD,PUD,PMD}_INDEX_SIZE; that is n where the
pagetable contains 2^n pointers.

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-10-30 17:20:57 +11:00
Rex Feany
e0908085fc powerpc/8xx: Fix regression introduced by cache coherency rewrite
After upgrading to the latest kernel on my mpc875 userspace started
running incredibly slow (hours to get to a shell, even!).
I tracked it down to commit 8d30c14cab,
that patch removed a work-around for the 8xx. Adding it
back makes my problem go away.

Signed-off-by: Rex Feany <rfeany@mrv.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-09-24 15:56:30 +10:00
Benjamin Herrenschmidt
ea3cc330ac powerpc/mm: Cleanup handling of execute permission
This is an attempt at cleaning up a bit the way we handle execute
permission on powerpc. _PAGE_HWEXEC is gone, _PAGE_EXEC is now only
defined by CPUs that can do something with it, and the myriad of
#ifdef's in the I$/D$ coherency code is reduced to 2 cases that
hopefully should cover everything.

The logic on BookE is a little bit different than what it was though
not by much. Since now, _PAGE_EXEC will be set by the generic code
for executable pages, we need to filter out if they are unclean and
recover it. However, I don't expect the code to be more bloated than
it already was in that area due to that change.

I could boast that this brings proper enforcing of per-page execute
permissions to all BookE and 40x but in fact, we've had that now for
some time as a side effect of my previous rework in that area (and
I didn't even know it :-) We would only enable execute permission if
the page was cache clean and we would only cache clean it if we took
and exec fault. Since we now enforce that the later only work if
VM_EXEC is part of the VMA flags, we de-fact already enforce per-page
execute permissions... Unless I missed something

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-27 13:12:51 +10:00
Kumar Gala
797a747a82 powerpc/mm: Fix assert_pte_locked to work properly on uniprocessor
Since the pte_lockptr is a spinlock it gets optimized away on
uniprocessor builds so using spin_is_locked is not correct.  We can use
assert_spin_locked instead and get the proper behavior between UP and
SMP builds.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:28:32 +10:00
Benjamin Herrenschmidt
a8f7758c1c powerpc/mm: Move around mmu_gathers definition on 64-bit
The definition for the global structure mmu_gathers, used by generic code,
is currently defined in multiple places not including anything used by
64-bit Book3E. This changes it by moving to one place common to all
processors.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:25:09 +10:00
Benjamin Herrenschmidt
c7cc58a1ad powerpc/mm: Rework & cleanup page table freeing code path
That patch used to just add a hook to page table flushing but
pulling that string brought out a whole bunch of issues, so it
now does that and more:

 - We now make the RCU batching of page freeing SMP only, as I
believe it was intended initially. We make a few more things compile
to nothing on !CONFIG_SMP

 - Some macros are turned into functions, though that forced me to
out of line a few stuffs due to unsolvable include depenencies,
however it's probably better that way anyway, it's not -that-
critical code path.

 - 32-bit didn't call pte_free_finish() on tlb_flush() which means
that it wouldn't push out the batch to RCU for delayed freeing when
a bunch of page tables have been freed, they would just stay in there
until the batch gets full.

64-bit BookE will use that hook to maintain the virtually linear
page tables or the indirect entries in the TLB when using the
HW loader.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:24:56 +10:00
Michael Ellerman
30c5af435b powerpc: Use pr_devel() in do_dcache_icache_coherency()
pr_debug() can now result in code being generated even when DEBUG
is not defined. That's not really desirable in some places.

With CONFIG_DYNAMIC_DEBUG=y:

size before:
   text    data     bss     dec     hex filename
   2036     368       8    2412     96c arch/powerpc/mm/pgtable.o

size after:
   text    data     bss     dec     hex filename
   1677     248       8    1933     78d arch/powerpc/mm/pgtable.o

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-07-08 13:50:24 +10:00
Mel Gorman
af3e4aca47 powerpc: Do not assert pte_locked for hugepage PTE entries
With CONFIG_DEBUG_VM, an assertion is made when changing the protection
flags of a PTE that the PTE is locked. Huge pages use a different pagetable
format and the assertion is bogus and will always trigger with a bug looking
something like

 Unable to handle kernel paging request for data at address 0xf1a00235800006f8
 Faulting instruction address: 0xc000000000034a80
 Oops: Kernel access of bad area, sig: 11 [#1]
 SMP NR_CPUS=32 NUMA Maple
 Modules linked in: dm_snapshot dm_mirror dm_region_hash
  dm_log dm_mod loop evdev ext3 jbd mbcache sg sd_mod ide_pci_generic
  pata_amd ata_generic ipr libata tg3 libphy scsi_mod windfarm_pid
  windfarm_smu_sat windfarm_max6690_sensor windfarm_lm75_sensor
  windfarm_cpufreq_clamp windfarm_core i2c_powermac
 NIP: c000000000034a80 LR: c000000000034b18 CTR: 0000000000000003
 REGS: c000000003037600 TRAP: 0300   Not tainted (2.6.30-rc3-autokern1)
 MSR: 9000000000009032 <EE,ME,IR,DR>  CR: 28002484  XER: 200fffff
 DAR: f1a00235800006f8, DSISR: 0000000040010000
 TASK = c0000002e54cc740[2960] 'map_high_trunca' THREAD: c000000003034000 CPU: 2
 GPR00: 4000000000000000 c000000003037880 c000000000895d30 c0000002e5a2e500
 GPR04: 00000000a0000000 c0000002edc40880 0000005700000393 0000000000000001
 GPR08: f000000011ac0000 01a00235800006e8 00000000000000f5 f1a00235800006e8
 GPR12: 0000000028000484 c0000000008dd780 0000000000001000 0000000000000000
 GPR16: fffffffffffff000 0000000000000000 00000000a0000000 c000000003037a20
 GPR20: c0000002e5f4ece8 0000000000001000 c0000002edc40880 0000000000000000
 GPR24: c0000002e5f4ece8 0000000000000000 00000000a0000000 c0000002e5f4ece8
 GPR28: 0000005700000393 c0000002e5a2e500 00000000a0000000 c000000003037880
 NIP [c000000000034a80] .assert_pte_locked+0xa4/0xd0
 LR [c000000000034b18] .ptep_set_access_flags+0x6c/0xb4
 Call Trace:
 [c000000003037880] [c000000003037990] 0xc000000003037990 (unreliable)
 [c000000003037910] [c000000000034b18] .ptep_set_access_flags+0x6c/0xb4
 [c0000000030379b0] [c00000000014bef8] .hugetlb_cow+0x124/0x674
 [c000000003037b00] [c00000000014c930] .hugetlb_fault+0x4e8/0x6f8
 [c000000003037c00] [c00000000013443c] .handle_mm_fault+0xac/0x828
 [c000000003037cf0] [c0000000000340a8] .do_page_fault+0x39c/0x584
 [c000000003037e30] [c0000000000057b0] handle_page_fault+0x20/0x5c
 Instruction dump:
 7d29582a 7d200074 7800d182 0b000000 3c004000 3960ffff 780007c6 796b00c4
 7d290214 7929a302 1d290068 7d6b4a14 <800b0010> 7c000074 7800d182 0b000000

This patch fixes the problem by not asseting the PTE is locked for VMAs
backed by huge pages.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-18 15:19:04 +10:00