Commit Graph

15837 Commits

Author SHA1 Message Date
Christophe Leroy
6b8cb66a6a powerpc: Fix usage of _PAGE_RO in hugepage
On some CPUs like the 8xx, _PAGE_RW hence _PAGE_WRITE is defined
as 0 and _PAGE_RO has to be set when a page is not writable

_PAGE_RO is defined by default in pte-common.h, however BOOK3S/64
doesn't include that file so _PAGE_RO has to be defined explicitly
in book3s/64/pgtable.h

Fixes: a7b9f671f2 ("powerpc32: adds handling of _PAGE_RO")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-23 07:54:22 +10:00
Russell Currey
af2e3a009e powerpc/eeh: Skip finding bus until after failure reporting
In eeh_handle_special_event(), eeh_pe_bus_get() is called before calling
eeh_report_failure() on every device under a PE.  If a PE was missing a
bus for some reason, the error would occur before reporting failure, even
though eeh_report_failure() doesn't require a bus.

Fix this by moving the bus retrieval and error check after the
eeh_report_failure() calls.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-23 07:54:21 +10:00
Russell Currey
e98ddb7716 powerpc/powernv/eeh: Skip finding bus for VF resets
When the PE used in pnv_eeh_reset() is that of a VF,
pnv_eeh_reset_vf_pe() is used.  Unlike the other reset functions called
in pnv_eeh_reset(), the VF reset doesn't require a bus, and if a bus was
missing the function would error out before resetting the VF PE.

To avoid this, reorder the VF reset function to occur before finding and
checking the bus.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-23 07:54:21 +10:00
Russell Currey
04fec21c06 powerpc/eeh: Null check uses of eeh_pe_bus_get
eeh_pe_bus_get() can return NULL if a PCI bus isn't found for a given PE.
Some callers don't check this, and can cause a null pointer dereference
under certain circumstances.

Fix this by checking NULL everywhere eeh_pe_bus_get() is called.

Fixes: 8a6b1bc70d ("powerpc/eeh: EEH core to handle special event")
Cc: stable@vger.kernel.org # v3.11+
Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-23 07:54:20 +10:00
Nicholas Piggin
a24553dd02 powerpc/pseries: Remove unnecessary syscall trampoline
When we originally added the ability to split the exception vectors from
the kernel (commit 1f6a93e4c3 ("powerpc: Make it possible to move the
interrupt handlers away from the kernel" 2008-09-15)), the LOAD_HANDLER() macro
used an addi instruction to compute the offset of the common handler
from the kernel base address.

Using addi meant the handler had to be within 32K of the kernel base
address, due to the addi instruction taking a signed immediate value.
That necessitated creating a trampoline for the system call handler,
because system_call_common (in entry64.S) is not linked within 32K of
the kernel base address.

Later in commit 61e2390ede ("powerpc: Make load_hander handle upto 64k
offset" 2012-11-15) we changed LOAD_HANDLER to take a 64K offset, by
changing it to use ori.

Although system_call_common is not in head_64.S or exceptions-64s.S, it
is included in head-y, which causes it to be linked early in the kernel
text, so in practice it ends up below 64K. Additionally if it can't be
placed below 64K the linker will fail to build with a "relocation
truncated to fit" error.

So remove the trampoline.

Newer toolchains are able to work out that the ori in LOAD_HANDLER only
takes a 16 bit offset, and so they generate a 16 bit relocation. Older
toolchains (binutils 2.22 at least) are not so smart, so we have to add
the @l annotation to tell the assembler to generate a 16 bit relocation.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-23 07:54:20 +10:00
Nicholas Piggin
40e1b1cfb5 powerpc/pseries: Fix HV facility unavailable to use correct handler
The 0xf80 hv_facility_unavailable trampoline branches to the 0xf60
handler. This works because they both do the same thing, but it should
be fixed.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-23 07:54:19 +10:00
Russell Currey
98b665da57 powerpc/powernv/pci: Add PHB register dump debugfs handle
On EEH events the kernel will print a dump of relevant registers.
If EEH is unavailable (i.e. CONFIG_EEH is disabled, a new platform
doesn't have EEH support, etc) this information isn't readily available.

Add a new debugfs handler to trigger a PHB register dump, so that this
information can be made available on demand.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-23 07:54:19 +10:00
Benjamin Herrenschmidt
3eabf88579 powerpc/64/kexec: Remove BookE special default_machine_kexec_prepare()
The only difference is now the TCE table check which doesn't need
to be ifdef'ed out, it will basically do nothing on BookE (it is
only useful for ancient IBM machines).

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-23 07:54:18 +10:00
Benjamin Herrenschmidt
b970b41ea6 powerpc/64/kexec: Copy image with MMU off when possible
Currently we turn the MMU off after copying the image, and we make
sure there is no overlap between the hash table and the target pages
in that case.

That doesn't work for Radix however. In that case, the page tables
are scattered and we can't really enforce that the target of the
image isn't overlapping one of them.

So instead, let's turn the MMU off before copying the image in radix
mode. Thankfully, in radix mode, even under a hypervisor, we know we
don't have the same kind of RMA limitations that hash mode has.

While at it, also turn the MMU off early when using hash in non-LPAR
mode, that way we can get rid of the collision check completely.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-23 07:54:18 +10:00
Aneesh Kumar K.V
be34d30059 powerpc/mm: Add radix flush all with IS=3
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-23 07:54:18 +10:00
Benjamin Herrenschmidt
fe036a0605 powerpc/64/kexec: Fix MMU cleanup on radix
Just using the hash ops won't work anymore since radix will have
NULL in there. Instead create an mmu_cleanup_all() function which
will do the right thing based on the MMU mode.

For Radix, for now I clear UPRT and the PTCR, effectively switching
back to Radix with no partition table setup.

Currently set it to NULL on BookE thought it might be a good idea
to wipe the TLB there (Scott ?)

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-23 07:54:17 +10:00
Benjamin Herrenschmidt
fc48bad531 powerpc/64/kexec: NULL check "clear_all" in kexec_sequence
With Radix, it can be NULL even on !BOOKE these days so replace
the ifdef with a NULL check which is cleaner anyway.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-23 07:54:05 +10:00
Christoph Hellwig
014b44e7a4 libata: remove unused definitions from <asm/libata-portmap.h>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
2016-09-22 11:50:19 -04:00
Stephen Rothwell
f29ca38b6d ppc: there is no clear_pages to export
Fixes: 9445aa1a30 ("ppc: move exports to definitions")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Michal Marek <mmarek@suse.com>
2016-09-22 14:51:45 +02:00
Nicholas Piggin
12eb901e01 powerpc/64: whitelist unresolved modversions CRCs
These are a symptom of CRC generation failure in generic
build code, and not powerpc specific.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Fixes: 9445aa1a30 ("ppc: move exports to definitions")
Signed-off-by: Michal Marek <mmarek@suse.com>
2016-09-22 14:46:31 +02:00
Russell Currey
b79331a5eb powerpc/powernv/pci: Fix m64 checks for SR-IOV and window alignment
Commit 5958d19a14 checks for prefetchable m64 BARs by comparing the
addresses instead of using resource flags.  This broke SR-IOV as the m64
check in pnv_pci_ioda_fixup_iov_resources() fails.

The condition in pnv_pci_window_alignment() also changed to checking
only IORESOURCE_MEM_64 instead of both IORESOURCE_MEM_64 and
IORESOURCE_PREFETCH.

Revert these cases to the previous behaviour, adding a new helper function
to do so.  This is named pnv_pci_is_m64_flags() to make it clear this
function is only looking at resource flags and should not be relied on for
non-SRIOV resources.

Fixes: 5958d19a14 ("Fix incorrect PE reservation attempt on some 64-bit BARs")
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-21 14:04:13 +10:00
Michael Ellerman
ef24ba7091 powerpc: Remove all usages of NO_IRQ
NO_IRQ has been == 0 on powerpc for just over ten years (since commit
0ebfff1491 ("[POWERPC] Add new interrupt mapping core and change
platforms to use it")). It's also 0 on most other arches.

Although it's fairly harmless, every now and then it causes confusion
when a driver is built on powerpc and another arch which doesn't define
NO_IRQ. There's at least 6 definitions of NO_IRQ in drivers/, at least
some of which are to work around that problem.

So we'd like to remove it. This is fairly trivial in the arch code, we
just convert:

    if (irq == NO_IRQ)	to	if (!irq)
    if (irq != NO_IRQ)	to	if (irq)
    irq = NO_IRQ;	to	irq = 0;
    return NO_IRQ;	to	return 0;

And a few other odd cases as well.

At least for now we keep the #define NO_IRQ, because there is driver
code that uses NO_IRQ and the fixes to remove those will go via other
trees.

Note we also change some occurrences in PPC sound drivers, drivers/ps3,
and drivers/macintosh.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-20 20:57:12 +10:00
Ingo Molnar
b2c16e1efd Merge branch 'linus' into x86/asm, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-20 08:29:21 +02:00
Andrew Donnellan
90ce35145c powerpc/pseries: fix memory leak in queue_hotplug_event() error path
If we fail to allocate work, we don't end up using hp_errlog_copy. Free it
in the error path.

Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-20 16:17:54 +10:00
Pan Xinhui
11b7e154b1 powerpc/nvram: Fix an incorrect partition merge
When we merge two contiguous partitions whose signatures are marked
NVRAM_SIG_FREE, We need update prev's length and checksum, then write it
to nvram, not cur's. So lets fix this mistake now.

Also use memset instead of strncpy to set the partition's name. It's
more readable if we want to fill up with duplicate chars .

Fixes: fa2b4e54d4 ("powerpc/nvram: Improve partition removal")
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-20 16:15:42 +10:00
Pan Xinhui
0d0fecc5b5 powerpc/nvram: Fix a memory leak in err path
If kmemdup fails, We need kfree *buff* first then return -ENOMEM.
Otherwise there is a memory leak.

Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-20 16:15:33 +10:00
Nicholas Piggin
49d09bf2a6 powerpc/64s: Optimise MSR handling in exception handling
mtmsrd with L=1 only affects MSR_EE and MSR_RI bits, and we always
know what state those bits are, so the kernel MSR does not need to be
loaded when modifying them.

mtmsrd is often in the critical execution path, so avoiding dependency
on even L1 load is noticable. On a POWER8 this saves about 3 cycles
from the syscall path, and possibly a few from other exception returns
(not measured).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-20 15:56:45 +10:00
Nicholas Piggin
18e3f56b1c powerpc/64: Optimise syscall entry for virtual, relocatable case
The mflr r10 instruction was left over from when the code used LR to
branch to system_call_entry from the exception handler. That was
changed by commit 6a404806df ("powerpc: Avoid link stack corruption in
MMU on syscall entry path") to use the count register. The value is
never used now, so mflr can be removed, and r10 can be used for storage
rather than spilling to the SPR scratch register.

The scratch register spill causes a long pipeline stall due to the SPR
read after write. This change brings getppid syscall cost from 406 to
376 cycles on POWER8. getppid for non-relocatable case is 371 cycles.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-20 14:46:05 +10:00
Aneesh Kumar K.V
d5a1e42cb4 powerpc/mm: Update FORCE_MAX_ZONEORDER range to allow hugetlb w/4K
For hugetlb to work with 4K page size, we need MAX_ORDER to be 13 or
more. When switching from a 64K page size to 4K linux page size using
make oldconfig, we end up with a CONFIG_FORCE_MAX_ZONEORDER value of 9.
This results in a 16M hugepage beiing considered as a gigantic huge page
which in turn results in failure to setup hugepages if gigantic hugepage
support is not enabled.

This also results in kernel crash with 4K radix configuration. We
hit the below BUG_ON on radix:

  kernel BUG at mm/huge_memory.c:364!
  Oops: Exception in kernel mode, sig: 5 [#1]
  SMP NR_CPUS=2048 NUMA PowerNV
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.8.0-rc1-00006-gbae9cc6 #1
  task: c0000000f1af8000 task.stack: c0000000f1aec000
  NIP: c000000000c5fa0c LR: c000000000c5f9d8 CTR: c000000000c5f9a4
  REGS: c0000000f1aef920 TRAP: 0700   Not tainted (4.8.0-rc1-00006-gbae9cc6)
  MSR: 9000000102029033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE,TM[E]>  CR: 24000844  XER: 00000000
  CFAR: c000000000c5f9e0 SOFTE: 1
  ....
  NIP [c000000000c5fa0c] hugepage_init+0x68/0x238
  LR [c000000000c5f9d8] hugepage_init+0x34/0x238

Fixes: a7ee539584 ("powerpc/Kconfig: Update config option based on page size")
Cc: stable@vger.kernel.org # v4.7+
Reported-by: Santhosh <santhog4@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-20 14:40:41 +10:00
Nicholas Piggin
e0e0d6b739 powerpc/64: Replay hypervisor maintenance interrupt first
The HMI (Hypervisor Maintenance Interrupt) is defined by the
architecture to be higher priority than other maskable interrupts, so
replay it first, as a best-effort to replay according to hardware
priorities.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-20 14:35:34 +10:00
Michael Ellerman
7de3b27bac powerpc: Ensure .mem(init|exit).text are within _stext/_etext
In our linker script we open code the list of text sections, because we
need to include the __ftr_alt sections, which are arch-specific.

This means we can't use TEXT_TEXT as defined in vmlinux.lds.h, and so we
don't have the MEM_KEEP() logic for memory hotplug sections.

If we build the kernel with the gold linker, and with CONFIG_MEMORY_HOTPLUG=y,
we see that functions marked __meminit can end up outside of the
_stext/_etext range, and also outside of _sinittext/_einittext, eg:

    c000000000000000 T _stext
    c0000000009e0000 A _etext
    c0000000009e3f18 T hash__vmemmap_create_mapping
    c000000000ca0000 T _sinittext
    c000000000d00844 T _einittext

This causes them to not be recognised as text by is_kernel_text(), and
prevents them being patched by jump_label (and presumably ftrace/kprobes
etc.).

Fix it by adding MEM_KEEP() directives, mirroring what TEXT_TEXT does.

This isn't a problem when CONFIG_MEMORY_HOTPLUG=n, because we use the
standard INIT_TEXT_SECTION() and EXIT_TEXT macros from vmlinux.lds.h.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-19 10:53:56 +10:00
Michael Ellerman
bea2dccccf powerpc: Don't change the section in _GLOBAL()
Currently the _GLOBAL() macro unilaterally sets the assembler section to
".text" at the start of the macro. This is rude as the caller may be
using a different section.

So let the caller decide which section to emit the code into. On big
endian we do need to switch to the ".opd" section to emit the OPD, but
do that with pushsection/popsection, thereby leaving the original
section intact.

I verified that the order of all entries in System.map is unchanged
after this patch. The actual addresses shift around slightly so you
can't just diff the System.map.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-19 10:53:55 +10:00
Nicholas Piggin
6f698df10c powerpc/kernel: Use kprobe blacklist for asm functions
Rather than forcing the whole function into the ".kprobes.text" section,
just add the symbol's address to the kprobe blacklist.

This also lets us drop the three versions of the_KPROBE macro, in
exchange for just one version of _ASM_NOKPROBE_SYMBOL - which is a good
cleanup.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-19 10:53:55 +10:00
Nicholas Piggin
03465f899b powerpc: Use kprobe blacklist for exception handlers
Currently we mark the C implementations of some exception handlers as
__kprobes. This has the effect of putting them in the ".kprobes.text"
section, which separates them from the rest of the text.

Instead we can use the blacklist macros to add the symbols to a
blacklist which kprobes will check. This allows the linker to move
exception handler functions close to callers and avoids trampolines in
larger kernels.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Reword change log a bit]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-19 10:53:54 +10:00
Linus Torvalds
baf009f927 powerpc fixes for 4.8 #6
Fixes for code merged this cycle:
  - Fix restore of SPRs upon wake up from hypervisor state loss from Gautham R. Shenoy
  - Fix the state of root PE from Gavin Shan
  - Detach from PE on releasing PCI device from Gavin Shan
  - Fix size of NUM_CPU_FTR_KEYS on 32-bit
  - Fix missed TCE invalidations that should fallback to OPAL
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJX3QI8AAoJEFHr6jzI4aWAmDEP/3k8Tuk30d+QhfVm++N18cZ7
 EFdpO25m+kJH83PVc3Lri6sj2z6/Bpm6Ib2V3gynExB9SxUCcAJXHqwioLkL9/PW
 aUiotxsPvlpfBFAjNbk2myB8JSbc/+8yJaojKYWqwX796bjUdRkI7rmXtfrjmX6U
 uhQQ9nvKNxThwY5eedMH9PCJ89BzgLefrExHUD171iR43qfaouLkUn/Ba+UIhC5m
 pepwePCTXHEPm8e328hYVSNEmqWRgL+UN2EUZKqXjITNtDSHCdwGTF8iifwTku54
 g/rrta8CgFD4x5chTROnOhJMkTD9MRoneVR8nE4QD6yMHj9k1huL8J8wlfnG/zbB
 Ym6MNKBYbGPMAoYfbxAcvWr/7XL+szNoR+p+VWl+rgf2Z08dQaI4zNiB3aimCs1g
 7yWW649Gd4gXyNygfeMCDWGZbVhQdQIHcNrcAKFuIRvkn3iPZ0cPa5GYxZ7o/32B
 oKAtZMsufGN0eC21hbLaRkyeYPdqEjyk+T734t05cfBCvScWkHeBapnX9gYOoCqZ
 ok7b8wXVqVFXZ+FSZ8Ec7YquUHBhHECpqofMgB6d9DqbWPlubwiA3g4YnjrpFDC7
 u4a4bVKVZy8fk3w7+2ibkIdud35zL0LqkB2ZNhOn3IYM/yBD0zgUs+bIqruTKZ2+
 AYapeGjmf+SBD3ytGtab
 =MG5t
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.8-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Fixes for code merged this cycle:

   - Fix restore of SPRs upon wake up from hypervisor state loss from
     Gautham R  Shenoy
   - Fix the state of root PE from Gavin Shan
   - Detach from PE on releasing PCI device from Gavin Shan
   - Fix size of NUM_CPU_FTR_KEYS on 32-bit
   - Fix missed TCE invalidations that should fallback to OPAL"

* tag 'powerpc-4.8-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/powernv/pci: Fix missed TCE invalidations that should fallback to OPAL
  powerpc/powernv: Detach from PE on releasing PCI device
  powerpc/powernv: Fix the state of root PE
  powerpc/kernel: Fix size of NUM_CPU_FTR_KEYS on 32-bit
  powerpc/powernv: Fix restore of SPRs upon wake up from hypervisor state loss
2016-09-17 12:52:01 -07:00
Luiz Capitulino
235539b48a kvm: add stubs for arch specific debugfs support
Two stubs are added:

 o kvm_arch_has_vcpu_debugfs(): must return true if the arch
   supports creating debugfs entries in the vcpu debugfs dir
   (which will be implemented by the next commit)

 o kvm_arch_create_vcpu_debugfs(): code that creates debugfs
   entries in the vcpu debugfs dir

For x86, this commit introduces a new file to avoid growing
arch/x86/kvm/x86.c even more.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-09-16 16:57:47 +02:00
Michael Ellerman
ed7d9a1d7d powerpc/powernv/pci: Fix missed TCE invalidations that should fallback to OPAL
In commit f0228c4130 ("powerpc/powernv/pci: Fallback to OPAL for TCE
invalidations"), we added logic to fallback to OPAL for doing TCE
invalidations if we can't do it in Linux.

Ben sent a v2 of the patch, containing these additional call sites, but
I had already applied v1 and didn't notice. So fix them now.

Fixes: f0228c4130 ("powerpc/powernv/pci: Fallback to OPAL for TCE invalidations")
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-15 17:05:11 +10:00
Ingo Molnar
d4b80afbba Merge branch 'linus' into x86/asm, to pick up recent fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-15 08:24:53 +02:00
Gavin Shan
29bf282dec powerpc/powernv: Detach from PE on releasing PCI device
The PCI hotplug can be part of EEH error recovery. The @pdn and
the device's PE number aren't removed and added afterwords. The
PE number in @pdn should be set to an invalid one. Otherwise, the
PE's device count is decreased on removing devices while failing
to be increased on adding devices. It leads to unbalanced PE's
device count and make normal PCI hotplug path broken.

Fixes: c5f7700bbd ("powerpc/powernv: Dynamically release PE")
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-15 13:53:19 +10:00
Linus Torvalds
77e5bdf9f7 Merge branch 'uaccess-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull uaccess fixes from Al Viro:
 "Fixes for broken uaccess primitives - mostly lack of proper zeroing
  in copy_from_user()/get_user()/__get_user(), but for several
  architectures there's more (broken clear_user() on frv and
  strncpy_from_user() on hexagon)"

* 'uaccess-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (28 commits)
  avr32: fix copy_from_user()
  microblaze: fix __get_user()
  microblaze: fix copy_from_user()
  m32r: fix __get_user()
  blackfin: fix copy_from_user()
  sparc32: fix copy_from_user()
  sh: fix copy_from_user()
  sh64: failing __get_user() should zero
  score: fix copy_from_user() and friends
  score: fix __get_user/get_user
  s390: get_user() should zero on failure
  ppc32: fix copy_from_user()
  parisc: fix copy_from_user()
  openrisc: fix copy_from_user()
  nios2: fix __get_user()
  nios2: copy_from_user() should zero the tail of destination
  mn10300: copy_from_user() should zero on access_ok() failure...
  mn10300: failing __get_user() and get_user() should zero
  mips: copy_from_user() must zero the destination on access_ok() failure
  ARC: uaccess: get_user to zero out dest in cause of fault
  ...
2016-09-14 09:35:05 -07:00
Gavin Shan
6eaed1665f powerpc/powernv: Fix the state of root PE
The PE for root bus (root PE) can be removed because of PCI hot
remove in EEH recovery path for fenced PHB error. We need update
@phb->root_pe_populated accordingly so that the root PE can be
populated again in forthcoming PCI hot add path. Also, the PE
shouldn't be destroyed as it's global and reserved resource.

Fixes: c5f7700bbd ("powerpc/powernv: Dynamically release PE")
Reported-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-14 11:40:09 +10:00
Al Viro
224264657b ppc32: fix copy_from_user()
should clear on access_ok() failures.  Also remove the useless
range truncation logics.

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-13 17:50:02 -04:00
Simon Guo
e1c0d66fcb powerpc: Set used_(vsr|vr|spe) in sigreturn path when MSR bits are active
Normally, when MSR[VSX/VR/SPE] bits == 1, the used_vsr/used_vr/used_spe
bit have already been set. However when loading a signal frame from user
space we need to explicitly set used_vsr/used_vr/used_spe to make them
consistent with the MSR bits from the signal frame.

For example, CRIU application, who utilizes sigreturn to restore
checkpointed process, will lead to the case where MSR[VSX] bit is active
in signal frame, but used_vsr bit is not set in the kernel. (the same
applies to VR/SPE).

This patch fixes this by always setting used_* bit when MSR related bits
are active in signal frame and we are doing sigreturn.

Based on a proposal by Benh.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
[mpe: Massage change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-13 17:37:12 +10:00
Simon Guo
261831160d powerpc/ptrace: Fix cppcheck issue in gpr32_set_common/gpr32_get_common()
The ckpt_regs usage in gpr32_set_common/gpr32_get_common() will lead to
following cppcheck error at ifndef CONFIG_PPC_TRANSACTIONAL_MEM case:

[arch/powerpc/kernel/ptrace.c:2062]:
(error) Uninitialized variable: ckpt_regs
[arch/powerpc/kernel/ptrace.c:2130]:
(error) Uninitialized variable: ckpt_regs

The problem is due to gpr32_set_common() used ckpt_regs variable which
only makes sense at #ifdef CONFIG_PPC_TRANSACTIONAL_MEM.

This patch fix this issue by passing in "regs" parameter instead.

Reported-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-13 17:37:12 +10:00
Colin Ian King
3daf3c2069 powerpc/32: Add missing \n and switch to pr_warn()
The message is missing a \n, add it. Switch to pr_warn(), it's shorter
and less ugly.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-13 17:37:11 +10:00
Aneesh Kumar K.V
ad410674f5 powerpc/mm: Update the HID bit when switching from radix to hash
Power9 DD1 requires to update the hid0 register when switching from
hash to radix.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-13 17:37:10 +10:00
Aneesh Kumar K.V
c6d1a767b9 powerpc/mm/radix: Use different pte update sequence for different POWER9 revs
POWER9 DD1 requires pte to be marked invalid (V=0) before updating
it with the new value. This makes this distinction for the different
revisions.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-13 17:37:10 +10:00
Aneesh Kumar K.V
694c495192 powerpc/mm/radix: Use different RTS encoding for different POWER9 revs
POWER9 DD1 uses RTS - 28 for the RTS value but other revisions use
RTS - 31.  This makes this distinction for the different revisions

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-13 17:37:09 +10:00
Aneesh Kumar K.V
7dccfbc325 powerpc/book3s: Add a cpu table entry for different POWER9 revs
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-13 17:37:09 +10:00
Darren Stevens
687e16bc2f powerpc/pasemi: Fix device_type of Nemo SB600 node.
The of_node for the SB600 (io-bridge) has its device_type set to
'io-bridge' Set it to 'isa' so that it can be found by
isa_bridge_find_early() instead of using patches in the kernel.

Signed-off-by: Darren Stevens <darren@stevens-zone.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-13 17:37:08 +10:00
Darren Stevens
5024678765 powerpc/pasemi: Fix Nemo SB600 i8259 interrupts.
The device tree on the Nemo passes all of the i8259 interrupts with
numbers between 212 and 222, and points their interrupt-parent property
to the pasemi-opic, requiring custom patches to the kernel. Fix the
values so that they can be controlled by the generic ppc i8259 code.

Signed-off-by: Darren Stevens <darren@stevens-zone.net>
[mpe: Rework deeply nested if and boundary checks]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-13 17:37:08 +10:00
Darren Stevens
88c13e2f4f powerpc/pasemi: Add Nemo motherboard config option.
Add config option for the Nemo motherboard used in the Amigaone X1000.
This is a custom PASemi board with an AMD SB600 southbridge, and needs
some patches to it device tree. This option will be used to build these
into the kernel

Signed-off-by: Darren Stevens <darren@stevens-zone.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-13 17:37:07 +10:00
Colin Ian King
6f95d4b2f6 powerpc/ps3: fix spelling mistake in function name
Trivial fix to spelling mistake in dev_warn message and remove
extraneous trailing whitespace at end of the message.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-13 17:37:06 +10:00
Michael Ellerman
57073e2781 powerpc/Makefile: Construct the UTS_MACHINE value more concisely
Use the standard Kbuild trick of foo-y to make the construction of
UTC_MACHINE less verbose.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-13 17:37:06 +10:00
Michael Ellerman
68201fbbb0 powerpc/Makefile: Drop CONFIG_WORD_SIZE for BITS
Commit 2578bfae84 ("[POWERPC] Create and use CONFIG_WORD_SIZE") added
CONFIG_WORD_SIZE, and suggests that other arches were going to do
likewise.

But that never happened, powerpc is the only architecture which uses it.

So switch to using a simple make variable, BITS, like x86, sh, sparc and
tile. It is also easier to spell and simpler, avoiding any confusion
about whether it's defined due to ordering of make vs kconfig.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-13 17:37:06 +10:00
Michael Ellerman
6abe248e16 powerpc/boot: Use $(Q) to quiet build rules not @
Some of the rules in the boot Makefile use @ to hide the command, this
means "make V=1" doesn't show them, which is confusing.

So use the Kbuild standard $(Q) which means KBUILD_VERBOSE=1 or V=1 will
work as expected.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-13 17:37:05 +10:00
Michael Ellerman
2ca07d7c4f powerpc/vdso64: Drop vdso64as
We can just use the standard .S -> .o rule, cmd_as_o_S.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-13 17:37:05 +10:00
Michael Ellerman
d312603a44 powerpc/Makefile: CROSS32AS is unused, remove it
In fact it makes no sense at all to have this defined on little endian
builds. Since we disabled the 32-bit VDSO on little endian, we don't
build any 32-bit code when building a little endian kernel.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-13 17:37:04 +10:00
Michael Ellerman
d8d42b0511 powerpc/64: Do load of PACAKBASE in LOAD_HANDLER
The LOAD_HANDLER macro requires that you have previously loaded "reg"
with PACAKBASE. Although that gives callers flexibility to get PACAKBASE
in some interesting way, none of the callers actually do that. So fold
the load of PACAKBASE into the macro, making it simpler for callers to
use correctly.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nick Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-13 17:37:04 +10:00
Michael Ellerman
27510235dd powerpc/64: Correct comment on LOAD_HANDLER()
The comment for LOAD_HANDLER() was wrong. The part about kdump has not
been true since 1f6a93e4c3 ("powerpc: Make it possible to move the
interrupt handlers away from the kernel").

Describe how it currently works, and combine the two separate comments
into one.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nick Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-13 17:37:03 +10:00
Paul Mackerras
f0f558b131 powerpc/mm: Preserve CFAR value on SLB miss caused by access to bogus address
Currently, if userspace or the kernel accesses a completely bogus address,
for example with any of bits 46-59 set, we first take an SLB miss interrupt,
install a corresponding SLB entry with VSID 0, retry the instruction, then
take a DSI/ISI interrupt because there is no HPT entry mapping the address.
However, by the time of the second interrupt, the Come-From Address Register
(CFAR) has been overwritten by the rfid instruction at the end of the SLB
miss interrupt handler.  Since bogus accesses can often be caused by a
function return after the stack has been overwritten, the CFAR value would
be very useful as it could indicate which function it was whose return had
led to the bogus address.

This patch adds code to create a full exception frame in the SLB miss handler
in the case of a bogus address, rather than inserting an SLB entry with a
zero VSID field.  Then we call a new slb_miss_bad_addr() function in C code,
which delivers a signal for a user access or creates an oops for a kernel
access.  In the latter case the oops message will show the CFAR value at the
time of the access.

In the case of the radix MMU, a segment miss interrupt indicates an access
outside the ranges mapped by the page tables.  Previously this was handled
by the code for an unrecoverable SLB miss (one with MSR[RI] = 0), which is
not really correct.  With this patch, we now handle these interrupts with
slb_miss_bad_addr(), which is much more consistent.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-13 17:37:03 +10:00
Michael Ellerman
b42d9023a3 powerpc/xmon: Don't use ld on 32-bit
In commit 31cdd0c39c ("powerpc/xmon: Fix SPR read/write commands and
add command to dump SPRs") I added two uses of the "ld" instruction in
spr_access.S. "ld" is a 64-bit instruction, so shouldn't be used on
32-bit CPUs.

Replace it with PPC_LL which is a macro that gives us either "ld" or
"lwz" depending on whether we're 64 or 32-bit.

Fixes: 31cdd0c39c ("powerpc/xmon: Fix SPR read/write commands and add command to dump SPRs")
Cc: stable@vger.kernel.org # v4.7+
Reported-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-13 17:37:02 +10:00
Daniel Axtens
0545d5436a powerpc/sparse: Add more assembler prototypes
Another set of things that are only called from assembler and so need
prototypes to keep sparse happy.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-13 17:36:58 +10:00
Daniel Axtens
d8bced27be powerpc/fadump: Set core e_flags using kernel's ELF ABI version
Firmware Assisted Dump is a facility to dump kernel core with assistance
from firmware. As part of this process the kernel ELF ABI version is
stored in the core file.

Currently fadump.h defines this to 0 if it is not already defined. This
clashes with a define in elf.h which sets it based on the current task -
not based on the kernel's ELF ABI version.

Use the compiler-provided #define _CALL_ELF which tells us the ELF ABI
version of the kernel to set e_flags, this matches what binutils does.

Remove the definition in fadump.h, which becomes unused.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-13 17:36:01 +10:00
Daniel Axtens
7c98bd7208 powerpc/sparse: Make a bunch of things static
Squash a bunch of sparse warnings by making things static.

Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-13 17:35:47 +10:00
Markus Elfring
aad9e5ba24 KVM: PPC: e500: Rename jump labels in kvmppc_e500_tlb_init()
Adjust jump labels according to the current Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-13 14:32:47 +10:00
Michael Ellerman
ffed15d3ce powerpc/kernel: Fix size of NUM_CPU_FTR_KEYS on 32-bit
The number of CPU feature keys is meant to map 1:1 to the number of CPU
feature flags defined in cputable.h, and the latter must fit in an
unsigned long.

In commit 4db7327194 ("powerpc: Add option to use jump label for
cpu_has_feature()"), I incorrectly defined NUM_CPU_FTR_KEYS to 64.

There should be no real adverse consequences of this bug, other than us
allocating too many keys.

Fix it by using BITS_PER_LONG.

Fixes: 4db7327194 ("powerpc: Add option to use jump label for cpu_has_feature()")
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-12 12:48:28 +10:00
Gautham R. Shenoy
bd00a240dc powerpc/powernv: Fix restore of SPRs upon wake up from hypervisor state loss
pnv_wakeup_tb_loss() currently expects cr4 to be "eq" if the CPU is
waking up from a complete hypervisor state loss. Hence, it currently
restores the SPR contents only if cr4 is "eq".

However, after commit bcef83a00d ("powerpc/powernv: Add platform
support for stop instruction"), on ISA v3.0 CPUs, the function
pnv_restore_hyp_resource() sets cr4 to contain the result of the
comparison between the state the CPU has woken up from and the first
deep stop state before calling pnv_wakeup_tb_loss().

Thus if the CPU woke up from a state that is deeper than the first
deep stop state, cr4 will have "gt" set and hence, pnv_wakeup_tb_loss()
will fail to restore the SPRs on waking up from such a state.

Fix the code in pnv_wakeup_tb_loss() to restore the SPR states when cr4
is "eq" or "gt".

Fixes: bcef83a00d ("powerpc/powernv: Add platform support for stop instruction")
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: Shreyas B. Prabhu <shreyasbp@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-12 12:45:50 +10:00
Markus Elfring
90235dc19e KVM: PPC: e500: Use kmalloc_array() in kvmppc_e500_tlb_init()
* A multiplication for the size determination of a memory allocation
  indicated that an array data structure should be processed.
  Thus use the corresponding function "kmalloc_array".

* Replace the specification of a data structure by a pointer dereference
  to make the corresponding size determination a bit safer according to
  the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-12 10:13:01 +10:00
Markus Elfring
b0ac477bc4 KVM: PPC: e500: Replace kzalloc() calls by kcalloc() in two functions
* A multiplication for the size determination of a memory allocation
  indicated that an array data structure should be processed.
  Thus use the corresponding function "kcalloc".

  Suggested-by: Paolo Bonzini <pbonzini@redhat.com>

  This issue was detected also by using the Coccinelle software.

* Replace the specification of data structures by pointer dereferences
  to make the corresponding size determination a bit safer according to
  the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-12 10:12:56 +10:00
Markus Elfring
cfb60813fb KVM: PPC: e500: Delete an unnecessary initialisation in kvm_vcpu_ioctl_config_tlb()
The local variable "g2h_bitmap" will be set to an appropriate value
a bit later. Thus omit the explicit initialisation at the beginning.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-12 10:12:51 +10:00
Markus Elfring
46d4e74792 KVM: PPC: e500: Less function calls in kvm_vcpu_ioctl_config_tlb() after error detection
The kfree() function was called in two cases by the
kvm_vcpu_ioctl_config_tlb() function during error handling
even if the passed data structure element contained a null pointer.

* Split a condition check for memory allocation failures.

* Adjust jump targets according to the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-12 10:12:46 +10:00
Markus Elfring
f3c0ce86a8 KVM: PPC: e500: Use kmalloc_array() in kvm_vcpu_ioctl_config_tlb()
* A multiplication for the size determination of a memory allocation
  indicated that an array data structure should be processed.
  Thus use the corresponding function "kmalloc_array".

  This issue was detected by using the Coccinelle software.

* Replace the specification of a data type by a pointer dereference
  to make the corresponding size determination a bit safer according to
  the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-12 10:12:40 +10:00
Suresh Warrier
65e7026a6c KVM: PPC: Book3S HV: Counters for passthrough IRQ stats
Add VCPU stat counters to track affinity for passthrough
interrupts.

pthru_all: Counts all passthrough interrupts whose IRQ mappings are
           in the kvmppc_passthru_irq_map structure.
pthru_host: Counts all cached passthrough interrupts that were injected
	    from the host through kvm_set_irq (i.e. not handled in
	    real mode).
pthru_bad_aff: Counts how many cached passthrough interrupts have
               bad affinity (receiving CPU is not running VCPU that is
	       the target of the virtual interrupt in the guest).

Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-12 10:12:34 +10:00
Paul Mackerras
5d375199ea KVM: PPC: Book3S HV: Set server for passed-through interrupts
When a guest has a PCI pass-through device with an interrupt, it
will direct the interrupt to a particular guest VCPU.  In fact the
physical interrupt might arrive on any CPU, and then get
delivered to the target VCPU in the emulated XICS (guest interrupt
controller), and eventually delivered to the target VCPU.

Now that we have code to handle device interrupts in real mode
without exiting to the host kernel, there is an advantage to having
the device interrupt arrive on the same sub(core) as the target
VCPU is running on.  In this situation, the interrupt can be
delivered to the target VCPU without any exit to the host kernel
(using a hypervisor doorbell interrupt between threads if
necessary).

This patch aims to get passed-through device interrupts arriving
on the correct core by setting the interrupt server in the real
hardware XICS for the interrupt to the first thread in the (sub)core
where its target VCPU is running.  We do this in the real-mode H_EOI
code because the H_EOI handler already needs to look at the
emulated ICS state for the interrupt (whereas the H_XIRR handler
doesn't), and we know we are running in the target VCPU context
at that point.

We set the server CPU in hardware using an OPAL call, regardless of
what the IRQ affinity mask for the interrupt says, and without
updating the affinity mask.  This amounts to saying that when an
interrupt is passed through to a guest, as a matter of policy we
allow the guest's affinity for the interrupt to override the host's.

This is inspired by an earlier patch from Suresh Warrier, although
none of this code came from that earlier patch.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-12 10:12:28 +10:00
Suresh Warrier
366274f59c KVM: PPC: Book3S HV: Update irq stats for IRQs handled in real mode
When a passthrough IRQ is handled completely within KVM real
mode code, it has to also update the IRQ stats since this
does not go through the generic IRQ handling code.

However, the per CPU kstat_irqs field is an allocated (not static)
field and so cannot be directly accessed in real mode safely.

The function this_cpu_inc_rm() is introduced to safely increment
per CPU fields (currently coded for unsigned integers only) that
are allocated and could thus be vmalloced also.

Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-12 10:12:23 +10:00
Suresh Warrier
644abbb254 KVM: PPC: Book3S HV: Tunable to disable KVM IRQ bypass
Add a  module parameter kvm_irq_bypass for kvm_hv.ko to
disable IRQ bypass for passthrough interrupts. The default
value of this tunable is 1 - that is enable the feature.

Since the tunable is used by built-in kernel code, we use
the module_param_cb macro to achieve this.

Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-12 10:12:18 +10:00
Suresh Warrier
af893c7dc9 KVM: PPC: Book3S HV: Dump irqmap in debugfs
Dump the passthrough irqmap structure associated with a
guest as part of /sys/kernel/debug/powerpc/kvm-xics-*.

Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-12 10:12:13 +10:00
Suresh Warrier
f7af5209b8 KVM: PPC: Book3S HV: Complete passthrough interrupt in host
In existing real mode ICP code, when updating the virtual ICP
state, if there is a required action that cannot be completely
handled in real mode, as for instance, a VCPU needs to be woken
up, flags are set in the ICP to indicate the required action.
This is checked when returning from hypercalls to decide whether
the call needs switch back to the host where the action can be
performed in virtual mode. Note that if h_ipi_redirect is enabled,
real mode code will first try to message a free host CPU to
complete this job instead of returning the host to do it ourselves.

Currently, the real mode PCI passthrough interrupt handling code
checks if any of these flags are set and simply returns to the host.
This is not good enough as the trap value (0x500) is treated as an
external interrupt by the host code. It is only when the trap value
is a hypercall that the host code searches for and acts on unfinished
work by calling kvmppc_xics_rm_complete.

This patch introduces a special trap BOOK3S_INTERRUPT_HV_RM_HARD
which is returned by KVM if there is unfinished business to be
completed in host virtual mode after handling a PCI passthrough
interrupt. The host checks for this special interrupt condition
and calls into the kvmppc_xics_rm_complete, which is made an
exported function for this reason.

[paulus@ozlabs.org - moved logic to set r12 to BOOK3S_INTERRUPT_HV_RM_HARD
 in book3s_hv_rmhandlers.S into the end of kvmppc_check_wake_reason.]

Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-12 10:12:07 +10:00
Suresh Warrier
e3c13e56a4 KVM: PPC: Book3S HV: Handle passthrough interrupts in guest
Currently, KVM switches back to the host to handle any external
interrupt (when the interrupt is received while running in the
guest). This patch updates real-mode KVM to check if an interrupt
is generated by a passthrough adapter that is owned by this guest.
If so, the real mode KVM will directly inject the corresponding
virtual interrupt to the guest VCPU's ICS and also EOI the interrupt
in hardware. In short, the interrupt is handled entirely in real
mode in the guest context without switching back to the host.

In some rare cases, the interrupt cannot be completely handled in
real mode, for instance, a VCPU that is sleeping needs to be woken
up. In this case, KVM simply switches back to the host with trap
reason set to 0x500. This works, but it is clearly not very efficient.
A following patch will distinguish this case and handle it
correctly in the host. Note that we can use the existing
check_too_hard() routine even though we are not in a hypercall to
determine if there is unfinished business that needs to be
completed in host virtual mode.

The patch assumes that the mapping between hardware interrupt IRQ
and virtual IRQ to be injected to the guest already exists for the
PCI passthrough interrupts that need to be handled in real mode.
If the mapping does not exist, KVM falls back to the default
existing behavior.

The KVM real mode code reads mappings from the mapped array in the
passthrough IRQ map without taking any lock.  We carefully order the
loads and stores of the fields in the kvmppc_irq_map data structure
using memory barriers to avoid an inconsistent mapping being seen by
the reader. Thus, although it is possible to miss a map entry, it is
not possible to read a stale value.

[paulus@ozlabs.org - get irq_chip from irq_map rather than pimap,
 pulled out powernv eoi change into a separate patch, made
 kvmppc_read_intr get the vcpu from the paca rather than being
 passed in, rewrote the logic at the end of kvmppc_read_intr to
 avoid deep indentation, simplified logic in book3s_hv_rmhandlers.S
 since we were always restoring SRR0/1 anyway, get rid of the cached
 array (just use the mapped array), removed the kick_all_cpus_sync()
 call, clear saved_xirr PACA field when we handle the interrupt in
 real mode, fix compilation with CONFIG_KVM_XICS=n.]

Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-12 10:11:00 +10:00
Daniel Axtens
bc42f1d9f5 powerpc/cell: Drop unused iic_get_irq_host()
Sparse checking revealed that it is no longer used. The last usage was
removed in commit 2e19458312 ("[POWERPC] Cell interrupt rework") in
2006.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-10 18:46:45 +10:00
Linus Torvalds
2771fc8ed6 powerpc fixes for 4.8 #5
- Don't alias user region to other regions below PAGE_OFFSET from Paul Mackerras
  - Fix again csum_partial_copy_generic() on 32-bit from Christophe Leroy
  - Fix corrupted PE allocation bitmap on releasing PE from Gavin Shan
 
 Fixes for code merged this cycle:
  - Fix crash on releasing compound PE from Gavin Shan
  - Fix processor numbers in OPAL ICP from Benjamin Herrenschmidt
  - Fix little endian build with CONFIG_KEXEC=n from Thiago Jung Bauermann
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJX0qKlAAoJEFHr6jzI4aWAu30QAK88plLxJ2z5Lyf7axdHf7P0
 NfM6xQLyuUQ1xP3ksUB4/4eps3jINJ0bZRd4mMJ3fO/YgNsBACayB9GeE478tMbp
 1KmK94qpLk1u4BUxXNtsWJLEuzVqAPr2cGh6jddmkPXGCUx1MFatEVNVJupX+Vt9
 sJsmhLatUucZEQI6r4sK5wDOdLYIQgcgTIWW5qHH7jyJDKLGyJbNPtmQhbMWU0a5
 zBwD+paecJSGTJEVjd3UwBic+oXt8chwiZkaHLu4Rh6JQ0yVRL4If4EYCodHIpDR
 H7b0P9De9W6a+IWLjVDMhYKq9rBjjgZwcjMplkO7gBE2P+v/NGzbfORJtNXeOgKE
 /RSWufpTbpiGyUzP1Lr/j0O59ZoijRGBK8zuha5FtsTlhl909ifc6KuHO5aqHY9r
 I5o7ws+hSBM1u9cf0Bl011P4uToYzy1auMsZsjDW2SdDEFtJ+WK+0I2vp+M9Jv73
 /F48n/EWUuul5oS2Uar+V2AUADpnYPRi50OR1zVJxdJSM8bZFue4brBFfx1bI/2/
 jmK87hxNwJtYT45KiuEXr2FWMiB1iNHHxL/OEwWbitf2MfRjq8+LHbdt9FxOSj3/
 +8cw3f1zyEjNsvH380HhkUBZknKmD7z8V5Ko5Dx5h8cuRlL+QEW2GnW+1NN7VMoQ
 T7QTHRR4ziHSKdzAIlTe
 =q0Jo
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.8-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Fixes marked for stable:
   - Don't alias user region to other regions below PAGE_OFFSET from
     Paul Mackerras
   - Fix again csum_partial_copy_generic() on 32-bit from Christophe
     Leroy
   - Fix corrupted PE allocation bitmap on releasing PE from Gavin Shan

  Fixes for code merged this cycle:
   - Fix crash on releasing compound PE from Gavin Shan
   - Fix processor numbers in OPAL ICP from Benjamin Herrenschmidt
   - Fix little endian build with CONFIG_KEXEC=n from Thiago Jung
     Bauermann"

* tag 'powerpc-4.8-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/mm: Don't alias user region to other regions below PAGE_OFFSET
  powerpc/32: Fix again csum_partial_copy_generic()
  powerpc/powernv: Fix corrupted PE allocation bitmap on releasing PE
  powerpc/powernv: Fix crash on releasing compound PE
  powerpc/xics/opal: Fix processor numbers in OPAL ICP
  powerpc/pseries: Fix little endian build with CONFIG_KEXEC=n
2016-09-09 08:43:42 -07:00
Suresh Warrier
c57875f5f9 KVM: PPC: Book3S HV: Enable IRQ bypass
Add the irq_bypass_add_producer and irq_bypass_del_producer
functions. These functions get called whenever a GSI is being
defined for a guest. They create/remove the mapping between
host real IRQ numbers and the guest GSI.

Add the following helper functions to manage the
passthrough IRQ map.

kvmppc_set_passthru_irq()
  Creates a mapping in the passthrough IRQ map that maps a host
  IRQ to a guest GSI. It allocates the structure (one per guest VM)
  the first time it is called.

kvmppc_clr_passthru_irq()
  Removes the passthrough IRQ map entry given a guest GSI.
  The passthrough IRQ map structure is not freed even when the
  number of mapped entries goes to zero. It is only freed when
  the VM is destroyed.

[paulus@ozlabs.org - modified to use is_pnv_opal_msi() rather than
 requiring all passed-through interrupts to use the same irq_chip;
 changed deletion so it zeroes out the r_hwirq field rather than
 copying the last entry down and decrementing the number of entries.]

Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-09 16:26:51 +10:00
Suresh Warrier
8daaafc88b KVM: PPC: Book3S HV: Introduce kvmppc_passthru_irqmap
This patch introduces an IRQ mapping structure, the
kvmppc_passthru_irqmap structure that is to be used
to map the real hardware IRQ in the host with the virtual
hardware IRQ (gsi) that is injected into a guest by KVM for
passthrough adapters.

Currently, we assume a separate IRQ mapping structure for
each guest. Each kvmppc_passthru_irqmap has a mapping arrays,
containing all defined real<->virtual IRQs.

[paulus@ozlabs.org - removed irq_chip field from struct
 kvmppc_passthru_irqmap; changed parameter for
 kvmppc_get_passthru_irqmap from struct kvm_vcpu * to struct
 kvm *, removed small cached array.]

Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-09 16:26:19 +10:00
Suresh Warrier
9576730d0e KVM: PPC: select IRQ_BYPASS_MANAGER
Select IRQ_BYPASS_MANAGER for PPC when CONFIG_KVM is set.
Add the PPC producer functions for add and del producer.

[paulus@ozlabs.org - Moved new functions from book3s.c to powerpc.c
 so booke compiles; added kvm_arch_has_irq_bypass implementation.]

Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-09 16:26:19 +10:00
Suresh Warrier
37f55d30df KVM: PPC: Book3S HV: Convert kvmppc_read_intr to a C function
Modify kvmppc_read_intr to make it a C function.  Because it is called
from kvmppc_check_wake_reason, any of the assembler code that calls
either kvmppc_read_intr or kvmppc_check_wake_reason now has to assume
that the volatile registers might have been modified.

This also adds in the optimization of clearing saved_xirr in the case
where we completely handle and EOI an IPI.  Without this, the next
device interrupt will require two trips through the host interrupt
handling code.

[paulus@ozlabs.org - made kvmppc_check_wake_reason create a stack frame
 when it is calling kvmppc_read_intr, which means we can set r12 to
 the trap number (0x500) after the call to kvmppc_read_intr, instead
 of using r31.  Also moved the deliver_guest_interrupt label so as to
 restore XER and CTR, plus other minor tweaks.]

Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-09 16:25:25 +10:00
Paul Mackerras
99212c864e Merge branch 'kvm-ppc-infrastructure' into kvm-ppc-next
This merges the topic branch 'kvm-ppc-infrastructure' into kvm-ppc-next
so that I can then apply further patches that need the changes in the
kvm-ppc-infrastructure branch.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-09 16:24:23 +10:00
Paolo Bonzini
3f25777499 powerpc: move hmi.c to arch/powerpc/kvm/
hmi.c functions are unused unless sibling_subcore_state is nonzero, and
that in turn happens only if KVM is in use.  So move the code to
arch/powerpc/kvm/, putting it under CONFIG_KVM_BOOK3S_HV_POSSIBLE
rather than CONFIG_PPC_BOOK3S_64.  The sibling_subcore_state is also
included in struct paca_struct only if KVM is supported by the kernel.

Cc: Daniel Axtens <dja@axtens.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: kvm-ppc@vger.kernel.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-09 16:18:07 +10:00
Suresh Warrier
4ee11c1a9f powerpc/powernv: Provide facilities for EOI, usable from real mode
This adds a new function pnv_opal_pci_msi_eoi() which does the part of
end-of-interrupt (EOI) handling of an MSI which involves doing an
OPAL call.  This function can be called in real mode.  This doesn't
just export pnv_ioda2_msi_eoi() because that does a call to
icp_native_eoi(), which does not work in real mode.

This also adds a function, is_pnv_opal_msi(), which KVM can call to
check whether an interrupt is one for which we should be calling
pnv_opal_pci_msi_eoi() when we need to do an EOI.

[paulus@ozlabs.org - split out the addition of pnv_opal_pci_msi_eoi()
 from Suresh's patch "KVM: PPC: Book3S HV: Handle passthrough
 interrupts in guest"; added is_pnv_opal_msi(); wrote description.]

Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-09 16:17:59 +10:00
Suresh Warrier
07b1fdf5bd powerpc: Add simple cache inhibited MMIO accessors
Add simple cache inhibited accessors for memory mapped I/O.
Unlike the accessors built from the DEF_MMIO_* macros, these
don't include any hardware memory barriers, callers need to
manage memory barriers on their own. These can only be called
in hypervisor real mode.

Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
[paulus@ozlabs.org - added line to comment]
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-09 16:17:59 +10:00
Paul Mackerras
0eeede0c63 powerpc/mm: Speed up computation of base and actual page size for a HPTE
This replaces a 2-D search through an array with a simple 8-bit table
lookup for determining the actual and/or base page size for a HPT entry.

The encoding in the second doubleword of the HPTE is designed to encode
the actual and base page sizes without using any more bits than would be
needed for a 4k page number, by using between 1 and 8 low-order bits of
the RPN (real page number) field to encode the page sizes.  A single
"large page" bit in the first doubleword indicates that these low-order
bits are to be interpreted like this.

We can determine the page sizes by using the low-order 8 bits of the RPN
to look up a 256-entry table.  For actual page sizes less than 1MB, some
of the upper bits of these 8 bits are going to be real address bits, but
we can cope with that by replicating the entries for those smaller page
sizes.

While we're at it, let's move the hpte_page_size() and hpte_base_page_size()
functions from a KVM-specific header to a header for 64-bit HPT systems,
since this computation doesn't have anything specifically to do with KVM.

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-09 16:14:48 +10:00
Paul Mackerras
f077aaf075 powerpc/mm: Don't alias user region to other regions below PAGE_OFFSET
In commit c60ac5693c ("powerpc: Update kernel VSID range", 2013-03-13)
we lost a check on the region number (the top four bits of the effective
address) for addresses below PAGE_OFFSET.  That commit replaced a check
that the top 18 bits were all zero with a check that bits 46 - 59 were
zero (performed for all addresses, not just user addresses).

This means that userspace can access an address like 0x1000_0xxx_xxxx_xxxx
and we will insert a valid SLB entry for it.  The VSID used will be the
same as if the top 4 bits were 0, but the page size will be some random
value obtained by indexing beyond the end of the mm_ctx_high_slices_psize
array in the paca.  If that page size is the same as would be used for
region 0, then userspace just has an alias of the region 0 space.  If the
page size is different, then no HPTE will be found for the access, and
the process will get a SIGSEGV (since hash_page_mm() will refuse to create
a HPTE for the bogus address).

The access beyond the end of the mm_ctx_high_slices_psize can be at most
5.5MB past the array, and so will be in RAM somewhere.  Since the access
is a load performed in real mode, it won't fault or crash the kernel.
At most this bug could perhaps leak a little bit of information about
blocks of 32 bytes of memory located at offsets of i * 512kB past the
paca->mm_ctx_high_slices_psize array, for 1 <= i <= 11.

Fixes: c60ac5693c ("powerpc: Update kernel VSID range")
Cc: stable@vger.kernel.org # v3.9+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-08 13:15:33 +10:00
Christophe Leroy
8540571e01 powerpc/32: Fix again csum_partial_copy_generic()
Commit 7aef413656 ("powerpc32: rewrite csum_partial_copy_generic()
based on copy_tofrom_user()") introduced a bug when destination address
is odd and len is lower than cacheline size.

In that case the resulting csum value doesn't have to be rotated one
byte because the cache-aligned copy part is skipped so no alignment
is performed.

Fixes: 7aef413656 ("powerpc32: rewrite csum_partial_copy_generic() based on copy_tofrom_user()")
Cc: stable@vger.kernel.org # v4.6+
Reported-by: Alessio Igor Bogani <alessio.bogani@elettra.eu>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Tested-by: Alessio Igor Bogani <alessio.bogani@elettra.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-08 13:15:02 +10:00
Gavin Shan
caa58f8088 powerpc/powernv: Fix corrupted PE allocation bitmap on releasing PE
In pnv_ioda_free_pe(), the PE object (including the associated PE
number) is cleared before resetting the corresponding bit in the
PE allocation bitmap. It means PE#0 is always released to the bitmap
wrongly.

This fixes above issue by caching the PE number before the PE object
is cleared.

Fixes: 1e9167726c ("powerpc/powernv: Use PE instead of number during setup and release"
Cc: stable@vger.kernel.org # v4.7+
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-08 13:12:52 +10:00
Suraj Jitindar Singh
2a27f514a4 KVM: PPC: Implement existing and add new halt polling vcpu stats
vcpu stats are used to collect information about a vcpu which can be viewed
in the debugfs. For example halt_attempted_poll and halt_successful_poll
are used to keep track of the number of times the vcpu attempts to and
successfully polls. These stats are currently not used on powerpc.

Implement incrementation of the halt_attempted_poll and
halt_successful_poll vcpu stats for powerpc. Since these stats are summed
over all the vcpus for all running guests it doesn't matter which vcpu
they are attributed to, thus we choose the current runner vcpu of the
vcore.

Also add new vcpu stats: halt_poll_success_ns, halt_poll_fail_ns and
halt_wait_ns to be used to accumulate the total time spend polling
successfully, polling unsuccessfully and waiting respectively, and
halt_successful_wait to accumulate the number of times the vcpu waits.
Given that halt_poll_success_ns, halt_poll_fail_ns and halt_wait_ns are
expressed in nanoseconds it is necessary to represent these as 64-bit
quantities, otherwise they would overflow after only about 4 seconds.

Given that the total time spend either polling or waiting will be known and
the number of times that each was done, it will be possible to determine
the average poll and wait times. This will give the ability to tune the kvm
module parameters based on the calculated average wait and poll times.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-08 12:25:37 +10:00
Suraj Jitindar Singh
8a7e75d47b KVM: Add provisioning for ulong vm stats and u64 vcpu stats
vms and vcpus have statistics associated with them which can be viewed
within the debugfs. Currently it is assumed within the vcpu_stat_get() and
vm_stat_get() functions that all of these statistics are represented as
u32s, however the next patch adds some u64 vcpu statistics.

Change all vcpu statistics to u64 and modify vcpu_stat_get() accordingly.
Since vcpu statistics are per vcpu, they will only be updated by a single
vcpu at a time so this shouldn't present a problem on 32-bit machines
which can't atomically increment 64-bit numbers. However vm statistics
could potentially be updated by multiple vcpus from that vm at a time.
To avoid the overhead of atomics make all vm statistics ulong such that
they are 64-bit on 64-bit systems where they can be atomically incremented
and are 32-bit on 32-bit systems which may not be able to atomically
increment 64-bit numbers. Modify vm_stat_get() to expect ulongs.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-08 12:25:37 +10:00
Suraj Jitindar Singh
0cda69dd7c KVM: PPC: Book3S HV: Implement halt polling
This patch introduces new halt polling functionality into the kvm_hv kernel
module. When a vcore is idle it will poll for some period of time before
scheduling itself out.

When all of the runnable vcpus on a vcore have ceded (and thus the vcore is
idle) we schedule ourselves out to allow something else to run. In the
event that we need to wake up very quickly (for example an interrupt
arrives), we are required to wait until we get scheduled again.

Implement halt polling so that when a vcore is idle, and before scheduling
ourselves, we poll for vcpus in the runnable_threads list which have
pending exceptions or which leave the ceded state. If we poll successfully
then we can get back into the guest very quickly without ever scheduling
ourselves, otherwise we schedule ourselves out as before.

There exists generic halt_polling code in virt/kvm_main.c, however on
powerpc the polling conditions are different to the generic case. It would
be nice if we could just implement an arch specific kvm_check_block()
function, but there is still other arch specific things which need to be
done for kvm_hv (for example manipulating vcore states) which means that a
separate implementation is the best option.

Testing of this patch with a TCP round robin test between two guests with
virtio network interfaces has found a decrease in round trip time of ~15us
on average. A performance gain is only seen when going out of and
back into the guest often and quickly, otherwise there is no net benefit
from the polling. The polling interval is adjusted such that when we are
often scheduled out for long periods of time it is reduced, and when we
often poll successfully it is increased. The rate at which the polling
interval increases or decreases, and the maximum polling interval, can
be set through module parameters.

Based on the implementation in the generic kvm module by Wanpeng Li and
Paolo Bonzini, and on direction from Paul Mackerras.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-08 12:21:45 +10:00
Suraj Jitindar Singh
7b5f8272c7 KVM: PPC: Book3S HV: Change vcore element runnable_threads from linked-list to array
The struct kvmppc_vcore is a structure used to store various information
about a virtual core for a kvm guest. The runnable_threads element of the
struct provides a list of all of the currently runnable vcpus on the core
(those in the KVMPPC_VCPU_RUNNABLE state). The previous implementation of
this list was a linked_list. The next patch requires that the list be able
to be iterated over without holding the vcore lock.

Reimplement the runnable_threads list in the kvmppc_vcore struct as an
array. Implement function to iterate over valid entries in the array and
update access sites accordingly.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-08 12:21:44 +10:00
Suraj Jitindar Singh
e64fb7e272 KVM: PPC: Book3S HV: Move struct kvmppc_vcore from kvm_host.h to kvm_book3s.h
The next commit will introduce a member to the kvmppc_vcore struct which
references MAX_SMT_THREADS which is defined in kvm_book3s_asm.h, however
this file isn't included in kvm_host.h directly. Thus compiling for
certain platforms such as pmac32_defconfig and ppc64e_defconfig with KVM
fails due to MAX_SMT_THREADS not being defined.

Move the struct kvmppc_vcore definition to kvm_book3s.h which explicitly
includes kvm_book3s_asm.h.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-09-08 12:21:44 +10:00
Kees Cook
81409e9e28 usercopy: fold builtin_const check into inline function
Instead of having each caller of check_object_size() need to remember to
check for a const size parameter, move the check into check_object_size()
itself. This actually matches the original implementation in PaX, though
this commit cleans up the now-redundant builtin_const() calls in the
various architectures.

Signed-off-by: Kees Cook <keescook@chromium.org>
2016-09-06 12:17:29 -07:00
Sebastian Andrzej Siewior
da3ed6519b powerpc/mmu nohash: Convert to hotplug state machine
Install the callbacks via the state machine.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: rt@linutronix.de
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20160818125731.27256-17-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-06 18:30:27 +02:00
Sebastian Andrzej Siewior
68e694dcef powerpc/powermac: Convert to hotplug state machine
Install the callbacks via the state machine.
I assume here that the powermac has two CPUs and so only one can go up
or down at a time. The variable smp_core99_host_open is here to ensure
that we do not try to open or close the i2c host twice if something goes
wrong and we invoke the prepare or online callback twice due to
rollback.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: rt@linutronix.de
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20160818125731.27256-16-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-06 18:30:26 +02:00
Gavin Shan
b314427a52 powerpc/powernv: Fix crash on releasing compound PE
The compound PE is created to accommodate the devices attached to
one specific PCI bus that consume multiple M64 segments. The compound
PE is made up of one master PE and possibly multiple slave PEs. The
slave PEs should be destroyed when releasing the master PE. A kernel
crash happens when derferencing @pe->pdev on releasing the slave PE
in pnv_ioda_deconfigure_pe().

  # echo 0 > /sys/bus/pci/slots/C7/power
  iommu: Removing device 0000:01:00.1 from group 0
  iommu: Removing device 0000:01:00.0 from group 0
  Unable to handle kernel paging request for data at address 0x00000010
  Faulting instruction address: 0xc00000000005d898
  cpu 0x1: Vector: 300 (Data Access) at [c000000fe8217620]
      pc: c00000000005d898: pnv_ioda_release_pe+0x288/0x610
      lr: c00000000005dbdc: pnv_ioda_release_pe+0x5cc/0x610
      sp: c000000fe82178a0
     msr: 9000000000009033
     dar: 10
   dsisr: 40000000
    current = 0xc000000fe815ab80
    paca    = 0xc00000000ff00400	 softe: 0	 irq_happened: 0x01
      pid   = 2709, comm = sh
  Linux version 4.8.0-rc5-gavin-00006-g745efdb (gwshan@gwshan) \
  (gcc version 4.9.3 (Buildroot 2016.02-rc2-00093-g5ea3bce) ) #586 SMP \
  Tue Sep 6 13:37:29 AEST 2016
  enter ? for help
  [c000000fe8217940] c00000000005d684 pnv_ioda_release_pe+0x74/0x610
  [c000000fe82179e0] c000000000034460 pcibios_release_device+0x50/0x70
  [c000000fe8217a10] c0000000004aba80 pci_release_dev+0x50/0xa0
  [c000000fe8217a40] c000000000704898 device_release+0x58/0xf0
  [c000000fe8217ac0] c000000000470510 kobject_release+0x80/0xf0
  [c000000fe8217b00] c000000000704dd4 put_device+0x24/0x40
  [c000000fe8217b20] c0000000004af94c pci_remove_bus_device+0x12c/0x150
  [c000000fe8217b60] c000000000034244 pci_hp_remove_devices+0x94/0xd0
  [c000000fe8217ba0] c0000000004ca444 pnv_php_disable_slot+0x64/0xb0
  [c000000fe8217bd0] c0000000004c88c0 power_write_file+0xa0/0x190
  [c000000fe8217c50] c0000000004c248c pci_slot_attr_store+0x3c/0x60
  [c000000fe8217c70] c0000000002d6494 sysfs_kf_write+0x94/0xc0
  [c000000fe8217cb0] c0000000002d50f0 kernfs_fop_write+0x180/0x260
  [c000000fe8217d00] c0000000002334a0 __vfs_write+0x40/0x190
  [c000000fe8217d90] c000000000234738 vfs_write+0xc8/0x240
  [c000000fe8217de0] c000000000236250 SyS_write+0x60/0x110
  [c000000fe8217e30] c000000000009524 system_call+0x38/0x108

It fixes the kernel crash by bypassing releasing resources (DMA,
IO and memory segments, PELTM) because there are no resources assigned
to the slave PE.

Fixes: c5f7700bbd ("powerpc/powernv: Dynamically release PE")
Reported-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-06 14:54:46 +10:00
Benjamin Herrenschmidt
f8e33475b0 powerpc/xics/opal: Fix processor numbers in OPAL ICP
When using the OPAL ICP backend we incorrectly pass Linux CPU numbers
rather than HW CPU numbers to OPAL.

Fixes: d74361881f ("powerpc/xics: Add ICP OPAL backend")
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-06 14:54:45 +10:00
Thiago Jung Bauermann
d81d825821 powerpc/pseries: Fix little endian build with CONFIG_KEXEC=n
On ppc64le, builds with CONFIG_KEXEC=n fail with:

arch/powerpc/platforms/pseries/setup.c: In function ‘pseries_big_endian_exceptions’:
arch/powerpc/platforms/pseries/setup.c:403:13: error: implicit declaration of function ‘kdump_in_progress’
  if (rc && !kdump_in_progress())

This is because pseries/setup.c includes <linux/kexec.h>, but
kdump_in_progress() is defined in <asm/kexec.h>. This is a problem
because the former only includes the latter if CONFIG_KEXEC_CORE=y.

Fix it by including <asm/kexec.h> directly, as is done in powernv/setup.c.

Fixes: d3cbff1b5a ("powerpc: Put exception configuration in a common place")
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-06 14:54:08 +10:00
Cyril Bur
78a3e8889b powerpc: signals: Discard transaction state from signal frames
Userspace can begin and suspend a transaction within the signal
handler which means they might enter sys_rt_sigreturn() with the
processor in suspended state.

sys_rt_sigreturn() wants to restore process context (which may have
been in a transaction before signal delivery). To do this it must
restore TM SPRS. To achieve this, any transaction initiated within the
signal frame must be discarded in order to be able to restore TM SPRs
as TM SPRs can only be manipulated non-transactionally..
>From the PowerPC ISA:
  TM Bad Thing Exception [Category: Transactional Memory]
   An attempt is made to execute a mtspr targeting a TM register in
   other than Non-transactional state.

Not doing so results in a TM Bad Thing:
[12045.221359] Kernel BUG at c000000000050a40 [verbose debug info unavailable]
[12045.221470] Unexpected TM Bad Thing exception at c000000000050a40 (msr 0x201033)
[12045.221540] Oops: Unrecoverable exception, sig: 6 [#1]
[12045.221586] SMP NR_CPUS=2048 NUMA PowerNV
[12045.221634] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE
 nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter
 ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables kvm_hv kvm
 uio_pdrv_genirq ipmi_powernv uio powernv_rng ipmi_msghandler autofs4 ses enclosure
 scsi_transport_sas bnx2x ipr mdio libcrc32c
[12045.222167] CPU: 68 PID: 6178 Comm: sigreturnpanic Not tainted 4.7.0 #34
[12045.222224] task: c0000000fce38600 ti: c0000000fceb4000 task.ti: c0000000fceb4000
[12045.222293] NIP: c000000000050a40 LR: c0000000000163bc CTR: 0000000000000000
[12045.222361] REGS: c0000000fceb7ac0 TRAP: 0700   Not tainted (4.7.0)
[12045.222418] MSR: 9000000300201033 <SF,HV,ME,IR,DR,RI,LE,TM[SE]> CR: 28444280  XER: 20000000
[12045.222625] CFAR: c0000000000163b8 SOFTE: 0 PACATMSCRATCH: 900000014280f033
GPR00: 01100000b8000001 c0000000fceb7d40 c00000000139c100 c0000000fce390d0
GPR04: 900000034280f033 0000000000000000 0000000000000000 0000000000000000
GPR08: 0000000000000000 b000000000001033 0000000000000001 0000000000000000
GPR12: 0000000000000000 c000000002926400 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: 0000000000000000 00003ffff98cadd0 00003ffff98cb470 0000000000000000
GPR28: 900000034280f033 c0000000fceb7ea0 0000000000000001 c0000000fce390d0
[12045.223535] NIP [c000000000050a40] tm_restore_sprs+0xc/0x1c
[12045.223584] LR [c0000000000163bc] tm_recheckpoint+0x5c/0xa0
[12045.223630] Call Trace:
[12045.223655] [c0000000fceb7d80] [c000000000026e74] sys_rt_sigreturn+0x494/0x6c0
[12045.223738] [c0000000fceb7e30] [c0000000000092e0] system_call+0x38/0x108
[12045.223806] Instruction dump:
[12045.223841] 7c800164 4e800020 7c0022a6 f80304a8 7c0222a6 f80304b0 7c0122a6 f80304b8
[12045.223955] 4e800020 e80304a8 7c0023a6 e80304b0 <7c0223a6> e80304b8 7c0123a6 4e800020
[12045.224074] ---[ end trace cb8002ee240bae76 ]---

It isn't clear exactly if there is really a use case for userspace
returning with a suspended transaction, however, doing so doesn't (on
its own) constitute a bad frame. As such, this patch simply discards
the transactional state of the context calling the sigreturn and
continues.

Reported-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Reviewed-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Acked-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2016-08-29 12:48:40 +10:00
Mukesh Ojha
a9cbf0b219 powerpc/powernv : Drop reference added by kset_find_obj()
In a situation, where Linux kernel gets notified about duplicate error log
from OPAL, it is been observed that kernel fails to remove sysfs entries
(/sys/firmware/opal/elog/0xXXXXXXXX) of such error logs. This is because,
we currently search the error log/dump kobject in the kset list via
'kset_find_obj()' routine. Which eventually increment the reference count
by one, once it founds the kobject.

So, unless we decrement the reference count by one after it found the kobject,
we would not be able to release the kobject properly later.

This patch adds the 'kobject_put()' which was missing earlier.

Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2016-08-29 12:48:21 +10:00
Nicholas Piggin
cc7786d3ee powerpc/tm: do not use r13 for tabort_syscall
tabort_syscall runs with RI=1, so a nested recoverable machine
check will load the paca into r13 and overwrite what we loaded
it with, because exceptions returning to privileged mode do not
restore r13.

Fixes: b4b56f9eca (powerpc/tm: Abort syscalls in active transactions)
Cc: stable@vger.kernel.org
Signed-off-by: Nick Piggin <npiggin@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2016-08-29 12:47:56 +10:00
Paul Mackerras
4b3d173d04 KVM: PPC: Always select KVM_VFIO, plus Makefile cleanup
As discussed recently on the kvm mailing list, David Gibson's
intention in commit 178a787502 ("vfio: Enable VFIO device for
powerpc", 2016-02-01) was to have the KVM VFIO device built in
on all powerpc platforms.  This patch adds the "select KVM_VFIO"
statement that makes this happen.

Currently, arch/powerpc/kvm/Makefile doesn't include vfio.o for
the 64-bit kvm module, because the list of objects doesn't use
the $(common-objs-y) list.  The reason it doesn't is because we
don't necessarily want coalesced_mmio.o or emulate.o (for example
if HV KVM is the only target), and common-objs-y includes both.

Since this is confusing, this patch adjusts the definitions so that
we now use $(common-objs-y) in the list for the 64-bit kvm.ko
module, emulate.o is removed from common-objs-y and added in the
places that need it, and the inclusion of coalesced_mmio.o now
depends on CONFIG_KVM_MMIO.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-08-25 22:25:49 +10:00
Josh Poimboeuf
9a7c348ba6 ftrace: Add return address pointer to ftrace_ret_stack
Storing this value will help prevent unwinders from getting out of sync
with the function graph tracer ret_stack.  Now instead of needing a
stateful iterator, they can compare the return address pointer to find
the right ret_stack entry.

Note that an array of 50 ftrace_ret_stack structs is allocated for every
task.  So when an arch implements this, it will add either 200 or 400
bytes of memory usage per task (depending on whether it's a 32-bit or
64-bit platform).

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a95cfcc39e8f26b89a430c56926af0bb217bc0a1.1471607358.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-24 12:15:14 +02:00
Paolo Bonzini
7c379526d7 powerpc: move hmi.c to arch/powerpc/kvm/
hmi.c functions are unused unless sibling_subcore_state is nonzero, and
that in turn happens only if KVM is in use.  So move the code to
arch/powerpc/kvm/, putting it under CONFIG_KVM_BOOK3S_HV_POSSIBLE
rather than CONFIG_PPC_BOOK3S_64.  The sibling_subcore_state is also
included in struct paca_struct only if KVM is supported by the kernel.

Cc: Daniel Axtens <dja@axtens.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: kvm-ppc@vger.kernel.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2016-08-22 11:09:33 +10:00
Christophe Leroy
41017a7579 powerpc: sysdev: cpm: fix gpio save_regs functions
of_mm_gpiochip_add_data() calls mm_gc->save_regs() before
setting the data. Therefore ->save_regs() cannot use
gpiochip_get_data()

[    0.275940] Unable to handle kernel paging request for data at address 0x00000130
[    0.283120] Faulting instruction address: 0xc01b44cc
[    0.288175] Oops: Kernel access of bad area, sig: 11 [#1]
[    0.293343] PREEMPT CMPC885
[    0.296141] CPU: 0 PID: 1 Comm: swapper Not tainted 4.7.0-g65124df-dirty #68
[    0.304131] task: c6074000 ti: c6080000 task.ti: c6080000
[    0.309459] NIP: c01b44cc LR: c0011720 CTR: c0011708
[    0.314372] REGS: c6081d90 TRAP: 0300   Not tainted  (4.7.0-g65124df-dirty)
[    0.322267] MSR: 00009032 <EE,ME,IR,DR,RI>  CR: 24000028  XER: 20000000
[    0.328813] DAR: 00000130 DSISR: c0000000
GPR00: c01b6d0c c6081e40 c6074000 c6017000 c9028000 c601d028 c6081dd8 00000000
GPR08: c601d028 00000000 ffffffff 00000001 24000044 00000000 c0002790 00000000
GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 c05643b0 00000083
GPR24: c04a1a6c c0560000 c04a8308 c04c6480 c0012498 c6017000 c7ffcc78 c6017000
[    0.360806] NIP [c01b44cc] gpiochip_get_data+0x4/0xc
[    0.365684] LR [c0011720] cpm1_gpio16_save_regs+0x18/0x44
[    0.370972] Call Trace:
[    0.373451] [c6081e50] [c01b6d0c] of_mm_gpiochip_add_data+0x70/0xdc
[    0.379624] [c6081e70] [c00124c0] cpm_init_par_io+0x28/0x118
[    0.385238] [c6081e80] [c04a8ac0] do_one_initcall+0xb0/0x17c
[    0.390819] [c6081ef0] [c04a8cbc] kernel_init_freeable+0x130/0x1dc
[    0.396924] [c6081f30] [c00027a4] kernel_init+0x14/0x110
[    0.402177] [c6081f40] [c000b424] ret_from_kernel_thread+0x5c/0x64
[    0.408233] Instruction dump:
[    0.411168] 4182fafc 3f80c040 48234c6d 3bc0fff0 3b9c5ed0 4bfffaf4 81290020 712a0004
[    0.418825] 4182fb34 48234c51 4bfffb2c 81230004 <80690130> 4e800020 7c0802a6 9421ffe0
[    0.426763] ---[ end trace fe4113ee21d72ffa ]---

fixes: e65078f1f3 ("powerpc: sysdev: cpm1: use gpiochip data pointer")
fixes: a14a2d484b ("powerpc: cpm_common: use gpiochip data pointer")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2016-08-22 11:09:33 +10:00
Nicholas Piggin
a74599a504 powerpc/pseries: PACA save area fix for MCE vs MCE
MCE must not enable MSR_RI until PACA_EXMC is no longer being used.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2016-08-22 11:09:33 +10:00
Nicholas Piggin
3f3b5dc14c powerpc/pseries: PACA save area fix for general exception vs MCE
MCE must not use PACA_EXGEN. When a general exception enables MSR_RI,
that means SPRN_SRR[01] and SPRN_SPRG are no longer used. However the
PACA save area is still in use.
Acked-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2016-08-22 11:09:33 +10:00
Michael Ellerman
66443efa83 powerpc/prom: Fix sub-processor option passed to ibm, client-architecture-support
When booting from an OpenFirmware which supports it, we use the
"ibm,client-architecture-support" firmware call to communicate
our capabilities to firmware.

The format of the structure we pass to firmware is specified in
PAPR (Power Architecture Platform Requirements), or the public version
LoPAPR (Linux on Power Architecture Platform Reference).

Referring to table 244 in LoPAPR v1.1, option vector 5 contains a 4 byte
field at bytes 17-20 for the "Platform Facilities Enable". This is
followed by a 1 byte field at byte 21 for "Sub-Processor Represenation
Level".

Comparing to the code, there we have the Platform Facilities
options (OV5_PFO_*) at byte 17, but we fail to pad that field out to its
full width of 4 bytes. This means the OV5_SUB_PROCESSORS option is
incorrectly placed at byte 18.

Fix it by adding zero bytes for bytes 18, 19, 20, and comment the bytes
to hopefully make it clearer in future.

As far as I'm aware nothing actually consumes this value at this time,
so the effect of this bug is nil in practice.

It does mean we've been incorrectly setting bit 15 of the "Platform
Facilities Enable" option for the past ~3 1/2 years, so we should avoid
allocating that bit to anything else in future.

Fixes: df77c79920 ("powerpc/pseries: Update ibm,architecture.vec for PAPR 2.7/POWER8")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2016-08-22 11:09:33 +10:00
Boqun Feng
19ab58d19e powerpc, hotplug: Avoid to touch non-existent cpumasks.
We observed a kernel oops when running a PPC guest with config NR_CPUS=4
and qemu option "-smp cores=1,threads=8":

[   30.634781] Unable to handle kernel paging request for data at
address 0xc00000014192eb17
[   30.636173] Faulting instruction address: 0xc00000000003e5cc
[   30.637069] Oops: Kernel access of bad area, sig: 11 [#1]
[   30.637877] SMP NR_CPUS=4 NUMA pSeries
[   30.638471] Modules linked in:
[   30.638949] CPU: 3 PID: 27 Comm: migration/3 Not tainted
4.7.0-07963-g9714b26 #1
[   30.640059] task: c00000001e29c600 task.stack: c00000001e2a8000
[   30.640956] NIP: c00000000003e5cc LR: c00000000003e550 CTR:
0000000000000000
[   30.642001] REGS: c00000001e2ab8e0 TRAP: 0300   Not tainted
(4.7.0-07963-g9714b26)
[   30.643139] MSR: 8000000102803033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 22004084  XER: 00000000
[   30.644583] CFAR: c000000000009e98 DAR: c00000014192eb17 DSISR: 40000000 SOFTE: 0
GPR00: c00000000140a6b8 c00000001e2abb60 c0000000016dd300 0000000000000003
GPR04: 0000000000000000 0000000000000004 c0000000016e5920 0000000000000008
GPR08: 0000000000000004 c00000014192eb17 0000000000000000 0000000000000020
GPR12: c00000000140a6c0 c00000000ffffc00 c0000000000d3ea8 c00000001e005680
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 c00000001e6b3a00 0000000000000000 0000000000000001
GPR24: c00000001ff85138 c00000001ff85130 000000001eb6f000 0000000000000001
GPR28: 0000000000000000 c0000000017014e0 0000000000000000 0000000000000018
[   30.653882] NIP [c00000000003e5cc] __cpu_disable+0xcc/0x190
[   30.654713] LR [c00000000003e550] __cpu_disable+0x50/0x190
[   30.655528] Call Trace:
[   30.655893] [c00000001e2abb60] [c00000000003e550] __cpu_disable+0x50/0x190 (unreliable)
[   30.657280] [c00000001e2abbb0] [c0000000000aca0c] take_cpu_down+0x5c/0x100
[   30.658365] [c00000001e2abc10] [c000000000163918] multi_cpu_stop+0x1a8/0x1e0
[   30.659617] [c00000001e2abc60] [c000000000163cc0] cpu_stopper_thread+0xf0/0x1d0
[   30.660737] [c00000001e2abd20] [c0000000000d8d70] smpboot_thread_fn+0x290/0x2a0
[   30.661879] [c00000001e2abd80] [c0000000000d3fa8] kthread+0x108/0x130
[   30.662876] [c00000001e2abe30] [c000000000009968] ret_from_kernel_thread+0x5c/0x74
[   30.664017] Instruction dump:
[   30.664477] 7bde1f24 38a00000 787f1f24 3b600001 39890008 7d204b78 7d05e214 7d0b07b4
[   30.665642] 796b1f24 7d26582a 7d204a14 7d29f214 <7d4048a8> 7d4a3878 7d4049ad 40c2fff4
[   30.666854] ---[ end trace 32643b7195717741 ]---

The reason of this is that in __cpu_disable(), when we try to set the
cpu_sibling_mask or cpu_core_mask of the sibling CPUs of the disabled
one, we don't check whether the current configuration employs those
sibling CPUs(hw threads). And if a CPU is not employed by a
configuration, the percpu structures cpu_{sibling,core}_mask are not
allocated, therefore accessing those cpumasks will result in problems as
above.

This patch fixes this problem by adding an addition check on whether the
id is no less than nr_cpu_ids in the sibling CPU iteration code.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2016-08-22 11:09:33 +10:00
Paul Gortmaker
8a39b05f08 powerpc: migrate exception table users off module.h and onto extable.h
These files were only including module.h for exception table
related functions.  We've now separated that content out into its
own file "extable.h" so now move over to that and avoid all the
extra header content in module.h that we don't really need to compile
these files.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2016-08-22 11:09:33 +10:00
Andrzej Hajda
6096481649 powerpc/powernv/pci: fix iterator signedness
Unsigned type is always non-negative, so the loop could not end in case
condition is never true.

The problem has been detected using semantic patch
scripts/coccinelle/tests/unsigned_lesser_than_zero.cocci

Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2016-08-22 11:09:33 +10:00
Mauricio Faria de Oliveira
2dd9c11b9d powerpc/pseries: use pci_host_bridge.release_fn() to kfree(phb)
This patch leverages 'struct pci_host_bridge' from the PCI subsystem
in order to free the pci_controller only after the last reference to
its devices is dropped (avoiding an oops in pcibios_release_device()
if the last reference is dropped after pcibios_free_controller()).

The patch relies on pci_host_bridge.release_fn() (and .release_data),
which is called automatically by the PCI subsystem when the root bus
is released (i.e., the last reference is dropped).  Those fields are
set via pci_set_host_bridge_release() (e.g. in the platform-specific
implementation of pcibios_root_bridge_prepare()).

It introduces the 'pcibios_free_controller_deferred()' .release_fn()
and it expects .release_data to hold a pointer to the pci_controller.

The function implictly calls 'pcibios_free_controller()', so an user
must *NOT* explicitly call it if using the new _deferred() callback.

The functionality is enabled for pseries (although it isn't platform
specific, and may be used by cxl).

Details on not-so-elegant design choices:

 - Use 'pci_host_bridge.release_data' field as pointer to associated
   'struct pci_controller' so *not* to 'pci_bus_to_host(bridge->bus)'
   in pcibios_free_controller_deferred().

   That's because pci_remove_root_bus() sets 'host_bridge->bus = NULL'
   (so, if the last reference is released after pci_remove_root_bus()
   runs, which eventually reaches pcibios_free_controller_deferred(),
   that would hit a null pointer dereference).

   The cxl/vphb.c code calls pci_remove_root_bus(), and the cxl folks
   are interested in this fix.

Test-case #1 (hold references)

  # ls -ld /sys/block/sd* | grep -m1 0021:01:00.0
  <...> /sys/block/sdaa -> ../devices/pci0021:01/0021:01:00.0/<...>

  # ls -ld /sys/block/sd* | grep -m1 0021:01:00.1
  <...> /sys/block/sdab -> ../devices/pci0021:01/0021:01:00.1/<...>

  # cat >/dev/sdaa & pid1=$!
  # cat >/dev/sdab & pid2=$!

  # drmgr -w 5 -d 1 -c phb -s 'PHB 33' -r
  Validating PHB DLPAR capability...yes.
  [  594.306719] pci_hp_remove_devices: PCI: Removing devices on bus 0021:01
  [  594.306738] pci_hp_remove_devices:    Removing 0021:01:00.0...
  ...
  [  598.236381] pci_hp_remove_devices:    Removing 0021:01:00.1...
  ...
  [  611.972077] pci_bus 0021:01: busn_res: [bus 01-ff] is released
  [  611.972140] rpadlpar_io: slot PHB 33 removed

  # kill -9 $pid1
  # kill -9 $pid2
  [  632.918088] pcibios_free_controller_deferred: domain 33, dynamic 1

Test-case #2 (don't hold references)

  # drmgr -w 5 -d 1 -c phb -s 'PHB 33' -r
  Validating PHB DLPAR capability...yes.
  [  916.357363] pci_hp_remove_devices: PCI: Removing devices on bus 0021:01
  [  916.357386] pci_hp_remove_devices:    Removing 0021:01:00.0...
  ...
  [  920.566527] pci_hp_remove_devices:    Removing 0021:01:00.1...
  ...
  [  933.955873] pci_bus 0021:01: busn_res: [bus 01-ff] is released
  [  933.955977] pcibios_free_controller_deferred: domain 33, dynamic 1
  [  933.955999] rpadlpar_io: slot PHB 33 removed

Suggested-By: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Tested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> # cxl
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2016-08-22 11:09:33 +10:00
Markus Elfring
f5ed841ce7 powerpc: mpc8349emitx: Delete unnecessary assignment for the field "owner"
The field "owner" is set by the core.
Thus delete an unneeded initialisation.

Generated by: scripts/coccinelle/api/platform_no_drv_owner.cocci
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2016-08-22 11:09:33 +10:00
Markus Elfring
e72e799c09 powerpc/512x: Delete unnecessary assignment for the field "owner"
The field "owner" is set by the core.
Thus delete an unneeded initialisation.

Generated by: scripts/coccinelle/api/platform_no_drv_owner.cocci
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2016-08-22 11:09:33 +10:00
Guenter Roeck
e340eca90e powerpc: cputhreads: Add missing include file
Powerpc builds may fail with the following build error.

Error log:
In file included from ./arch/powerpc/include/asm/mmu_context.h:11:0,
                 from ./include/linux/mmu_context.h:4,
		 from mm/mmu_context.c:8:
./arch/powerpc/include/asm/cputhreads.h: In function 'get_tensr':
./arch/powerpc/include/asm/cputhreads.h:101:2: error:
	implicit declaration of function 'cpu_has_feature'

The problem can be triggered by configuring ppc64e_defconfig and selecting
CONFIG_TICK_CPU_ACCOUNTING instead of CONFIG_VIRT_CPU_ACCOUNTING_NATIVE.

Fixes: b92a226e52 ("powerpc: Move cpu_has_feature() to a separate file")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2016-08-22 11:09:33 +10:00
Paul Mackerras
34a75b0f63 KVM: PPC: Implement kvm_arch_intc_initialized() for PPC
It doesn't make sense to create irqfds for a VM that doesn't have
in-kernel interrupt controller emulation.  There is an existing
interface for architecture code to tell the irqfd code whether or
not any interrupt controller has been initialized, called
kvm_arch_intc_initialized(), so let's implement that for powerpc.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-08-19 13:00:06 +10:00
Paul Mackerras
e48ba1cbce KVM: PPC: Book3S: Don't crash if irqfd used with no in-kernel XICS emulation
It turns out that if userspace creates a pseries-type VM without
in-kernel XICS (interrupt controller) emulation, and then connects
an eventfd to the VM as an irqfd, and the eventfd gets signalled,
that the code will try to deliver an interrupt via the non-existent
XICS object and crash the host kernel with a NULL pointer dereference.

To fix this, we check for the presence of the XICS object before
trying to deliver the interrupt, and return with an error if not.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-08-19 13:00:06 +10:00
Linus Torvalds
329f415291 KVM locks kvm_device list to prevent corruption on device creation.
PPC splits debugfs initialization from creation of the xics device to
 unlock the newly taken kvm lock earlier.
 
 s390 prevents userspace from triggering two WARN_ON_ONCE.
 
 MIPS fixes several issues in the management of TLB faults (Cc: stable).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABCAAGBQJXrx2ZAAoJEED/6hsPKofoo/4H/jra5NNxvpo09LWlXTwGXxBH
 cwcfDZSiOFxgvWztKJOIjPI4ETL3mnZvb9SFWBZZh1U0kfZ/TGiWouwaDNlBkPYj
 I3YHuPI7if+yUOmJlI3N2hWa0Wo0qiMqIjKT0pQVSLLdK/CVE+xGyS+qtXTNXHQn
 pFdKlYr//7OwQEY0ow1yj5VnsFrXB1JWFyB/+N5zaCfbCaQVyZAL7rj8SUbC/32W
 CiNhrvatzierKIfPerWw8DvvBKhCgWaRuLl0W+uMncrC9Qepcx9moM2beD1txK2I
 iHor1TDxUPifGQONfWMAlw87FluzHF4vQ5nN2jyTi8TT+CEfZpZ43Q+DY7okD4w=
 =NQP9
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Radim Krčmář:
 "KVM:
   - lock kvm_device list to prevent corruption on device creation.

  PPC:
   - split debugfs initialization from creation of the xics device to
     unlock the newly taken kvm lock earlier.

  s390:
   - prevent userspace from triggering two WARN_ON_ONCE.

  MIPS:
   - fix several issues in the management of TLB faults (Cc: stable)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  MIPS: KVM: Propagate kseg0/mapped tlb fault errors
  MIPS: KVM: Fix gfn range check in kseg0 tlb faults
  MIPS: KVM: Add missing gfn range check
  MIPS: KVM: Fix mapped fault broken commpage handling
  KVM: Protect device ops->create and list_add with kvm->lock
  KVM: PPC: Move xics_debugfs_init out of create
  KVM: s390: reset KVM_REQ_MMU_RELOAD if mapping the prefix failed
  KVM: s390: set the prefix initially properly
2016-08-13 10:11:14 -07:00
Linus Torvalds
8766dc68d1 powerpc fixes for 4.8 #3
- powerpc/vdso: Fix build rules to rebuild vdsos correctly from Nicholas Piggin
  - powerpc/ptrace: Fix coredump since ptrace TM changes from Cyril Bur
  - powerpc/32: Fix csum_partial_copy_generic() from Christophe Leroy
  - cxl: Set psl_fir_cntl to production environment value from Frederic Barrat
  - powerpc/eeh: Switch to conventional PCI address output in EEH log from Guilherme G. Piccoli
  - cxl: Use fixed width predefined types in data structure. from Philippe Bergheaud
  - powerpc/vdso: Add missing include file from Guenter Roeck
  - powerpc: Fix unused function warning 'lmb_to_memblock' from Alastair D'Silva
  - powerpc/powernv/ioda: Fix TCE invalidate to work in real mode again from Alexey Kardashevskiy
  - powerpc/cell: Add missing error code in spufs_mkgang() from Dan Carpenter
  - crypto: crc32c-vpmsum - Convert to CPU feature based module autoloading from Anton Blanchard
  - powerpc/pasemi: Fix coherent_dma_mask for dma engine from Darren Stevens
 
 Benjamin Herrenschmidt:
  - powerpc/32: Fix crash during static key init
  - powerpc: Update obsolete comment in setup_32.c about early_init()
  - powerpc: Print the kernel load address at the end of prom_init()
  - powerpc/pnv/pci: Fix incorrect PE reservation attempt on some 64-bit BARs
  - powerpc/xics: Properly set Edge/Level type and enable resend
 
 Mahesh Salgaonkar:
  - powerpc/book3s: Fix MCE console messages for unrecoverable MCE.
  - powerpc/powernv: Fix MCE handler to avoid trashing CR0/CR1 registers.
  - powerpc/powernv: Move IDLE_STATE_ENTER_SEQ macro to cpuidle.h
  - powerpc/powernv: Load correct TOC pointer while waking up from winkle.
 
 Andrew Donnellan:
  - cxl: Fix sparse warnings
  - cxl: Fix NULL dereference in cxl_context_init() on PowerVM guests
 
 Michael Ellerman:
  - selftests/powerpc: Specify we expect to build with std=gnu99
  - powerpc/Makefile: Use cflags-y/aflags-y for setting endian options
  - powerpc/pci: Fix endian bug in fixed PHB numbering
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXrZFAAAoJEFHr6jzI4aWAxacQALPfu/kbKJFhwX8dnbzaCwHe
 1bTZHE4bkkxfS5JrghbiLZHUeoCZucDhGGlZSPOEb5VA9lkEX3OJJRQDng754Pit
 u3pwt0SLmAxBn9BgTZy/5g5U6KMGptzJcSsKVEtZs17PKpqhPNELMm5EmGhJmNHH
 Ksycw4FhVrsjDm5n7s4IqUhsh0Z9QPOOxxb5rVgdBBxmLHz5a1FJSSCFan5WW3PT
 QNiMfg58NdBBOFbDQJWLiWXrfPPUMhXfPxHGGArXPEsa+7l5yXaygCSv5KyUBJMt
 sDxn6XZMuYzzvg4j8uc9mkDWNWiyxcxBJ6+/Hm5xf9vvpxzHAM1M8j9xqpaCHjeg
 b0fsWqVeLD+DuAVqh6rUgUERbsfUtuKXRSB+NR0hHWd7GLx707FIr3i1AAvjDODC
 qwcZg9mkcAbKAIOAmsk9aAB60jl7aENiz+bTvLYMHDhIbb+st94jajdaG7MSVn5z
 M9FFbRKmRHTW0Qoop1VuseyO9C+Lmb+ksIhBHeYaNDaJ5lzk0NwJltCNd4ybnL6h
 i+AFxuhN0uyT6OJOPqTR07+9p+k04LOSYPZR34rclKQ3Z+sQiYQAmwLMHasN6uBk
 dZxJUxmeio5J/0BXLGKLYFnaNpHnq3EQm9vdt6spn1kidmm+bOeICB8UW8AairqC
 8HasF1QrjZihmoBoXgul
 =gw2z
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Some powerpc fixes for 4.8:

  Misc:
   - powerpc/vdso: Fix build rules to rebuild vdsos correctly from Nicholas Piggin
   - powerpc/ptrace: Fix coredump since ptrace TM changes from Cyril Bur
   - powerpc/32: Fix csum_partial_copy_generic() from Christophe Leroy
   - cxl: Set psl_fir_cntl to production environment value from Frederic Barrat
   - powerpc/eeh: Switch to conventional PCI address output in EEH log from Guilherme G. Piccoli
   - cxl: Use fixed width predefined types in data structure. from Philippe Bergheaud
   - powerpc/vdso: Add missing include file from Guenter Roeck
   - powerpc: Fix unused function warning 'lmb_to_memblock' from Alastair D'Silva
   - powerpc/powernv/ioda: Fix TCE invalidate to work in real mode again from Alexey Kardashevskiy
   - powerpc/cell: Add missing error code in spufs_mkgang() from Dan Carpenter
   - crypto: crc32c-vpmsum - Convert to CPU feature based module autoloading from Anton Blanchard
   - powerpc/pasemi: Fix coherent_dma_mask for dma engine from Darren Stevens

  Benjamin Herrenschmidt:
   - powerpc/32: Fix crash during static key init
   - powerpc: Update obsolete comment in setup_32.c about early_init()
   - powerpc: Print the kernel load address at the end of prom_init()
   - powerpc/pnv/pci: Fix incorrect PE reservation attempt on some 64-bit BARs
   - powerpc/xics: Properly set Edge/Level type and enable resend

  Mahesh Salgaonkar:
   - powerpc/book3s: Fix MCE console messages for unrecoverable MCE.
   - powerpc/powernv: Fix MCE handler to avoid trashing CR0/CR1 registers.
   - powerpc/powernv: Move IDLE_STATE_ENTER_SEQ macro to cpuidle.h
   - powerpc/powernv: Load correct TOC pointer while waking up from winkle.

  Andrew Donnellan:
   - cxl: Fix sparse warnings
   - cxl: Fix NULL dereference in cxl_context_init() on PowerVM guests

  Michael Ellerman:
   - selftests/powerpc: Specify we expect to build with std=gnu99
   - powerpc/Makefile: Use cflags-y/aflags-y for setting endian options
   - powerpc/pci: Fix endian bug in fixed PHB numbering"

* tag 'powerpc-4.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (26 commits)
  selftests/powerpc: Specify we expect to build with std=gnu99
  powerpc/vdso: Fix build rules to rebuild vdsos correctly
  powerpc/Makefile: Use cflags-y/aflags-y for setting endian options
  powerpc/32: Fix crash during static key init
  powerpc: Update obsolete comment in setup_32.c about early_init()
  powerpc: Print the kernel load address at the end of prom_init()
  powerpc/ptrace: Fix coredump since ptrace TM changes
  powerpc/32: Fix csum_partial_copy_generic()
  cxl: Set psl_fir_cntl to production environment value
  powerpc/pnv/pci: Fix incorrect PE reservation attempt on some 64-bit BARs
  powerpc/book3s: Fix MCE console messages for unrecoverable MCE.
  powerpc/pci: Fix endian bug in fixed PHB numbering
  powerpc/eeh: Switch to conventional PCI address output in EEH log
  cxl: Fix sparse warnings
  cxl: Fix NULL dereference in cxl_context_init() on PowerVM guests
  cxl: Use fixed width predefined types in data structure.
  powerpc/vdso: Add missing include file
  powerpc: Fix unused function warning 'lmb_to_memblock'
  powerpc/powernv: Fix MCE handler to avoid trashing CR0/CR1 registers.
  powerpc/powernv: Move IDLE_STATE_ENTER_SEQ macro to cpuidle.h
  ...
2016-08-12 12:09:44 -07:00
Christoffer Dall
a28ebea2ad KVM: Protect device ops->create and list_add with kvm->lock
KVM devices were manipulating list data structures without any form of
synchronization, and some implementations of the create operations also
suffered from a lack of synchronization.

Now when we've split the xics create operation into create and init, we
can hold the kvm->lock mutex while calling the create operation and when
manipulating the devices list.

The error path in the generic code gets slightly ugly because we have to
take the mutex again and delete the device from the list, but holding
the mutex during anon_inode_getfd or releasing/locking the mutex in the
common non-error path seemed wrong.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-08-12 12:01:27 +02:00
Christoffer Dall
023e9fddc3 KVM: PPC: Move xics_debugfs_init out of create
As we are about to hold the kvm->lock during the create operation on KVM
devices, we should move the call to xics_debugfs_init into its own
function, since holding a mutex over extended amounts of time might not
be a good idea.

Introduce an init operation on the kvm_device_ops struct which cannot
fail and call this, if configured, after the device has been created.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-08-12 12:01:26 +02:00
Nicholas Piggin
b9a4a0d02c powerpc/vdso: Fix build rules to rebuild vdsos correctly
When using if_changed, we need to add FORCE as a dependency (see
Documentation/kbuild/makefiles.txt) otherwise we don't get command line
change checking amongst other things. This has resulted in vdsos not
being rebuilt when switching between big and little endian.

The vdso64/32ld commands have to be changed around to avoid pulling
FORCE into the linker command line (code copied from x86).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-10 23:04:12 +10:00
Michael Ellerman
164af597ce powerpc/Makefile: Use cflags-y/aflags-y for setting endian options
When we introduced the little endian support, we added the endian flags
to CC directly using override. I don't know the history of why we did
that, I suspect no one does.

Although this mostly works, it has one bug, which is that CROSS32CC
doesn't get -mbig-endian. That means when the compiler is little endian
by default and the user is building big endian, vdso32 is incorrectly
compiled as little endian and the kernel fails to build.

Instead we can add the endian flags to cflags-y/aflags-y, and then
append those to KBUILD_CFLAGS/KBUILD_AFLAGS.

This has the advantage of being 1) less ugly, 2) the documented way of
adding flags in the arch Makefile and 3) it fixes building vdso32 with a
LE toolchain.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-10 23:01:53 +10:00
Benjamin Herrenschmidt
97f6e0cc35 powerpc/32: Fix crash during static key init
We cannot do those initializations from apply_feature_fixups() as
this function runs in a very restricted environment on 32-bit where
the kernel isn't running at its linked address and the PTRRELOC()
macro must be used for any global accesss.

Instead, split them into a separtate steup_feature_keys() function
which is called in a more suitable spot on ppc32.

Fixes: 309b315b6e ("powerpc: Call jump_label_init() in apply_feature_fixups()")
Reported-and-tested-by: Christian Kujau <lists@nerdbynature.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-10 19:41:58 +10:00
Benjamin Herrenschmidt
f9cc1d1f80 powerpc: Update obsolete comment in setup_32.c about early_init()
We don't identify the machine type anymore...

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-10 19:38:02 +10:00
Benjamin Herrenschmidt
7d70c63c71 powerpc: Print the kernel load address at the end of prom_init()
This makes it easier to debug crashes that happen very early before
the kernel takes over Open Firmware by allowing us to relate the OF
reported crashing addresses to offsets within the kernel.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-10 19:37:43 +10:00
Cyril Bur
c7a318ba86 powerpc/ptrace: Fix coredump since ptrace TM changes
Commit 8d460f6156 ("powerpc/process: Add the function
flush_tmregs_to_thread") added flush_tmregs_to_thread() and included
the assumption that it would only be called for a task which is not
current.

Although this is correct for ptrace, when generating a core dump, some
of the routines which call flush_tmregs_to_thread() are called. This
leads to a WARNing such as:

  Not expecting ptrace on self: TM regs may be incorrect
  ------------[ cut here ]------------
  WARNING: CPU: 123 PID: 7727 at arch/powerpc/kernel/process.c:1088 flush_tmregs_to_thread+0x78/0x80
  CPU: 123 PID: 7727 Comm: libvirtd Not tainted 4.8.0-rc1-gcc6x-g61e8a0d #1
  task: c000000fe631b600 task.stack: c000000fe63b0000
  NIP: c00000000001a1a8 LR: c00000000001a1a4 CTR: c000000000717780
  REGS: c000000fe63b3420 TRAP: 0700   Not tainted  (4.8.0-rc1-gcc6x-g61e8a0d)
  MSR: 900000010282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 28004222  XER: 20000000
  ...
  NIP [c00000000001a1a8] flush_tmregs_to_thread+0x78/0x80
  LR [c00000000001a1a4] flush_tmregs_to_thread+0x74/0x80
  Call Trace:
   flush_tmregs_to_thread+0x74/0x80 (unreliable)
   vsr_get+0x64/0x1a0
   elf_core_dump+0x604/0x1430
   do_coredump+0x5fc/0x1200
   get_signal+0x398/0x740
   do_signal+0x54/0x2b0
   do_notify_resume+0x98/0xb0
   ret_from_except_lite+0x70/0x74

So fix flush_tmregs_to_thread() to detect the case where it is called on
current, and a transaction is active, and in that case flush the TM regs
to the thread_struct.

This patch also moves flush_tmregs_to_thread() into ptrace.c as it is
only called from that file.

Fixes: 8d460f6156 ("powerpc/process: Add the function flush_tmregs_to_thread")
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
[mpe: Flesh out change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-10 16:34:20 +10:00
Christophe Leroy
1bc8b816cb powerpc/32: Fix csum_partial_copy_generic()
Commit 7aef413656 ("powerpc32: rewrite csum_partial_copy_generic()
based on copy_tofrom_user()") introduced a bug when destination
address is odd and initial csum is not null

In that (rare) case the initial csum value has to be rotated one byte
as well as the resulting value is

This patch also fixes related comments

Fixes: 7aef413656 ("powerpc32: rewrite csum_partial_copy_generic() based on copy_tofrom_user()")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-10 14:52:45 +10:00
Benjamin Herrenschmidt
5958d19a14 powerpc/pnv/pci: Fix incorrect PE reservation attempt on some 64-bit BARs
The generic allocation code may sometimes decide to assign a prefetchable
64-bit BAR to the M32 window. In fact it may also decide to allocate
a 64-bit non-prefetchable BAR to the M64 one ! So using the resource
flags as a test to decide which window was used for PE allocation is
just wrong and leads to insane PE numbers.

Instead, compare the addresses to figure it out.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Rename the function as agreed by Ben & Gavin]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-09 19:51:47 +10:00
Mahesh Salgaonkar
c74dd88e77 powerpc/book3s: Fix MCE console messages for unrecoverable MCE.
When machine check occurs with MSR(RI=0), it means MC interrupt is
unrecoverable and kernel goes down to panic path. But the console
message still shows it as recovered. This patch fixes the MCE console
messages.

Fixes: 36df96f8ac ("powerpc/book3s: Decode and save machine check event.")
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-09 19:46:54 +10:00
Michael Ellerman
61e8a0d5a0 powerpc/pci: Fix endian bug in fixed PHB numbering
The recent commit 63a72284b1 ("powerpc/pci: Assign fixed PHB number
based on device-tree properties"), added code to read a 64-bit property
from the device tree, and if not found read a 32-bit property (reg).

There was a bug in the 32-bit case, on big endian machines, due to the
use of the 64-bit value to read the 32-bit property. The cast of &prop
means we end up writing to the high 32-bit of prop, leaving the low
32-bits containing whatever junk was on the stack.

If that junk value was non-zero, and < MAX_PHBS, we would end up using
it as the PHB id. This results in users seeing what appear to be random
PHB ids.

Fix it by reading into a u32 property and then assigning that to the
u64 value, letting the CPU do the correct conversions for us.

Fixes: 63a72284b1 ("powerpc/pci: Assign fixed PHB number based on device-tree properties")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-09 16:52:03 +10:00
Guilherme G. Piccoli
10560b9afc powerpc/eeh: Switch to conventional PCI address output in EEH log
This is a very minor/trivial fix for the output of PCI address on EEH
logs. The PCI address on "OF node" field currently is using ":" as a
separator for the function, but the usual separator is ".". This patch
changes the separator to dot, so the PCI address is printed as usual.

Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-09 16:52:03 +10:00
Guenter Roeck
546c4402c5 powerpc/vdso: Add missing include file
Some powerpc builds fail with the following buld error.

  In file included from ./arch/powerpc/include/asm/mmu_context.h:11:0,
                   from arch/powerpc/kernel/vdso.c:28:
  arch/powerpc/include/asm/cputhreads.h: In function 'get_tensr':
  arch/powerpc/include/asm/cputhreads.h:101:2: error:
  	implicit declaration of function 'cpu_has_feature'

Fixes: b92a226e52 ("powerpc: Move cpu_has_feature() to a separate file")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-09 16:52:00 +10:00
Alastair D'Silva
9dc512819e powerpc: Fix unused function warning 'lmb_to_memblock'
This patch fixes the following warning:
arch/powerpc/platforms/pseries/hotplug-memory.c:323:29: error: 'lmb_to_memblock' defined but not used [-Werror=unused-function]
static struct memory_block *lmb_to_memblock(struct of_drconf_cell *lmb)
                           ^~~~~~~~~~~~~~~

The only consumer of this function is 'dlpar_remove_lmb', which is
enabled with CONFIG_MEMORY_HOTREMOVE, so move it into the same
ifdef block.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-09 16:52:00 +10:00
Mahesh Salgaonkar
bc14c49195 powerpc/powernv: Fix MCE handler to avoid trashing CR0/CR1 registers.
The current implementation of MCE early handling modifies CR0/1 registers
without saving its old values. Fix this by moving early check for
powersaving mode to machine_check_handle_early().

The power architecture 2.06 or later allows the possibility of getting
machine check while in nap/sleep/winkle. The last bit of HSPRG0 is set
to 1, if thread is woken up from winkle. Hence, clear the last bit of
HSPRG0 (r13) before MCE handler starts using it as paca pointer.

Also, the current code always puts the thread into nap state irrespective
of whatever idle state it woke up from. Fix that by looking at
paca->thread_idle_state and put the thread back into same state where it
came from.

Fixes: 1c51089f77 ("powerpc/book3s: Return from interrupt if coming from evil context.")
Reported-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Reviewed-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-09 16:51:35 +10:00
Mahesh Salgaonkar
98d8821a47 powerpc/powernv: Move IDLE_STATE_ENTER_SEQ macro to cpuidle.h
Move IDLE_STATE_ENTER_SEQ macro to cpuidle.h so that MCE handler changes
in subsequent patch can use it.

No functionality change.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-09 14:50:20 +10:00
Mahesh Salgaonkar
e325d76f8b powerpc/powernv: Load correct TOC pointer while waking up from winkle.
The function pnv_restore_hyp_resource() loads the TOC into r2 from
the invalid PACA pointer before fixing r13 value. This do not affect
POWER ISA 3.0 but it does have an impact on POWER ISA 2.07 or less
leading CPU to get stuck forever.

	login: [  471.830433] Processor 120 is stuck.

This can be easily reproducible using following steps:
- Turn off SMT
	$ ppc64_cpu --smt=off
- offline/online any online cpu (Thread 0 of any core which is online)
	$ echo 0 > /sys/devices/system/cpu/cpu<num>/online
	$ echo 1 > /sys/devices/system/cpu/cpu<num>/online

For POWER ISA 2.07 or less, the last bit of HSPRG0 is set indicating
that thread is waking up from winkle. Hence, the last bit of HSPRG0(r13)
needs to be clear before accessing it as PACA to avoid loading invalid
values from invalid PACA pointer.

Fix this by loading TOC after r13 register is corrected.

Fixes: bcef83a00d ("powerpc/powernv: Add platform support for stop instruction")
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-09 14:50:19 +10:00
Alexey Kardashevskiy
4d9021957b powerpc/powernv/ioda: Fix TCE invalidate to work in real mode again
Commit fd141d1a99 ("powerpc/powernv/pci: Rework accessing the TCE
invalidate register") broke TCE invalidation on IODA2/PHB3 for real
mode.

This makes invalidate work again.

Fixes: fd141d1a99 ("powerpc/powernv/pci: Rework accessing the TCE invalidate register")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-09 14:50:19 +10:00
Dan Carpenter
54a94fcf2f powerpc/cell: Add missing error code in spufs_mkgang()
We should return -ENOMEM if alloc_spu_gang() fails.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-09 14:50:18 +10:00
Benjamin Herrenschmidt
880a3d6afd powerpc/xics: Properly set Edge/Level type and enable resend
This sets the type of the interrupt appropriately. We set it as follow:

 - If not mapped from the device-tree, we use edge. This is the case
of the virtual interrupts and PCI MSIs for example.

 - If mapped from the device-tree and #interrupt-cells is 2 (PAPR
compliant), we use the second cell to set the appropriate type

 - If mapped from the device-tree and #interrupt-cells is 1 (current
OPAL on P8 does that), we assume level sensitive since those are
typically going to be the PSI LSIs which are level sensitive.

Additionally, we mark the interrupts requested via the opal_interrupts
property all level. This is a bit fishy but the best we can do until we
fix OPAL to properly expose them with a complete descriptor. It is also
correct for the current HW anyway as OPAL interrupts are currently PCI
error and PSI interrupts which are level.

Finally now that edge interrupts are properly identified, we can enable
CONFIG_HARDIRQS_SW_RESEND which will make the core re-send them if
they occur while masked, which some drivers rely upon.

This fixes issues with lost interrupts on some Mellanox adapters.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-09 14:50:18 +10:00
Anton Blanchard
d2cf5be07f crypto: crc32c-vpmsum - Convert to CPU feature based module autoloading
This patch utilises the GENERIC_CPU_AUTOPROBE infrastructure
to automatically load the crc32c-vpmsum module if the CPU supports
it.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-09 14:50:17 +10:00
Linus Torvalds
1eccfa090e Implements HARDENED_USERCOPY verification of copy_to_user/copy_from_user
bounds checking for most architectures on SLAB and SLUB.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 Comment: Kees Cook <kees@outflux.net>
 
 iQIcBAABCgAGBQJXl9tlAAoJEIly9N/cbcAm5BoP/ikTtDp2bFw1sn92yHTnIWzl
 O+dcKVAeRgjfnSvPfb1JITpaM58exQSaDsPBeR0DbVzU1zDdhLcwHHiQupFh98Ka
 vBZthbrlL/u4NB26enEEW0iyA32BsxYBMnIu0z5ux9RbZflmQwGQ0c0rvy3dJ7/b
 FzB5ayVST5y/a0m6/sImeeExh78GU9rsMb1XmJRMwlJAy6miDz/F9TP0LnuW6PhG
 J5XC99ygNJS1pQBLACRsrZw6ImgBxXnWCok6tWPMxFfD+rJBU2//wqS+HozyMWHL
 iYP7+ytVo/ZVok4114X/V4Oof3a6wqgpBuYrivJ228QO+UsLYbYLo6sZ8kRK7VFm
 9GgHo/8rWB1T9lBbSaa7UL5r0dVNNLjFGS42vwV+YlgUMQ1A35VRojO0jUnJSIQU
 Ug1IxKmylLd0nEcwD8/l3DXeQABsfL8GsoKW0OtdTZtW4RND4gzq34LK6t7hvayF
 kUkLg1OLNdUJwOi16M/rhugwYFZIMfoxQtjkRXKWN4RZ2QgSHnx2lhqNmRGPAXBG
 uy21wlzUTfLTqTpoeOyHzJwyF2qf2y4nsziBMhvmlrUvIzW1LIrYUKCNT4HR8Sh5
 lC2WMGYuIqaiu+NOF3v6CgvKd9UW+mxMRyPEybH8mEgfm+FLZlWABiBjIUpSEZuB
 JFfuMv1zlljj/okIQRg8
 =USIR
 -----END PGP SIGNATURE-----

Merge tag 'usercopy-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull usercopy protection from Kees Cook:
 "Tbhis implements HARDENED_USERCOPY verification of copy_to_user and
  copy_from_user bounds checking for most architectures on SLAB and
  SLUB"

* tag 'usercopy-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  mm: SLUB hardened usercopy support
  mm: SLAB hardened usercopy support
  s390/uaccess: Enable hardened usercopy
  sparc/uaccess: Enable hardened usercopy
  powerpc/uaccess: Enable hardened usercopy
  ia64/uaccess: Enable hardened usercopy
  arm64/uaccess: Enable hardened usercopy
  ARM: uaccess: Enable hardened usercopy
  x86/uaccess: Enable hardened usercopy
  mm: Hardened usercopy
  mm: Implement stack frame object validation
  mm: Add is_migrate_cma_page
2016-08-08 14:48:14 -07:00
Darren Stevens
416f37d081 powerpc/pasemi: Fix coherent_dma_mask for dma engine
Commit 817820b022 ("powerpc/iommu: Support "hybrid" iommu/direct DMA
ops for coherent_mask < dma_mask) adds a check of coherent_dma_mask for
dma allocations.

Unfortunately current PASemi code does not set this value for the DMA
engine, which ends up with the default value of 0xffffffff, the result
is on a PASemi system with >2Gb ram and iommu enabled the the onboard
ethernet stops working due to an inability to allocate memory. Add an
initialisation to pci_dma_dev_setup_pasemi().

Signed-off-by: Darren Stevens <darren@stevens-zone.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-08 16:19:09 +10:00
Al Viro
9445aa1a30 ppc: move exports to definitions
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-08-07 23:50:09 -04:00
Linus Torvalds
6c84239d59 RTC for 4.8
Cleanups:
  - huge cleanup of rtc-generic and char/genrtc this allowed to cleanup rtc-cmos,
   rtc-sh, rtc-m68k, rtc-powerpc and rtc-parisc
  - move mn10300 to rtc-cmos
 
 Subsystem:
  - fix wakealarms after hibernate
  - multiples fixes for rctest
  - simplify implementations of .read_alarm
 
 New drivers:
  - Maxim MAX6916
 
 Drivers:
  - ds1307: fix weekday
  - m41t80: add wakeup support
  - pcf85063: add support for PCF85063A variant
  - rv8803: extend i2c fix and other fixes
  - s35390a: fix alarm reading, this fixes instant reboot after shutdown for QNAP
    TS-41x
  - s3c: clock fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCgAGBQJXokhIAAoJENiigzvaE+LCZqQP+wWzintN/N1u3dKiVB7iSdwq
 +S/jAXD9wW8OK9PI60/YUGRYeUXmZW9t4XYg1VKCxU9KpVC17LgOtDyXD8BufP1V
 uREJEzZw9O7zCCjeHp/ICFjBkc62Net6ZDOO+ZyXPNfddpS1Xq1uUgXLZc/202UR
 ID/kewu0pJRDnoxyqznWn9+8D33w/ygXs2slY2Ive0ONtjdgxGcsj2rNbb2RYn2z
 OP7br3lLg7qkFh4TtXb61eh/9GYIk6wzP/CrX5l/jH4SjQnrIk5g/X/Cd1qQ/qso
 JZzFoonOKvIp5Gw/+fZ9NP3YFcnkoRMv4NjZV8PAmsYLds+ibRiBcoB8u6FmiJV7
 WW5uopgPkfCGN5BV3+QHwJDVe+WlgnlzaT5zPUCcP5KWusDts4fWIgzP7vrtAzf4
 3OJLrgSGdBeOqWnJD21nxKUD27JOseX7D+BFtwxR4lMsXHqlHJfETpZ8gts1ZGH3
 2U353j/jkZvGWmc6dMcuxOXT2K4VqpYeIIqs0IcLu6hM9crtR89zPR2Iu1AilfDW
 h2NroF+Q//SgMMzWoTEG6Tn7RAc7MthgA/tRCFZF9CBMzNs988w0CTHnKsIHmjpU
 UKkMeJGAC9YrPYIcqrg0oYsmLUWXc8JuZbGJBnei3BzbaMTlcwIN9qj36zfq6xWc
 TMLpbWEoIsgFIZMP/hAP
 =rpGB
 -----END PGP SIGNATURE-----

Merge tag 'rtc-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux

Pull RTC updates from Alexandre Belloni:
 "RTC for 4.8

  Cleanups:
   - huge cleanup of rtc-generic and char/genrtc this allowed to cleanup
     rtc-cmos, rtc-sh, rtc-m68k, rtc-powerpc and rtc-parisc
   - move mn10300 to rtc-cmos

  Subsystem:
   - fix wakealarms after hibernate
   - multiples fixes for rctest
   - simplify implementations of .read_alarm

  New drivers:
   - Maxim MAX6916

  Drivers:
   - ds1307: fix weekday
   - m41t80: add wakeup support
   - pcf85063: add support for PCF85063A variant
   - rv8803: extend i2c fix and other fixes
   - s35390a: fix alarm reading, this fixes instant reboot after
     shutdown for QNAP TS-41x
   - s3c: clock fixes"

* tag 'rtc-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (65 commits)
  rtc: rv8803: Clear V1F when setting the time
  rtc: rv8803: Stop the clock while setting the time
  rtc: rv8803: Always apply the I²C workaround
  rtc: rv8803: Fix read day of week
  rtc: rv8803: Remove the check for valid time
  rtc: rv8803: Kconfig: Indicate rx8900 support
  rtc: asm9260: remove .owner field for driver
  rtc: at91sam9: Fix missing spin_lock_init()
  rtc: m41t80: add suspend handlers for alarm IRQ
  rtc: m41t80: make it a real error message
  rtc: pcf85063: Add support for the PCF85063A device
  rtc: pcf85063: fix year range
  rtc: hym8563: in .read_alarm set .tm_sec to 0 to signal minute accuracy
  rtc: explicitly set tm_sec = 0 for drivers with minute accurancy
  rtc: s3c: Add s3c_rtc_{enable/disable}_clk in s3c_rtc_setfreq()
  rtc: s3c: Remove unnecessary call to disable already disabled clock
  rtc: abx80x: use devm_add_action_or_reset()
  rtc: m41t80: use devm_add_action_or_reset()
  rtc: fix a typo and reduce three empty lines to one
  rtc: s35390a: improve two comments in .set_alarm
  ...
2016-08-05 09:48:22 -04:00
Linus Torvalds
2cfd716d27 powerpc updates for 4.8 #2
Fixes:
  - Fix early access to cpu_spec relocation from Benjamin Herrenschmidt
  - Fix incorrect event codes in power9-event-list from Madhavan Srinivasan
  - Move register_process_table() out of ppc_md from Michael Ellerman
 
 Use jump_label for [cpu|mmu]_has_feature() from Aneesh Kumar K.V, Kevin Hao and Michael Ellerman:
  - Add mmu_early_init_devtree() from Michael Ellerman
  - Move disable_radix handling into mmu_early_init_devtree() from Michael Ellerman
  - Do hash device tree scanning earlier from Michael Ellerman
  - Do radix device tree scanning earlier from Michael Ellerman
  - Do feature patching before MMU init from Michael Ellerman
  - Check features don't change after patching from Michael Ellerman
  - Make MMU_FTR_RADIX a MMU family feature from Aneesh Kumar K.V
  - Convert mmu_has_feature() to returning bool from Michael Ellerman
  - Convert cpu_has_feature() to returning bool from Michael Ellerman
  - Define radix_enabled() in one place & use static inline from Michael Ellerman
  - Add early_[cpu|mmu]_has_feature() from Michael Ellerman
  - Convert early cpu/mmu feature check to use the new helpers from Aneesh Kumar K.V
  - jump_label: Make it possible for arches to invoke jump_label_init() earlier from Kevin Hao
  - Call jump_label_init() in apply_feature_fixups() from Aneesh Kumar K.V
  - Remove mfvtb() from Kevin Hao
  - Move cpu_has_feature() to a separate file from Kevin Hao
  - Add kconfig option to use jump labels for cpu/mmu_has_feature() from Michael Ellerman
  - Add option to use jump label for cpu_has_feature() from Kevin Hao
  - Add option to use jump label for mmu_has_feature() from Kevin Hao
  - Catch usage of cpu/mmu_has_feature() before jump label init from Aneesh Kumar K.V
  - Annotate jump label assembly from Michael Ellerman
 
 TLB flush enhancements from Aneesh Kumar K.V:
  - radix: Implement tlb mmu gather flush efficiently
  - Add helper for finding SLBE LLP encoding
  - Use hugetlb flush functions
  - Drop multiple definition of mm_is_core_local
  - radix: Add tlb flush of THP ptes
  - radix: Rename function and drop unused arg
  - radix/hugetlb: Add helper for finding page size
  - hugetlb: Add flush_hugetlb_tlb_range
  - remove flush_tlb_page_nohash
 
 Add new ptrace regsets from Anshuman Khandual and Simon Guo:
  - elf: Add powerpc specific core note sections
  - Add the function flush_tmregs_to_thread
  - Enable in transaction NT_PRFPREG ptrace requests
  - Enable in transaction NT_PPC_VMX ptrace requests
  - Enable in transaction NT_PPC_VSX ptrace requests
  - Adapt gpr32_get, gpr32_set functions for transaction
  - Enable support for NT_PPC_CGPR
  - Enable support for NT_PPC_CFPR
  - Enable support for NT_PPC_CVMX
  - Enable support for NT_PPC_CVSX
  - Enable support for TM SPR state
  - Enable NT_PPC_TM_CTAR, NT_PPC_TM_CPPR, NT_PPC_TM_CDSCR
  - Enable support for NT_PPPC_TAR, NT_PPC_PPR, NT_PPC_DSCR
  - Enable support for EBB registers
  - Enable support for Performance Monitor registers
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXpGaLAAoJEFHr6jzI4aWA9aYP/1AqmRPJ9D0XVUJWT+FVABUK
 LESESoVFF4Hug1j1F8Synhg5o4SzD2t45iGKbclYaFthOIyovMg7Wr1KSu4hQ0go
 rPuQfpXDNQ8jKdDX8hbPXKUxrNRBNfqJGFo5E7mO6wN9AJ9d1LVwQ+jKAva29Tqs
 LaAlMbQNbeObPNzOl73B73iew3aozr+mXjBqv82lqvgYknBD2CLf24xGG3eNIbq5
 ZZk4LPC8pdkaxnajnzRFzqwiyPWzao0yfpVRKh52TKHBQF/prR/KACb6zUuja/61
 krOfegUKob14OYrehjs6X8XNRLnILRI0u1H5bmj7eVEiY/usyNzE93SMHZM3Wdau
 sQF/Au4OLNXj0ZQdNBtzRsZRyp1d560Gsj+lQGBoPd4hfIWkFYHvxzxsUSdqv4uA
 MWDMwN0Vvfk0cpprsabsWNevkaotYYBU00px5hF/e5ZUc9/x/xYUVMgPEDr0QZLr
 cHJo9/Pjk4u/0g4lj+2y1LLl/0tNEZZg69O6bvffPAPVSS4/P4y/bKKYd4I0zL99
 Ykp91mSmkl70F3edgOSFqyda2gN2l2Ekb/i081YGXheFy1rbD29Vxv82BOVog4KY
 ibvOqp38WDzCVk5OXuCRvBl0VudLKGJYdppU1nXg4KgrTZXHeCAC0E+NzUsgOF4k
 OMvQ+5drVxrno+Hw8FVJ
 =0Q8E
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull more powerpc updates from Michael Ellerman:
 "These were delayed for various reasons, so I let them sit in next a
  bit longer, rather than including them in my first pull request.

  Fixes:
   - Fix early access to cpu_spec relocation from Benjamin Herrenschmidt
   - Fix incorrect event codes in power9-event-list from Madhavan Srinivasan
   - Move register_process_table() out of ppc_md from Michael Ellerman

  Use jump_label use for [cpu|mmu]_has_feature():
   - Add mmu_early_init_devtree() from Michael Ellerman
   - Move disable_radix handling into mmu_early_init_devtree() from Michael Ellerman
   - Do hash device tree scanning earlier from Michael Ellerman
   - Do radix device tree scanning earlier from Michael Ellerman
   - Do feature patching before MMU init from Michael Ellerman
   - Check features don't change after patching from Michael Ellerman
   - Make MMU_FTR_RADIX a MMU family feature from Aneesh Kumar K.V
   - Convert mmu_has_feature() to returning bool from Michael Ellerman
   - Convert cpu_has_feature() to returning bool from Michael Ellerman
   - Define radix_enabled() in one place & use static inline from Michael Ellerman
   - Add early_[cpu|mmu]_has_feature() from Michael Ellerman
   - Convert early cpu/mmu feature check to use the new helpers from Aneesh Kumar K.V
   - jump_label: Make it possible for arches to invoke jump_label_init() earlier from Kevin Hao
   - Call jump_label_init() in apply_feature_fixups() from Aneesh Kumar K.V
   - Remove mfvtb() from Kevin Hao
   - Move cpu_has_feature() to a separate file from Kevin Hao
   - Add kconfig option to use jump labels for cpu/mmu_has_feature() from Michael Ellerman
   - Add option to use jump label for cpu_has_feature() from Kevin Hao
   - Add option to use jump label for mmu_has_feature() from Kevin Hao
   - Catch usage of cpu/mmu_has_feature() before jump label init from Aneesh Kumar K.V
   - Annotate jump label assembly from Michael Ellerman

  TLB flush enhancements from Aneesh Kumar K.V:
   - radix: Implement tlb mmu gather flush efficiently
   - Add helper for finding SLBE LLP encoding
   - Use hugetlb flush functions
   - Drop multiple definition of mm_is_core_local
   - radix: Add tlb flush of THP ptes
   - radix: Rename function and drop unused arg
   - radix/hugetlb: Add helper for finding page size
   - hugetlb: Add flush_hugetlb_tlb_range
   - remove flush_tlb_page_nohash

  Add new ptrace regsets from Anshuman Khandual and Simon Guo:
   - elf: Add powerpc specific core note sections
   - Add the function flush_tmregs_to_thread
   - Enable in transaction NT_PRFPREG ptrace requests
   - Enable in transaction NT_PPC_VMX ptrace requests
   - Enable in transaction NT_PPC_VSX ptrace requests
   - Adapt gpr32_get, gpr32_set functions for transaction
   - Enable support for NT_PPC_CGPR
   - Enable support for NT_PPC_CFPR
   - Enable support for NT_PPC_CVMX
   - Enable support for NT_PPC_CVSX
   - Enable support for TM SPR state
   - Enable NT_PPC_TM_CTAR, NT_PPC_TM_CPPR, NT_PPC_TM_CDSCR
   - Enable support for NT_PPPC_TAR, NT_PPC_PPR, NT_PPC_DSCR
   - Enable support for EBB registers
   - Enable support for Performance Monitor registers"

* tag 'powerpc-4.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (48 commits)
  powerpc/mm: Move register_process_table() out of ppc_md
  powerpc/perf: Fix incorrect event codes in power9-event-list
  powerpc/32: Fix early access to cpu_spec relocation
  powerpc/ptrace: Enable support for Performance Monitor registers
  powerpc/ptrace: Enable support for EBB registers
  powerpc/ptrace: Enable support for NT_PPPC_TAR, NT_PPC_PPR, NT_PPC_DSCR
  powerpc/ptrace: Enable NT_PPC_TM_CTAR, NT_PPC_TM_CPPR, NT_PPC_TM_CDSCR
  powerpc/ptrace: Enable support for TM SPR state
  powerpc/ptrace: Enable support for NT_PPC_CVSX
  powerpc/ptrace: Enable support for NT_PPC_CVMX
  powerpc/ptrace: Enable support for NT_PPC_CFPR
  powerpc/ptrace: Enable support for NT_PPC_CGPR
  powerpc/ptrace: Adapt gpr32_get, gpr32_set functions for transaction
  powerpc/ptrace: Enable in transaction NT_PPC_VSX ptrace requests
  powerpc/ptrace: Enable in transaction NT_PPC_VMX ptrace requests
  powerpc/ptrace: Enable in transaction NT_PRFPREG ptrace requests
  powerpc/process: Add the function flush_tmregs_to_thread
  elf: Add powerpc specific core note sections
  powerpc/mm: remove flush_tlb_page_nohash
  powerpc/mm/hugetlb: Add flush_hugetlb_tlb_range
  ...
2016-08-05 09:00:54 -04:00
Dan Carpenter
380afa3698 powerpc/fsl_rio: fix a missing error code
We should set the error code here rather than incorrectly returning 0.
Otherwise static checkers complain.

Link: http://lkml.kernel.org/r/20160804053525.GM775@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Alexandre Bounine <alexandre.bounine@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-04 20:02:09 -04:00
Jason Baron
5411fd7fdc powerpc: add explicit #include <asm/asm-compat.h> for jump label
The stringify_in_c() macro may not be included. Make the dependency
explicit.

Link: http://lkml.kernel.org/r/564720c5328edd53c9d56db325be7215440eec3e.1467837322.git.jbaron@akamai.com
Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Joe Perches <joe@perches.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Joe Perches <joe@perches.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-04 08:50:07 -04:00
Krzysztof Kozlowski
00085f1efa dma-mapping: use unsigned long for dma_attrs
The dma-mapping core and the implementations do not change the DMA
attributes passed by pointer.  Thus the pointer can point to const data.
However the attributes do not have to be a bitfield.  Instead unsigned
long will do fine:

1. This is just simpler.  Both in terms of reading the code and setting
   attributes.  Instead of initializing local attributes on the stack
   and passing pointer to it to dma_set_attr(), just set the bits.

2. It brings safeness and checking for const correctness because the
   attributes are passed by value.

Semantic patches for this change (at least most of them):

    virtual patch
    virtual context

    @r@
    identifier f, attrs;

    @@
    f(...,
    - struct dma_attrs *attrs
    + unsigned long attrs
    , ...)
    {
    ...
    }

    @@
    identifier r.f;
    @@
    f(...,
    - NULL
    + 0
     )

and

    // Options: --all-includes
    virtual patch
    virtual context

    @r@
    identifier f, attrs;
    type t;

    @@
    t f(..., struct dma_attrs *attrs);

    @@
    identifier r.f;
    @@
    f(...,
    - NULL
    + 0
     )

Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.com
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
Acked-by: Mark Salter <msalter@redhat.com> [c6x]
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris]
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm]
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp]
Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core]
Acked-by: David Vrabel <david.vrabel@citrix.com> [xen]
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb]
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon]
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390]
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32]
Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc]
Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-04 08:50:07 -04:00
Michael Ellerman
eea8148c69 powerpc/mm: Move register_process_table() out of ppc_md
We want to initialise register_process_table() before ppc_md is setup,
so that it can be called as part of MMU init (at least on Radix ATM).

That no longer works because probe_machine() requires that ppc_md be
empty before it's called, and we now do probe_machine() much later.

So make register_process_table a global for now. It will probably move
into a mmu_radix_ops struct at some point in the future.

This was broken by me when applying commit 7025776ed1 "powerpc/mm:
Move hash table ops to a separate structure" due to conflicts with other
patches.

Fixes: 7025776ed1 ("powerpc/mm: Move hash table ops to a separate structure")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-04 20:22:34 +10:00
Madhavan Srinivasan
1a058f1643 powerpc/perf: Fix incorrect event codes in power9-event-list
These have been changed in the hardware, update Linux's version.

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-04 20:22:34 +10:00
Benjamin Herrenschmidt
2c0f99516f powerpc/32: Fix early access to cpu_spec relocation
Commit 9402c68461 ("powerpc: Factor do_feature_fixup calls")
introduced a subtle bug on 32-bit. When reading the cpu spec from the
global, we not only need to do a pointer relocation on the global
address but also on the pointer we read from it.

This fixes crashes reported on MPC5200 based machines.

Fixes: 9402c68461 ("powerpc: Factor do_feature_fixup calls")
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-03 15:43:16 +10:00
Linus Torvalds
d52bd54db8 Merge branch 'akpm' (patches from Andrew)
Merge yet more updates from Andrew Morton:

 - the rest of ocfs2

 - various hotfixes, mainly MM

 - quite a bit of misc stuff - drivers, fork, exec, signals, etc.

 - printk updates

 - firmware

 - checkpatch

 - nilfs2

 - more kexec stuff than usual

 - rapidio updates

 - w1 things

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (111 commits)
  ipc: delete "nr_ipc_ns"
  kcov: allow more fine-grained coverage instrumentation
  init/Kconfig: add clarification for out-of-tree modules
  config: add android config fragments
  init/Kconfig: ban CONFIG_LOCALVERSION_AUTO with allmodconfig
  relay: add global mode support for buffer-only channels
  init: allow blacklisting of module_init functions
  w1:omap_hdq: fix regression
  w1: add helper macro module_w1_family
  w1: remove need for ida and use PLATFORM_DEVID_AUTO
  rapidio/switches: add driver for IDT gen3 switches
  powerpc/fsl_rio: apply changes for RIO spec rev 3
  rapidio: modify for rev.3 specification changes
  rapidio: change inbound window size type to u64
  rapidio/idt_gen2: fix locking warning
  rapidio: fix error handling in mbox request/release functions
  rapidio/tsi721_dma: advance queue processing from transfer submit call
  rapidio/tsi721: add messaging mbox selector parameter
  rapidio/tsi721: add PCIe MRRS override parameter
  rapidio/tsi721_dma: add channel mask and queue size parameters
  ...
2016-08-02 21:08:07 -04:00
Alexandre Bounine
adff1649e6 powerpc/fsl_rio: apply changes for RIO spec rev 3
- Remove check for parallel PHY

 - Set LP-Serial Register Map type

[akpm@linux-foundation.org: fix build]
[alexandre.bounine@idt.com: fix build fix]
 Link: http://lkml.kernel.org/r/20160802184932.2755-1-alexandre.bounine@idt.com
Link: http://lkml.kernel.org/r/1469125134-16523-13-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02 19:35:38 -04:00
Alexandre Bounine
a057a52e94 rapidio: change inbound window size type to u64
Current definition of map_inb() mport operations callback uses u32 type
to specify required inbound window (IBW) size.  This is limiting factor
because existing hardware - tsi721 and fsl_rio, both support IBW size up
to 16GB.

Changing type of size parameter to u64 to allow IBW size configurations
larger than 4GB.

[alexandre.bounine@idt.com: remove compiler warning about size of constant]
  Link: http://lkml.kernel.org/r/20160802184856.2566-1-alexandre.bounine@idt.com
Link: http://lkml.kernel.org/r/1469125134-16523-11-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02 19:35:37 -04:00
Andy Lutomirski
7e7814180b signal: consolidate {TS,TLF}_RESTORE_SIGMASK code
In general, there's no need for the "restore sigmask" flag to live in
ti->flags.  alpha, ia64, microblaze, powerpc, sh, sparc (64-bit only),
tile, and x86 use essentially identical alternative implementations,
placing the flag in ti->status.

Replace those optimized implementations with an equally good common
implementation that stores it in a bitfield in struct task_struct and
drop the custom implementations.

Additional architectures can opt in by removing their
TIF_RESTORE_SIGMASK defines.

Link: http://lkml.kernel.org/r/8a14321d64a28e40adfddc90e18a96c086a6d6f9.1468522723.git.luto@kernel.org
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>	[powerpc]
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02 19:35:23 -04:00
Chen Gang
949bed2f57 include: mman: use bool instead of int for the return value of arch_validate_prot
For pure bool function's return value, bool is a little better more or
less than int.

Link: http://lkml.kernel.org/r/1469331815-2026-1-git-send-email-chengang@emindsoft.com.cn
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02 19:35:02 -04:00
Fabian Frederick
bd721ea73e treewide: replace obsolete _refok by __ref
There was only one use of __initdata_refok and __exit_refok

__init_refok was used 46 times against 82 for __ref.

Those definitions are obsolete since commit 312b1485fb ("Introduce new
section reference annotations tags: __ref, __refdata, __refconst")

This patch removes the following compatibility definitions and replaces
them treewide.

/* compatibility defines */
#define __init_refok     __ref
#define __initdata_refok __refdata
#define __exit_refok     __ref

I can also provide separate patches if necessary.
(One patch per tree and check in 1 month or 2 to remove old definitions)

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/1466796271-3043-1-git-send-email-fabf@skynet.be
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02 17:31:41 -04:00
Linus Torvalds
c8d0267efd PCI changes for the v4.8 merge window:
Enumeration
     Move ecam.h to linux/include/pci-ecam.h (Jayachandran C)
     Add parent device field to ECAM struct pci_config_window (Jayachandran C)
     Add generic MCFG table handling (Tomasz Nowicki)
     Refactor pci_bus_assign_domain_nr() for CONFIG_PCI_DOMAINS_GENERIC (Tomasz Nowicki)
     Factor DT-specific pci_bus_find_domain_nr() code out (Tomasz Nowicki)
 
   Resource management
     Add devm_request_pci_bus_resources() (Bjorn Helgaas)
     Unify pci_resource_to_user() declarations (Bjorn Helgaas)
     Implement pci_resource_to_user() with pcibios_resource_to_bus() (microblaze, powerpc, sparc) (Bjorn Helgaas)
     Request host bridge window resources (designware, iproc, rcar, xgene, xilinx, xilinx-nwl) (Bjorn Helgaas)
     Make PCI I/O space optional on ARM32 (Bjorn Helgaas)
     Ignore write combining when mapping I/O port space (Bjorn Helgaas)
     Claim bus resources on MIPS PCI_PROBE_ONLY set-ups (Bjorn Helgaas)
     Remove unicore32 pci=firmware command line parameter handling (Bjorn Helgaas)
     Support I/O resources when parsing host bridge resources (Jayachandran C)
     Add helpers to request/release memory and I/O regions (Johannes Thumshirn)
     Use pci_(request|release)_mem_regions (NVMe, lpfc, GenWQE, ethernet/intel, alx) (Johannes Thumshirn)
     Extend pci=resource_alignment to specify device/vendor IDs (Koehrer Mathias (ETAS/ESW5))
     Add generic pci_bus_claim_resources() (Lorenzo Pieralisi)
     Claim bus resources on ARM32 PCI_PROBE_ONLY set-ups (Lorenzo Pieralisi)
     Remove ARM32 and ARM64 arch-specific pcibios_enable_device() (Lorenzo Pieralisi)
     Add pci_unmap_iospace() to unmap I/O resources (Sinan Kaya)
     Remove powerpc __pci_mmap_set_pgprot() (Yinghai Lu)
 
   PCI device hotplug
     Allow additional bus numbers for hotplug bridges (Keith Busch)
     Ignore interrupts during D3cold (Lukas Wunner)
 
   Power management
     Enforce type casting for pci_power_t (Andy Shevchenko)
     Don't clear d3cold_allowed for PCIe ports (Mika Westerberg)
     Put PCIe ports into D3 during suspend (Mika Westerberg)
     Power on bridges before scanning new devices (Mika Westerberg)
     Runtime resume bridge before rescan (Mika Westerberg)
     Add runtime PM support for PCIe ports (Mika Westerberg)
     Remove redundant check of pcie_set_clkpm (Shawn Lin)
 
   Virtualization
     Add function 1 DMA alias quirk for Marvell 88SE9182 (Aaron Sierra)
     Add DMA alias quirk for Adaptec 3805 (Alex Williamson)
     Mark Atheros AR9485 and QCA9882 to avoid bus reset (Chris Blake)
     Add ACS quirk for Solarflare SFC9220 (Edward Cree)
 
   MSI
     Fix PCI_MSI dependencies (Arnd Bergmann)
     Add pci_msix_desc_addr() helper (Christoph Hellwig)
     Switch msix_program_entries() to use pci_msix_desc_addr() (Christoph Hellwig)
     Make the "entries" argument to pci_enable_msix() optional (Christoph Hellwig)
     Provide sensible IRQ vector alloc/free routines (Christoph Hellwig)
     Spread interrupt vectors in pci_alloc_irq_vectors() (Christoph Hellwig)
 
   Error Handling
     Bind DPC to Root Ports as well as Downstream Ports (Keith Busch)
     Remove DPC tristate module option (Keith Busch)
     Convert Downstream Port Containment driver to use devm_* functions (Mika Westerberg)
 
   Generic host bridge driver
     Select IRQ_DOMAIN (Arnd Bergmann)
     Claim bus resources on PCI_PROBE_ONLY set-ups (Lorenzo Pieralisi)
 
   ACPI host bridge driver
     Add ARM64 acpi_pci_bus_find_domain_nr() (Tomasz Nowicki)
     Add ARM64 ACPI support for legacy IRQs parsing and consolidation with DT code (Tomasz Nowicki)
     Implement ARM64 AML accessors for PCI_Config region (Tomasz Nowicki)
     Support ARM64 ACPI-based PCI host controller (Tomasz Nowicki)
 
   Altera host bridge driver
     Check link status before retrain link (Ley Foon Tan)
     Poll for link up status after retraining the link (Ley Foon Tan)
 
   Axis ARTPEC-6 host bridge driver
     Add PCI_MSI_IRQ_DOMAIN dependency (Arnd Bergmann)
     Add DT binding for Axis ARTPEC-6 PCIe controller (Niklas Cassel)
     Add Axis ARTPEC-6 PCIe controller driver (Niklas Cassel)
 
   Intel VMD host bridge driver
     Use lock save/restore in interrupt enable path (Jon Derrick)
     Select device dma ops to override (Keith Busch)
     Initialize list item in IRQ disable (Keith Busch)
     Use x86_vector_domain as parent domain (Keith Busch)
     Separate MSI and MSI-X vector sharing (Keith Busch)
 
   Marvell Aardvark host bridge driver
     Add DT binding for the Aardvark PCIe controller (Thomas Petazzoni)
     Add Aardvark PCI host controller driver (Thomas Petazzoni)
     Add Aardvark PCIe support for Armada 3700 (Thomas Petazzoni)
 
   Microsoft Hyper-V host bridge driver
     Fix interrupt cleanup path (Cathy Avery)
     Don't leak buffer in hv_pci_onchannelcallback() (Vitaly Kuznetsov)
     Handle all pending messages in hv_pci_onchannelcallback() (Vitaly Kuznetsov)
 
   NVIDIA Tegra host bridge driver
     Program PADS_REFCLK_CFG* always, not just on legacy SoCs (Stephen Warren)
     Program PADS_REFCLK_CFG* registers with per-SoC values (Stephen Warren)
     Use lower-case hex consistently for register definitions (Thierry Reding)
     Use generic pci_remap_iospace() rather than ARM32-specific one (Thierry Reding)
     Stop setting pcibios_min_mem (Thierry Reding)
 
   Renesas R-Car host bridge driver
     Drop gen2 dummy I/O port region (Bjorn Helgaas)
 
   TI DRA7xx host bridge driver
     Fix return value in case of error (Christophe JAILLET)
 
   Xilinx AXI host bridge driver
     Fix return value in case of error (Christophe JAILLET)
 
   Miscellaneous
     Make bus_attr_resource_alignment static (Ben Dooks)
     Include <asm/dma.h> for isa_dma_bridge_buggy (Ben Dooks)
     MAINTAINERS: Add file patterns for PCI device tree bindings (Geert Uytterhoeven)
     Make host bridge drivers explicitly non-modular (Paul Gortmaker)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXoNRtAAoJEFmIoMA60/r8LMkP/3kiNh21QFS6RZGOaDft5/Py
 n14Zo0w51avspxoI3iyDlBd5q/SssMqi+2c6Ko/fh2D2xMxJgmQOjdMDrIGARxGA
 qEHk/5IoXquY2/GcptmCk3ap66cJ6kTovS4OPrb73m3fPuknFwFwdzExq22XHbnI
 crPya6xwQxPLc54VpY/TsgW8E+EKZd/3FW9wuzzNHXrXmTILyhBQzQAA0K470GMx
 wEXU6kc3M/XhRuF1zjV9/O+H/xguwfnbTpZLvd2NAF6uXKZoRytEHHtNnVqu1hoe
 UPpDS2xq32pMNbGxGqBetCdIbkY/hWOufmckHI7Yu2OfXBYyHBYMG2je1+nMPkOV
 WiFhhrchGt5KnEMUwXPS4ROqnSZVpZBl1Fd4s10GhUYkoE2HNKJXta398H9FR1jj
 4NEVSi4mSX/+CkaoIN3lXYiaf9P0wv4Wppve4Scr30+VnLjJhm7Vw5La7v12oo6x
 otrJ/g98AkmnbuUdLeWBUS/+TOcdPjZYbw52rqBsbOOjFm51Zcj6D7kf5WcTypQy
 HzbvygSVabcioWehUG1uudC8pdJmQlUGx1aES/iu+mZEae4cuUFALu6hDBD9IYnZ
 5JdwjVzI0UItEwT3rQt3t4xiAqHADQ0NAVNJVCeREdoy/YQpSoTWGXIpyqCZ1yCm
 aBykjRsxbKQXlhVeIxuc
 =NVxu
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.8-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:
 "Highlights:

   - ARM64 support for ACPI host bridges

   - new drivers for Axis ARTPEC-6 and Marvell Aardvark

   - new pci_alloc_irq_vectors() interface for MSI-X, MSI, legacy INTx

   - pci_resource_to_user() cleanup (more to come)

  Detailed summary:

  Enumeration:
   - Move ecam.h to linux/include/pci-ecam.h (Jayachandran C)
   - Add parent device field to ECAM struct pci_config_window (Jayachandran C)
   - Add generic MCFG table handling (Tomasz Nowicki)
   - Refactor pci_bus_assign_domain_nr() for CONFIG_PCI_DOMAINS_GENERIC (Tomasz Nowicki)
   - Factor DT-specific pci_bus_find_domain_nr() code out (Tomasz Nowicki)

  Resource management:
   - Add devm_request_pci_bus_resources() (Bjorn Helgaas)
   - Unify pci_resource_to_user() declarations (Bjorn Helgaas)
   - Implement pci_resource_to_user() with pcibios_resource_to_bus() (microblaze, powerpc, sparc) (Bjorn Helgaas)
   - Request host bridge window resources (designware, iproc, rcar, xgene, xilinx, xilinx-nwl) (Bjorn Helgaas)
   - Make PCI I/O space optional on ARM32 (Bjorn Helgaas)
   - Ignore write combining when mapping I/O port space (Bjorn Helgaas)
   - Claim bus resources on MIPS PCI_PROBE_ONLY set-ups (Bjorn Helgaas)
   - Remove unicore32 pci=firmware command line parameter handling (Bjorn Helgaas)
   - Support I/O resources when parsing host bridge resources (Jayachandran C)
   - Add helpers to request/release memory and I/O regions (Johannes Thumshirn)
   - Use pci_(request|release)_mem_regions (NVMe, lpfc, GenWQE, ethernet/intel, alx) (Johannes Thumshirn)
   - Extend pci=resource_alignment to specify device/vendor IDs (Koehrer Mathias (ETAS/ESW5))
   - Add generic pci_bus_claim_resources() (Lorenzo Pieralisi)
   - Claim bus resources on ARM32 PCI_PROBE_ONLY set-ups (Lorenzo Pieralisi)
   - Remove ARM32 and ARM64 arch-specific pcibios_enable_device() (Lorenzo Pieralisi)
   - Add pci_unmap_iospace() to unmap I/O resources (Sinan Kaya)
   - Remove powerpc __pci_mmap_set_pgprot() (Yinghai Lu)

  PCI device hotplug:
   - Allow additional bus numbers for hotplug bridges (Keith Busch)
   - Ignore interrupts during D3cold (Lukas Wunner)

  Power management:
   - Enforce type casting for pci_power_t (Andy Shevchenko)
   - Don't clear d3cold_allowed for PCIe ports (Mika Westerberg)
   - Put PCIe ports into D3 during suspend (Mika Westerberg)
   - Power on bridges before scanning new devices (Mika Westerberg)
   - Runtime resume bridge before rescan (Mika Westerberg)
   - Add runtime PM support for PCIe ports (Mika Westerberg)
   - Remove redundant check of pcie_set_clkpm (Shawn Lin)

  Virtualization:
   - Add function 1 DMA alias quirk for Marvell 88SE9182 (Aaron Sierra)
   - Add DMA alias quirk for Adaptec 3805 (Alex Williamson)
   - Mark Atheros AR9485 and QCA9882 to avoid bus reset (Chris Blake)
   - Add ACS quirk for Solarflare SFC9220 (Edward Cree)

  MSI:
   - Fix PCI_MSI dependencies (Arnd Bergmann)
   - Add pci_msix_desc_addr() helper (Christoph Hellwig)
   - Switch msix_program_entries() to use pci_msix_desc_addr() (Christoph Hellwig)
   - Make the "entries" argument to pci_enable_msix() optional (Christoph Hellwig)
   - Provide sensible IRQ vector alloc/free routines (Christoph Hellwig)
   - Spread interrupt vectors in pci_alloc_irq_vectors() (Christoph Hellwig)

  Error Handling:
   - Bind DPC to Root Ports as well as Downstream Ports (Keith Busch)
   - Remove DPC tristate module option (Keith Busch)
   - Convert Downstream Port Containment driver to use devm_* functions (Mika Westerberg)

  Generic host bridge driver:
   - Select IRQ_DOMAIN (Arnd Bergmann)
   - Claim bus resources on PCI_PROBE_ONLY set-ups (Lorenzo Pieralisi)

  ACPI host bridge driver:
   - Add ARM64 acpi_pci_bus_find_domain_nr() (Tomasz Nowicki)
   - Add ARM64 ACPI support for legacy IRQs parsing and consolidation with DT code (Tomasz Nowicki)
   - Implement ARM64 AML accessors for PCI_Config region (Tomasz Nowicki)
   - Support ARM64 ACPI-based PCI host controller (Tomasz Nowicki)

  Altera host bridge driver:
   - Check link status before retrain link (Ley Foon Tan)
   - Poll for link up status after retraining the link (Ley Foon Tan)

  Axis ARTPEC-6 host bridge driver:
   - Add PCI_MSI_IRQ_DOMAIN dependency (Arnd Bergmann)
   - Add DT binding for Axis ARTPEC-6 PCIe controller (Niklas Cassel)
   - Add Axis ARTPEC-6 PCIe controller driver (Niklas Cassel)

  Intel VMD host bridge driver:
   - Use lock save/restore in interrupt enable path (Jon Derrick)
   - Select device dma ops to override (Keith Busch)
   - Initialize list item in IRQ disable (Keith Busch)
   - Use x86_vector_domain as parent domain (Keith Busch)
   - Separate MSI and MSI-X vector sharing (Keith Busch)

  Marvell Aardvark host bridge driver:
   - Add DT binding for the Aardvark PCIe controller (Thomas Petazzoni)
   - Add Aardvark PCI host controller driver (Thomas Petazzoni)
   - Add Aardvark PCIe support for Armada 3700 (Thomas Petazzoni)

  Microsoft Hyper-V host bridge driver:
   - Fix interrupt cleanup path (Cathy Avery)
   - Don't leak buffer in hv_pci_onchannelcallback() (Vitaly Kuznetsov)
   - Handle all pending messages in hv_pci_onchannelcallback() (Vitaly Kuznetsov)

  NVIDIA Tegra host bridge driver:
   - Program PADS_REFCLK_CFG* always, not just on legacy SoCs (Stephen Warren)
   - Program PADS_REFCLK_CFG* registers with per-SoC values (Stephen Warren)
   - Use lower-case hex consistently for register definitions (Thierry Reding)
   - Use generic pci_remap_iospace() rather than ARM32-specific one (Thierry Reding)
   - Stop setting pcibios_min_mem (Thierry Reding)

  Renesas R-Car host bridge driver:
   - Drop gen2 dummy I/O port region (Bjorn Helgaas)

  TI DRA7xx host bridge driver:
   - Fix return value in case of error (Christophe JAILLET)

  Xilinx AXI host bridge driver:
   - Fix return value in case of error (Christophe JAILLET)

  Miscellaneous:
   - Make bus_attr_resource_alignment static (Ben Dooks)
   - Include <asm/dma.h> for isa_dma_bridge_buggy (Ben Dooks)
   - MAINTAINERS: Add file patterns for PCI device tree bindings (Geert Uytterhoeven)
   - Make host bridge drivers explicitly non-modular (Paul Gortmaker)"

* tag 'pci-v4.8-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (125 commits)
  PCI: xgene: Make explicitly non-modular
  PCI: thunder-pem: Make explicitly non-modular
  PCI: thunder-ecam: Make explicitly non-modular
  PCI: tegra: Make explicitly non-modular
  PCI: rcar-gen2: Make explicitly non-modular
  PCI: rcar: Make explicitly non-modular
  PCI: mvebu: Make explicitly non-modular
  PCI: layerscape: Make explicitly non-modular
  PCI: keystone: Make explicitly non-modular
  PCI: hisi: Make explicitly non-modular
  PCI: generic: Make explicitly non-modular
  PCI: designware-plat: Make it explicitly non-modular
  PCI: artpec6: Make explicitly non-modular
  PCI: armada8k: Make explicitly non-modular
  PCI: artpec: Add PCI_MSI_IRQ_DOMAIN dependency
  PCI: Add ACS quirk for Solarflare SFC9220
  arm64: dts: marvell: Add Aardvark PCIe support for Armada 3700
  PCI: aardvark: Add Aardvark PCI host controller driver
  dt-bindings: add DT binding for the Aardvark PCIe controller
  PCI: tegra: Program PADS_REFCLK_CFG* registers with per-SoC values
  ...
2016-08-02 17:12:29 -04:00
Linus Torvalds
f716a85cd6 Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull kbuild updates from Michal Marek:

 - GCC plugin support by Emese Revfy from grsecurity, with a fixup from
   Kees Cook.  The plugins are meant to be used for static analysis of
   the kernel code.  Two plugins are provided already.

 - reduction of the gcc commandline by Arnd Bergmann.

 - IS_ENABLED / IS_REACHABLE macro enhancements by Masahiro Yamada

 - bin2c fix by Michael Tautschnig

 - setlocalversion fix by Wolfram Sang

* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  gcc-plugins: disable under COMPILE_TEST
  kbuild: Abort build on bad stack protector flag
  scripts: Fix size mismatch of kexec_purgatory_size
  kbuild: make samples depend on headers_install
  Kbuild: don't add obj tree in additional includes
  Kbuild: arch: look for generated headers in obtree
  Kbuild: always prefix objtree in LINUXINCLUDE
  Kbuild: avoid duplicate include path
  Kbuild: don't add ../../ to include path
  vmlinux.lds.h: replace config_enabled() with IS_ENABLED()
  kconfig.h: allow to use IS_{ENABLE,REACHABLE} in macro expansion
  kconfig.h: use already defined macros for IS_REACHABLE() define
  export.h: use __is_defined() to check if __KSYM_* is defined
  kconfig.h: use __is_defined() to check if MODULE is defined
  kbuild: setlocalversion: print error to STDERR
  Add sancov plugin
  Add Cyclomatic complexity GCC plugin
  GCC plugin infrastructure
  Shared library support
2016-08-02 16:37:12 -04:00
Linus Torvalds
221bb8a46e - ARM: GICv3 ITS emulation and various fixes. Removal of the old
VGIC implementation.
 
 - s390: support for trapping software breakpoints, nested virtualization
 (vSIE), the STHYI opcode, initial extensions for CPU model support.
 
 - MIPS: support for MIPS64 hosts (32-bit guests only) and lots of cleanups,
 preliminary to this and the upcoming support for hardware virtualization
 extensions.
 
 - x86: support for execute-only mappings in nested EPT; reduced vmexit
 latency for TSC deadline timer (by about 30%) on Intel hosts; support for
 more than 255 vCPUs.
 
 - PPC: bugfixes.
 
 The ugly bit is the conflicts.  A couple of them are simple conflicts due
 to 4.7 fixes, but most of them are with other trees. There was definitely
 too much reliance on Acked-by here.  Some conflicts are for KVM patches
 where _I_ gave my Acked-by, but the worst are for this pull request's
 patches that touch files outside arch/*/kvm.  KVM submaintainers should
 probably learn to synchronize better with arch maintainers, with the
 latter providing topic branches whenever possible instead of Acked-by.
 This is what we do with arch/x86.  And I should learn to refuse pull
 requests when linux-next sends scary signals, even if that means that
 submaintainers have to rebase their branches.
 
 Anyhow, here's the list:
 
 - arch/x86/kvm/vmx.c: handle_pcommit and EXIT_REASON_PCOMMIT was removed
 by the nvdimm tree.  This tree adds handle_preemption_timer and
 EXIT_REASON_PREEMPTION_TIMER at the same place.  In general all mentions
 of pcommit have to go.
 
 There is also a conflict between a stable fix and this patch, where the
 stable fix removed the vmx_create_pml_buffer function and its call.
 
 - virt/kvm/kvm_main.c: kvm_cpu_notifier was removed by the hotplug tree.
 This tree adds kvm_io_bus_get_dev at the same place.
 
 - virt/kvm/arm/vgic.c: a few final bugfixes went into 4.7 before the
 file was completely removed for 4.8.
 
 - include/linux/irqchip/arm-gic-v3.h: this one is entirely our fault;
 this is a change that should have gone in through the irqchip tree and
 pulled by kvm-arm.  I think I would have rejected this kvm-arm pull
 request.  The KVM version is the right one, except that it lacks
 GITS_BASER_PAGES_SHIFT.
 
 - arch/powerpc: what a mess.  For the idle_book3s.S conflict, the KVM
 tree is the right one; everything else is trivial.  In this case I am
 not quite sure what went wrong.  The commit that is causing the mess
 (fd7bacbca4, "KVM: PPC: Book3S HV: Fix TB corruption in guest exit
 path on HMI interrupt", 2016-05-15) touches both arch/powerpc/kernel/
 and arch/powerpc/kvm/.  It's large, but at 396 insertions/5 deletions
 I guessed that it wasn't really possible to split it and that the 5
 deletions wouldn't conflict.  That wasn't the case.
 
 - arch/s390: also messy.  First is hypfs_diag.c where the KVM tree
 moved some code and the s390 tree patched it.  You have to reapply the
 relevant part of commits 6c22c98637, plus all of e030c1125e, to
 arch/s390/kernel/diag.c.  Or pick the linux-next conflict
 resolution from http://marc.info/?l=kvm&m=146717549531603&w=2.
 Second, there is a conflict in gmap.c between a stable fix and 4.8.
 The KVM version here is the correct one.
 
 I have pushed my resolution at refs/heads/merge-20160802 (commit
 3d1f53419842) at git://git.kernel.org/pub/scm/virt/kvm/kvm.git.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJXoGm7AAoJEL/70l94x66DugQIAIj703ePAFepB/fCrKHkZZia
 SGrsBdvAtNsOhr7FQ5qvvjLxiv/cv7CymeuJivX8H+4kuUHUllDzey+RPHYHD9X7
 U6n1PdCH9F15a3IXc8tDjlDdOMNIKJixYuq1UyNZMU6NFwl00+TZf9JF8A2US65b
 x/41W98ilL6nNBAsoDVmCLtPNWAqQ3lajaZELGfcqRQ9ZGKcAYOaLFXHv2YHf2XC
 qIDMf+slBGSQ66UoATnYV2gAopNlWbZ7n0vO6tE2KyvhHZ1m399aBX1+k8la/0JI
 69r+Tz7ZHUSFtmlmyByi5IAB87myy2WQHyAPwj+4vwJkDGPcl0TrupzbG7+T05Y=
 =42ti
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:

 - ARM: GICv3 ITS emulation and various fixes.  Removal of the
   old VGIC implementation.

 - s390: support for trapping software breakpoints, nested
   virtualization (vSIE), the STHYI opcode, initial extensions
   for CPU model support.

 - MIPS: support for MIPS64 hosts (32-bit guests only) and lots
   of cleanups, preliminary to this and the upcoming support for
   hardware virtualization extensions.

 - x86: support for execute-only mappings in nested EPT; reduced
   vmexit latency for TSC deadline timer (by about 30%) on Intel
   hosts; support for more than 255 vCPUs.

 - PPC: bugfixes.

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (302 commits)
  KVM: PPC: Introduce KVM_CAP_PPC_HTM
  MIPS: Select HAVE_KVM for MIPS64_R{2,6}
  MIPS: KVM: Reset CP0_PageMask during host TLB flush
  MIPS: KVM: Fix ptr->int cast via KVM_GUEST_KSEGX()
  MIPS: KVM: Sign extend MFC0/RDHWR results
  MIPS: KVM: Fix 64-bit big endian dynamic translation
  MIPS: KVM: Fail if ebase doesn't fit in CP0_EBase
  MIPS: KVM: Use 64-bit CP0_EBase when appropriate
  MIPS: KVM: Set CP0_Status.KX on MIPS64
  MIPS: KVM: Make entry code MIPS64 friendly
  MIPS: KVM: Use kmap instead of CKSEG0ADDR()
  MIPS: KVM: Use virt_to_phys() to get commpage PFN
  MIPS: Fix definition of KSEGX() for 64-bit
  KVM: VMX: Add VMCS to CPU's loaded VMCSs before VMPTRLD
  kvm: x86: nVMX: maintain internal copy of current VMCS
  KVM: PPC: Book3S HV: Save/restore TM state in H_CEDE
  KVM: PPC: Book3S HV: Pull out TM state save/restore into separate procedures
  KVM: arm64: vgic-its: Simplify MAPI error handling
  KVM: arm64: vgic-its: Make vgic_its_cmd_handle_mapi similar to other handlers
  KVM: arm64: vgic-its: Turn device_id validation into generic ID validation
  ...
2016-08-02 16:11:27 -04:00
Sam Bobroff
23528bb21e KVM: PPC: Introduce KVM_CAP_PPC_HTM
Introduce a new KVM capability, KVM_CAP_PPC_HTM, that can be queried to
determine if a PowerPC KVM guest should use HTM (Hardware Transactional
Memory).

This will be used by QEMU to populate the pa-features bits in the
guest's device tree.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-08-01 19:42:06 +02:00
Bjorn Helgaas
9454c23852 Merge branch 'pci/msi-affinity' into next
Conflicts:
	drivers/nvme/host/pci.c
2016-08-01 12:34:01 -05:00
Anshuman Khandual
a67ae75802 powerpc/ptrace: Enable support for Performance Monitor registers
This patch enables support for Performance monitor registers related
ELF core note NT_PPC_PMU based ptrace requests through
PTRACE_GETREGSET, PTRACE_SETREGSET calls. This is achieved
through adding one new register sets REGSET_PMU in powerpc
corresponding to the ELF core note sections added in this
regard. It also implements the get, set and active functions
for this new register sets added.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:24 +10:00
Anshuman Khandual
cf89d4e1b1 powerpc/ptrace: Enable support for EBB registers
This patch enables support for EBB state registers related
ELF core note NT_PPC_EBB based ptrace requests through
PTRACE_GETREGSET, PTRACE_SETREGSET calls. This is achieved
through adding one new register sets REGSET_EBB in powerpc
corresponding to the ELF core note sections added in this
regard. It also implements the get, set and active functions
for this new register sets added.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:23 +10:00
Anshuman Khandual
fa439810cc powerpc/ptrace: Enable support for NT_PPPC_TAR, NT_PPC_PPR, NT_PPC_DSCR
This patch enables support for running TAR, PPR, DSCR registers
related ELF core notes NT_PPPC_TAR, NT_PPC_PPR, NT_PPC_DSCR based
ptrace requests through PTRACE_GETREGSET, PTRACE_SETREGSET calls.
This is achieved through adding three new register sets REGSET_TAR,
REGSET_PPR, REGSET_DSCR in powerpc corresponding to the ELF core
note sections added in this regad. It implements the get, set and
active functions for all these new register sets added.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:22 +10:00
Anshuman Khandual
c45dc9003a powerpc/ptrace: Enable NT_PPC_TM_CTAR, NT_PPC_TM_CPPR, NT_PPC_TM_CDSCR
This patch enables support for all three TM checkpointed SPR
states related ELF core note  NT_PPC_TM_CTAR, NT_PPC_TM_CPPR,
NT_PPC_TM_CDSCR based ptrace requests through PTRACE_GETREGSET,
PTRACE_SETREGSET calls. This is achieved through adding three
new register sets REGSET_TM_CTAR, REGSET_TM_CPPR and
REGSET_TM_CDSCR in powerpc corresponding to the ELF core note
sections added. It implements the get, set and active functions
for all these new register sets added.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:22 +10:00
Anshuman Khandual
08e1c01d6a powerpc/ptrace: Enable support for TM SPR state
This patch enables support for TM SPR state related ELF core
note NT_PPC_TM_SPR based ptrace requests through PTRACE_GETREGSET,
PTRACE_SETREGSET calls. This is achieved through adding a register
set REGSET_TM_SPR in powerpc corresponding to the ELF core note
section added. It implements the get, set and active functions for
this new register set added.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:21 +10:00
Anshuman Khandual
9d3918f7c0 powerpc/ptrace: Enable support for NT_PPC_CVSX
This patch enables support for TM checkpointed VSX register
set ELF core note NT_PPC_CVSX based ptrace requests through
PTRACE_GETREGSET, PTRACE_SETREGSET calls. This is achieved
through adding a register set REGSET_CVSX in powerpc
corresponding to the ELF core note section added. It
implements the get, set and active functions for this new
register set added.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:20 +10:00
Anshuman Khandual
8c13f59999 powerpc/ptrace: Enable support for NT_PPC_CVMX
This patch enables support for TM checkpointed VMX register
set ELF core note NT_PPC_CVMX based ptrace requests through
PTRACE_GETREGSET, PTRACE_SETREGSET calls. This is achieved
through adding a register set REGSET_CVMX in powerpc
corresponding to the ELF core note section added. It
implements the get, set and active functions for this new
register set added.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:19 +10:00
Anshuman Khandual
19cbcbf75a powerpc/ptrace: Enable support for NT_PPC_CFPR
This patch enables support for TM checkpointed FPR register
set ELF core note NT_PPC_CFPR based ptrace requests through
PTRACE_GETREGSET, PTRACE_SETREGSET calls. This is achieved
through adding a register set REGSET_CFPR in powerpc
corresponding to the ELF core note section added. It
implements the get, set and active functions for this new
register set added.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:19 +10:00
Anshuman Khandual
25847fb195 powerpc/ptrace: Enable support for NT_PPC_CGPR
This patch enables support for TM checkpointed GPR register
set ELF core note NT_PPC_CGPR based ptrace requests through
PTRACE_GETREGSET, PTRACE_SETREGSET calls. This is achieved
through adding a register set REGSET_CGPR in powerpc
corresponding to the ELF core note section added. It
implements the get, set and active functions for this new
register set added.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:18 +10:00
Anshuman Khandual
04fcadce0e powerpc/ptrace: Adapt gpr32_get, gpr32_set functions for transaction
This patch splits gpr32_get, gpr32_set functions to accommodate
in transaction ptrace requests implemented in patches later in
the series.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:17 +10:00
Anshuman Khandual
94b7d3610e powerpc/ptrace: Enable in transaction NT_PPC_VSX ptrace requests
This patch enables in transaction NT_PPC_VSX ptrace requests. The
function vsr_get which gets the running value of all VSX registers
and the function vsr_set which sets the running value of of all VSX
registers work on the running set of VMX registers whose location
will be different if transaction is active. This patch makes these
functions adapt to situations when the transaction is active.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:17 +10:00
Anshuman Khandual
d844e27915 powerpc/ptrace: Enable in transaction NT_PPC_VMX ptrace requests
This patch enables in transaction NT_PPC_VMX ptrace requests. The
function vr_get which gets the running value of all VMX registers
and the function vr_set which sets the running value of of all VMX
registers work on the running set of VMX registers whose location
will be different if transaction is active. This patch makes these
functions adapt to situations when the transaction is active.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:16 +10:00
Anshuman Khandual
1ec8549d44 powerpc/ptrace: Enable in transaction NT_PRFPREG ptrace requests
This patch enables in transaction NT_PRFPREG ptrace requests.
The function fpr_get which gets the running value of all FPR
registers and the function fpr_set which sets the running
value of of all FPR registers work on the running set of FPR
registers whose location will be different if transaction is
active. This patch makes these functions adapt to situations
when the transaction is active.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:15 +10:00
Anshuman Khandual
8d460f6156 powerpc/process: Add the function flush_tmregs_to_thread
This patch creates a function flush_tmregs_to_thread which
will then be used by subsequent patches in this series. The
function checks for self tracing ptrace interface attempts
while in the TM context and logs appropriate warning message.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:15 +10:00
Aneesh Kumar K.V
703b41ad1a powerpc/mm: remove flush_tlb_page_nohash
This should be same as flush_tlb_page except for hash32. For hash32
I guess the existing code is wrong, because we don't seem to be
flushing tlb for Hash != 0 case at all. Fix this by switching to
calling flush_tlb_page() which does the right thing by flushing
tlb for both hash and nohash case with hash32

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:13 +10:00
Aneesh Kumar K.V
5491ae7b6f powerpc/mm/hugetlb: Add flush_hugetlb_tlb_range
Some archs like ppc64 need to do special things when flushing tlb for
hugepage. Add a new helper to flush hugetlb tlb range. This helps us to
avoid flushing the entire tlb mapping for the pid.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:13 +10:00
Aneesh Kumar K.V
fbfa26d854 powerpc/mm/radix/hugetlb: Add helper for finding page size from hstate
Use the helper instead of open coding the same at multiple place

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:12 +10:00
Aneesh Kumar K.V
f22dfc9158 powerpc/mm/radix: Rename function and drop unused arg
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:11 +10:00
Aneesh Kumar K.V
d8e91e93e9 powerpc/mm/radix: Add tlb flush of THP ptes
Instead of flushing the entire mm, implement a flush_pmd_tlb_range

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:10 +10:00
Aneesh Kumar K.V
9d4dab1158 powerpc/mm: Drop multiple definition of mm_is_core_local
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:10 +10:00
Aneesh Kumar K.V
13dce03363 powerpc/mm: Use hugetlb flush functions
Use flush_hugetlb_page instead of flush_tlb_page when we clear flush the
pte.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:09 +10:00
Aneesh Kumar K.V
138ee7ee01 powerpc/mm/hash: Add helper for finding SLBE LLP encoding
Replace opencoding of the same at multiple places with the helper.
No functional change with this patch.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:08 +10:00
Aneesh Kumar K.V
8cb8140c4c powerpc/mm/radix: Implement tlb mmu gather flush efficiently
Now that we track page size in mmu_gather, we can use address based
tlbie format when doing a tlb_flush(). We don't do this if we are
invalidating the full address space.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:08 +10:00
Michael Ellerman
e2985fd9b8 powerpc/jump_label: Annotate jump label assembly
Add a comment to the generated assembler for jump labels. This makes it
easier to identify them in asm listings (generated with $ make foo.s).

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:07 +10:00
Aneesh Kumar K.V
c812c7d8f1 powerpc/mm: Catch usage of cpu/mmu_has_feature() before jump label init
This allows us to catch incorrect usage of cpu_has_feature() and
mmu_has_feature() prior to jump labels being initialised.

mpe: Use printk() and dump_stack() rather than WARN_ON(), because
WARN_ON() may not work this early in boot. Rename the Kconfig.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:06 +10:00
Kevin Hao
c12e6f24d4 powerpc: Add option to use jump label for mmu_has_feature()
As we just did for CPU features.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:06 +10:00
Kevin Hao
4db7327194 powerpc: Add option to use jump label for cpu_has_feature()
We do binary patching of asm code using CPU features, which is a
one-time operation, done during early boot. However checks of CPU
features in C code are currently done at run time, even though the set
of CPU features can never change after boot.

We can optimise this by using jump labels to implement cpu_has_feature(),
meaning checks in C code are binary patched into a single nop or branch.

For a C sequence along the lines of:

    if (cpu_has_feature(FOO))
         return 2;

The generated code before is roughly:

    ld      r9,-27640(r2)
    ld      r9,0(r9)
    lwz     r9,32(r9)
    cmpwi   cr7,r9,0
    bge     cr7, 1f
    li      r3,2
    blr
1:  ...

After (true):
    nop
    li      r3,2
    blr

After (false):
    b	1f
    li      r3,2
    blr
1:  ...

mpe: Rename MAX_CPU_FEATURES as we already have a #define with that
name, and define it simply as a constant, rather than doing tricks with
sizeof and NULL pointers. Rename the array to cpu_feature_keys. Use the
kconfig we added to guard it. Add BUILD_BUG_ON() if the feature is not a
compile time constant. Rewrite the change log.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:05 +10:00
Michael Ellerman
bfbfc8a43c powerpc: Add kconfig option to use jump labels for cpu/mmu_has_feature()
Add a kconfig option to control whether we use jump label for the
cpu/mmu_has_feature() checks. Currently this does nothing, but we will
enabled it in the subsequent patches.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:04 +10:00
Kevin Hao
b92a226e52 powerpc: Move cpu_has_feature() to a separate file
We plan to use jump label for cpu_has_feature(). In order to implement
this we need to include the linux/jump_label.h in asm/cputable.h.

Unfortunately if we do that it leads to an include loop. The root of the
problem seems to be that reg.h needs cputable.h (for CPU_FTRs), and then
cputable.h via jump_label.h eventually pulls in hw_irq.h which needs
reg.h (for MSR_EE).

So move cpu_has_feature() to a separate file on its own.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Rename to cpu_has_feature.h and flesh out change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:03 +10:00
Kevin Hao
905259e33d powerpc: Remove mfvtb()
This function is only used by get_vtb(). They are almost the same except
the reading from the real register. Move the mfspr() to get_vtb() and
kill the function mfvtb(). With this, we can eliminate the use of
cpu_has_feature() in very core header file like reg.h. This is a
preparation for the use of jump label for cpu_has_feature().

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:03 +10:00
Aneesh Kumar K.V
309b315b6e powerpc: Call jump_label_init() in apply_feature_fixups()
Call jump_label_init() early so that we can use static keys for CPU and
MMU feature checks.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:02 +10:00
Aneesh Kumar K.V
b8f1b4f860 powerpc/mm: Convert early cpu/mmu feature check to use the new helpers
This switches early feature checks to use the non static key variant of
the function. In later patches we will be switching cpu_has_feature()
and mmu_has_feature() to use static keys and we can use them only after
static key/jump label is initialized. Any check for feature before jump
label init should be done using this new helper.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:01 +10:00
Michael Ellerman
a141cca389 powerpc/mm: Add early_[cpu|mmu]_has_feature()
In later patches, we will be switching CPU and MMU feature checks to
use static keys.

For checks in early boot before jump label is initialized we need a
variant of [cpu|mmu]_has_feature() that doesn't use jump labels.

So create those called, unimaginatively, early_[cpu|mmu]_has_feature().

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:00 +10:00
Michael Ellerman
bab4c8de62 powerpc/mm: Define radix_enabled() in one place & use static inline
Currently we have radix_enabled() three times, twice in asm/book3s/64/mmu.h
and then a fallback in asm/mmu.h.

Consolidate them in asm/mmu.h. While we're at it convert them to be
static inlines, and change the fallback case to returning a bool, like
mmu_has_feature().

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:14:59 +10:00
Michael Ellerman
6574ba950b powerpc/kernel: Convert cpu_has_feature() to returning bool
The intention is that the result is only used as a boolean, so enforce
that by changing the return type to bool.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:14:58 +10:00
Michael Ellerman
a81dc9d995 powerpc/kernel: Convert mmu_has_feature() to returning bool
The intention is that the result is only used as a boolean, so enforce
that by changing the return type to bool.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:14:58 +10:00
Aneesh Kumar K.V
5a25b6f527 powerpc/mm: Make MMU_FTR_RADIX a MMU family feature
MMU feature bits are defined such that we use the lower half to
present MMU family features. Remove the strict split of half and
also move Radix to a mmu family feature. Radix introduce a new MMU
model and strictly speaking it is a new MMU family. This also free
up bits which can be used for individual features later.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:14:57 +10:00
Michael Ellerman
a28e46f109 powerpc/kernel: Check features don't change after patching
Early in boot we binary patch some sections of code based on the CPU and
MMU feature bits. But it is a one-time patching, there is no facility
for repatching the code later if the set of features change.

It is a major bug if the set of features changes after we've done the
code patching - so add a check for it.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:14:56 +10:00
Michael Ellerman
9e8066f398 powerpc/64: Do feature patching before MMU init
Up until now we needed to do the MMU init before feature patching,
because part of the MMU init was scanning the device tree and setting
and/or clearing some MMU feature bits.

Now that we have split that MMU feature modification out into routines
called from early_init_devtree() (called earlier) we can now do feature
patching before calling MMU init.

The advantage of this is it means the remainder of the MMU init runs
with the final set of features which will apply for the rest of the life
of the system. This means we don't have to special case anything called
from MMU init to deal with a changing set of feature bits.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:14:56 +10:00
Michael Ellerman
2537b09c93 powerpc/mm: Do radix device tree scanning earlier
Like we just did for hash, split the device tree scanning parts out and
call them from mmu_early_init_devtree().

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:14:55 +10:00
Michael Ellerman
bacf9cf883 powerpc/mm: Do hash device tree scanning earlier
Currently MMU initialisation (early_init_mmu()) consists of a mixture of
scanning the device tree, setting MMU feature bits, and then also doing
actual initialisation of MMU data structures.

We'd like to decouple the setting of the MMU features from the actual
setup. So split out the device tree scanning, and associated code, and
call it from mmu_init_early_devtree().

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:14:54 +10:00
Michael Ellerman
c610ec60ed powerpc/mm: Move disable_radix handling into mmu_early_init_devtree()
Move the handling of the disable_radix command line argument into the
newly created mmu_early_init_devtree().

It's an MMU option so it's preferable to have it in an mm related file,
and it also means platforms that don't support radix don't have to carry
the code.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:14:53 +10:00
Michael Ellerman
1a01dc87e0 powerpc/mm: Add mmu_early_init_devtree()
Empty for now, but we'll add to it in the next patch.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:14:53 +10:00
Linus Torvalds
bad60e6f25 powerpc updates for 4.8 # 1
Highlights:
  - PowerNV PCI hotplug support.
  - Lots more Power9 support.
  - eBPF JIT support on ppc64le.
  - Lots of cxl updates.
  - Boot code consolidation.
 
 Bug fixes:
  - Fix spin_unlock_wait() from Boqun Feng
  - Fix stack pointer corruption in __tm_recheckpoint() from Michael Neuling
  - Fix multiple bugs in memory_hotplug_max() from Bharata B Rao
  - mm: Ensure "special" zones are empty from Oliver O'Halloran
  - ftrace: Separate the heuristics for checking call sites from Michael Ellerman
  - modules: Never restore r2 for a mprofile-kernel style mcount() call from Michael Ellerman
  - Fix endianness when reading TCEs from Alexey Kardashevskiy
  - start rtasd before PCI probing from Greg Kurz
  - PCI: rpaphp: Fix slot registration for multiple slots under a PHB from Tyrel Datwyler
  - powerpc/mm: Add memory barrier in __hugepte_alloc() from Sukadev Bhattiprolu
 
 Cleanups & fixes:
  - Drop support for MPIC in pseries from Rashmica Gupta
  - Define and use PPC64_ELF_ABI_v2/v1 from Michael Ellerman
  - Remove unused symbols in asm-offsets.c from Rashmica Gupta
  - Fix SRIOV not building without EEH enabled from Russell Currey
  - Remove kretprobe_trampoline_holder. from Thiago Jung Bauermann
  - Reduce log level of PCI I/O space warning from Benjamin Herrenschmidt
  - Add array bounds checking to crash_shutdown_handlers from Suraj Jitindar Singh
  - Avoid -maltivec when using clang integrated assembler from Anton Blanchard
  - Fix array overrun in ppc_rtas() syscall from Andrew Donnellan
  - Fix error return value in cmm_mem_going_offline() from Rasmus Villemoes
  - export cpu_to_core_id() from Mauricio Faria de Oliveira
  - Remove old symbols from defconfigs from Andrew Donnellan
  - Update obsolete comments in setup_32.c about entry conditions from Benjamin Herrenschmidt
  - Add comment explaining the purpose of setup_kdump_trampoline() from Benjamin Herrenschmidt
  - Merge the RELOCATABLE config entries for ppc32 and ppc64 from Kevin Hao
  - Remove RELOCATABLE_PPC32 from Kevin Hao
  - Fix .long's in tlb-radix.c to more meaningful from Balbir Singh
 
 Minor cleanups & fixes:
  - Andrew Donnellan, Anna-Maria Gleixner, Anton Blanchard, Benjamin
    Herrenschmidt, Bharata B Rao, Christophe Leroy, Colin Ian King, Geliang
    Tang, Greg Kurz, Madhavan Srinivasan, Michael Ellerman, Michael Ellerman,
    Stephen Rothwell, Stewart Smith.
 
 Freescale updates from Scott:
  - "Highlights include more 8xx optimizations, device tree updates,
    and MVME7100 support."
 
 PowerNV PCI hotplug from Gavin Shan:
  - PCI: Add pcibios_setup_bridge()
  - Override pcibios_setup_bridge()
  - Remove PCI_RESET_DELAY_US
  - Move pnv_pci_ioda_setup_opal_tce_kill() around
  - Increase PE# capacity
  - Allocate PE# in reverse order
  - Create PEs in pcibios_setup_bridge()
  - Setup PE for root bus
  - Extend PCI bridge resources
  - Make pnv_ioda_deconfigure_pe() visible
  - Dynamically release PE
  - Update bridge windows on PCI plug
  - Delay populating pdn
  - Support PCI slot ID
  - Use PCI slot reset infrastructure
  - Introduce pnv_pci_get_slot_id()
  - Functions to get/set PCI slot state
  - PCI/hotplug: PowerPC PowerNV PCI hotplug driver
  - Print correct PHB type names
 
 Power9 idle support from Shreyas B. Prabhu:
  - set power_save func after the idle states are initialized
  - Use PNV_THREAD_WINKLE macro while requesting for winkle
  - make hypervisor state restore a function
  - Rename idle_power7.S to idle_book3s.S
  - Rename reusable idle functions to hardware agnostic names
  - Make pnv_powersave_common more generic
  - abstraction for saving SPRs before entering deep idle states
  - Add platform support for stop instruction
  - cpuidle/powernv: Use CPUIDLE_STATE_MAX instead of MAX_POWERNV_IDLE_STATES
  - cpuidle/powernv: cleanup cpuidle-powernv.c
  - cpuidle/powernv: Add support for POWER ISA v3 idle states
  - Use deepest stop state when cpu is offlined
 
 Power9 PMU from Madhavan Srinivasan:
  - factor out power8 pmu macros and defines
  - factor out power8 pmu functions
  - factor out power8 __init_pmu code
  - Add power9 event list macros for generic and cache events
  - Power9 PMU support
  - Export Power9 generic and cache events to sysfs
 
 Power9 preliminary interrupt & PCI support from Benjamin Herrenschmidt:
  - Add XICS emulation APIs
  - Move a few exception common handlers to make room
  - Add support for HV virtualization interrupts
  - Add mechanism to force a replay of interrupts
  - Add ICP OPAL backend
  - Discover IODA3 PHBs
  - pci: Remove obsolete SW invalidate
  - opal: Add real mode call wrappers
  - Rename TCE invalidation calls
  - Remove SWINV constants and obsolete TCE code
  - Rework accessing the TCE invalidate register
  - Fallback to OPAL for TCE invalidations
  - Use the device-tree to get available range of M64's
  - Check status of a PHB before using it
  - pci: Don't try to allocate resources that will be reassigned
 
 Other Power9:
  - Send SIGBUS on unaligned copy and paste from Chris Smart
  - Large Decrementer support from Oliver O'Halloran
  - Load Monitor Register Support from Jack Miller
 
 Performance improvements from Anton Blanchard:
  - Avoid load hit store in __giveup_fpu() and __giveup_altivec()
  - Avoid load hit store in setup_sigcontext()
  - Remove assembly versions of strcpy, strcat, strlen and strcmp
  - Align hot loops of some string functions
 
 eBPF JIT from Naveen N. Rao:
  - Fix/enhance 32-bit Load Immediate implementation
  - Optimize 64-bit Immediate loads
  - Introduce rotate immediate instructions
  - A few cleanups
  - Isolate classic BPF JIT specifics into a separate header
  - Implement JIT compiler for extended BPF
 
 Operator Panel driver from Suraj Jitindar Singh:
  - devicetree/bindings: Add binding for operator panel on FSP machines
  - Add inline function to get rc from an ASYNC_COMP opal_msg
  - Add driver for operator panel on FSP machines
 
 Sparse fixes from Daniel Axtens:
  - make some things static
  - Introduce asm-prototypes.h
  - Include headers containing prototypes
  - Use #ifdef __BIG_ENDIAN__ #else for REG_BYTE
  - kvm: Clarify __user annotations
  - Pass endianness to sparse
  - Make ppc_md.{halt, restart} __noreturn
 
 MM fixes & cleanups from Aneesh Kumar K.V:
  - radix: Update LPCR HR bit as per ISA
  - use _raw variant of page table accessors
  - Compile out radix related functions if RADIX_MMU is disabled
  - Clear top 16 bits of va only on older cpus
  - Print formation regarding the the MMU mode
  - hash: Update SDR1 size encoding as documented in ISA 3.0
  - radix: Update PID switch sequence
  - radix: Update machine call back to support new HCALL.
  - radix: Add LPID based tlb flush helpers
  - radix: Add a kernel command line to disable radix
  - Cleanup LPCR defines
 
 Boot code consolidation from Benjamin Herrenschmidt:
  - Move epapr_paravirt_early_init() to early_init_devtree()
  - cell: Don't use flat device-tree after boot
  - ge_imp3a: Don't use the flat device-tree after boot
  - mpc85xx_ds: Don't use the flat device-tree after boot
  - mpc85xx_rdb: Don't use the flat device-tree after boot
  - Don't test for machine type in rtas_initialize()
  - Don't test for machine type in smp_setup_cpu_maps()
  - dt: Add of_device_compatible_match()
  - Factor do_feature_fixup calls
  - Move 64-bit feature fixup earlier
  - Move 64-bit memory reserves to setup_arch()
  - Use a cachable DART
  - Move FW feature probing out of pseries probe()
  - Put exception configuration in a common place
  - Remove early allocation of the SMU command buffer
  - Move MMU backend selection out of platform code
  - pasemi: Remove IOBMAP allocation from platform probe()
  - mm/hash: Don't use machine_is() early during boot
  - Don't test for machine type to detect HEA special case
  - pmac: Remove spurrious machine type test
  - Move hash table ops to a separate structure
  - Ensure that ppc_md is empty before probing for machine type
  - Move 64-bit probe_machine() to later in the boot process
  - Move 32-bit probe() machine to later in the boot process
  - Get rid of ppc_md.init_early()
  - Move the boot time info banner to a separate function
  - Move setting of {i,d}cache_bsize to initialize_cache_info()
  - Move the content of setup_system() to setup_arch()
  - Move cache info inits to a separate function
  - Re-order the call to smp_setup_cpu_maps()
  - Re-order setup_panic()
  - Make a few boot functions __init
  - Merge 32-bit and 64-bit setup_arch()
 
 Other new features:
  - tty/hvc: Use IRQF_SHARED for OPAL hvc consoles from Sam Mendoza-Jonas
  - tty/hvc: Use opal irqchip interface if available from Sam Mendoza-Jonas
  - powerpc: Add module autoloading based on CPU features from Alastair D'Silva
  - crypto: vmx - Convert to CPU feature based module autoloading from Alastair D'Silva
  - Wake up kopald polling thread before waiting for events from Benjamin Herrenschmidt
  - xmon: Dump ISA 2.06 SPRs from Michael Ellerman
  - xmon: Dump ISA 2.07 SPRs from Michael Ellerman
  - Add a parameter to disable 1TB segs from Oliver O'Halloran
  - powerpc/boot: Add OPAL console to epapr wrappers from Oliver O'Halloran
  - Assign fixed PHB number based on device-tree properties from Guilherme G. Piccoli
  - pseries: Add pseries hotplug workqueue from John Allen
  - pseries: Add support for hotplug interrupt source from John Allen
  - pseries: Use kernel hotplug queue for PowerVM hotplug events from John Allen
  - pseries: Move property cloning into its own routine from Nathan Fontenot
  - pseries: Dynamic add entires to associativity lookup array from Nathan Fontenot
  - pseries: Auto-online hotplugged memory from Nathan Fontenot
  - pseries: Remove call to memblock_add() from Nathan Fontenot
 
 cxl:
  - Add set and get private data to context struct from Michael Neuling
  - make base more explicitly non-modular from Paul Gortmaker
  - Use for_each_compatible_node() macro from Wei Yongjun
  - Frederic Barrat
    - Abstract the differences between the PSL and XSL
    - Make vPHB device node match adapter's
  - Philippe Bergheaud
    - Add mechanism for delivering AFU driver specific events
    - Ignore CAPI adapters misplaced in switched slots
    - Refine slice error debug messages
  - Andrew Donnellan
    - static-ify variables to fix sparse warnings
    - PCI/hotplug: pnv_php: export symbols and move struct types needed by cxl
    - PCI/hotplug: pnv_php: handle OPAL_PCI_SLOT_OFFLINE power state
    - Add cxl_check_and_switch_mode() API to switch bi-modal cards
    - remove dead Kconfig options
    - fix potential NULL dereference in free_adapter()
  - Ian Munsie
    - Update process element after allocating interrupts
    - Add support for CAPP DMA mode
    - Fix allowing bogus AFU descriptors with 0 maximum processes
    - Fix allocating a minimum of 2 pages for the SPA
    - Fix bug where AFU disable operation had no effect
    - Workaround XSL bug that does not clear the RA bit after a reset
    - Fix NULL pointer dereference on kernel contexts with no AFU interrupts
    - powerpc/powernv: Split cxl code out into a separate file
    - Add cxl_slot_is_supported API
    - Enable bus mastering for devices using CAPP DMA mode
    - Move cxl_afu_get / cxl_afu_put to base
    - Allow a default context to be associated with an external pci_dev
    - Do not create vPHB if there are no AFU configuration records
    - powerpc/powernv: Add support for the cxl kernel api on the real phb
    - Add support for using the kernel API with a real PHB
    - Add kernel APIs to get & set the max irqs per context
    - Add preliminary workaround for CX4 interrupt limitation
    - Add support for interrupts on the Mellanox CX4
    - Workaround PE=0 hardware limitation in Mellanox CX4
    - powerpc/powernv: Fix pci-cxl.c build when CONFIG_MODULES=n
 
 selftests:
  - Test unaligned copy and paste from Chris Smart
  - Load Monitor Register Tests from Jack Miller
  - Cyril Bur
    - exec() with suspended transaction
    - Use signed long to read perf_event_paranoid
    - Fix usage message in context_switch
    - Fix generation of vector instructions/types in context_switch
  - Michael Ellerman
    - Use "Delta" rather than "Error" in normal output
    - Import Anton's mmap & futex micro benchmarks
    - Add a test for PROT_SAO
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXnWchAAoJEFHr6jzI4aWAe64P/36Vd9yJLptjkoyZp8/IQtu1
 Cv8buQwGdKuSMzdkcUAOXcC3fe2u70ZWXMKKLfY3koIV1IAiqdWk5/XWRKMP2XmE
 dG0LhSf0uu7uh+mE0WvQnRu46ImeKtQ+mPp4Hbs/s9SxMSeYjruv3vdWWmgUq0cl
 Gac2qJSRtAMmgLuHWMjf7N5mxOTOnKejU4o2i9cJ+YHmWKOdCigv2Ge1UadOQFlC
 E7tRPiUR3asfDfj+e+LVTTdToH6p8pk+mOUzIoZ8jIkQ+IXzi62UDl5+Rw9mqiuX
 1CtqEMUXxo2qwX+d4TcV/QUOp0YKPuIcUZ9NMMS+S3lOyJ4NFt+j2Izk7QJp5kNP
 gKVqB68TjDQsBuDr3P9ynlHbduxTIhZAqopbTrLe0FIg48nUe4n1yHJBVzqaVajX
 rFBJSsSUffBLAARNPSXJJhIgc2C1/qOC8dgMeDMcR2kPirDHaQZ/lY1yEpq1yiqR
 q6e3v5hvIAm4IjbYk0mF7TUxBrPGVE/ExyBINyASRoYxAJ1PyeD/iljZ9vI3asRA
 s+hhxT8H3f7lnqTrmJqMjHgAdGkmag07EdmvFNX4xK4aADSy7Y6g4dw25ffRopo9
 p9Jf9HX+dZv65Y3UjbV/6HuXcaSEBJJLSVWvii65PebqSN0LuHEFvNeIJ6Iblx0B
 AWh/hd0Iin2gdkcG39Mr
 =Z5kM
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
 "Highlights:
   - PowerNV PCI hotplug support.
   - Lots more Power9 support.
   - eBPF JIT support on ppc64le.
   - Lots of cxl updates.
   - Boot code consolidation.

  Bug fixes:
   - Fix spin_unlock_wait() from Boqun Feng
   - Fix stack pointer corruption in __tm_recheckpoint() from Michael
     Neuling
   - Fix multiple bugs in memory_hotplug_max() from Bharata B Rao
   - mm: Ensure "special" zones are empty from Oliver O'Halloran
   - ftrace: Separate the heuristics for checking call sites from
     Michael Ellerman
   - modules: Never restore r2 for a mprofile-kernel style mcount() call
     from Michael Ellerman
   - Fix endianness when reading TCEs from Alexey Kardashevskiy
   - start rtasd before PCI probing from Greg Kurz
   - PCI: rpaphp: Fix slot registration for multiple slots under a PHB
     from Tyrel Datwyler
   - powerpc/mm: Add memory barrier in __hugepte_alloc() from Sukadev
     Bhattiprolu

  Cleanups & fixes:
   - Drop support for MPIC in pseries from Rashmica Gupta
   - Define and use PPC64_ELF_ABI_v2/v1 from Michael Ellerman
   - Remove unused symbols in asm-offsets.c from Rashmica Gupta
   - Fix SRIOV not building without EEH enabled from Russell Currey
   - Remove kretprobe_trampoline_holder from Thiago Jung Bauermann
   - Reduce log level of PCI I/O space warning from Benjamin
     Herrenschmidt
   - Add array bounds checking to crash_shutdown_handlers from Suraj
     Jitindar Singh
   - Avoid -maltivec when using clang integrated assembler from Anton
     Blanchard
   - Fix array overrun in ppc_rtas() syscall from Andrew Donnellan
   - Fix error return value in cmm_mem_going_offline() from Rasmus
     Villemoes
   - export cpu_to_core_id() from Mauricio Faria de Oliveira
   - Remove old symbols from defconfigs from Andrew Donnellan
   - Update obsolete comments in setup_32.c about entry conditions from
     Benjamin Herrenschmidt
   - Add comment explaining the purpose of setup_kdump_trampoline() from
     Benjamin Herrenschmidt
   - Merge the RELOCATABLE config entries for ppc32 and ppc64 from Kevin
     Hao
   - Remove RELOCATABLE_PPC32 from Kevin Hao
   - Fix .long's in tlb-radix.c to more meaningful from Balbir Singh

  Minor cleanups & fixes:
   - Andrew Donnellan, Anna-Maria Gleixner, Anton Blanchard, Benjamin
     Herrenschmidt, Bharata B Rao, Christophe Leroy, Colin Ian King,
     Geliang Tang, Greg Kurz, Madhavan Srinivasan, Michael Ellerman,
     Michael Ellerman, Stephen Rothwell, Stewart Smith.

  Freescale updates from Scott:
   - "Highlights include more 8xx optimizations, device tree updates,
     and MVME7100 support."

  PowerNV PCI hotplug from Gavin Shan:
   - PCI: Add pcibios_setup_bridge()
   - Override pcibios_setup_bridge()
   - Remove PCI_RESET_DELAY_US
   - Move pnv_pci_ioda_setup_opal_tce_kill() around
   - Increase PE# capacity
   - Allocate PE# in reverse order
   - Create PEs in pcibios_setup_bridge()
   - Setup PE for root bus
   - Extend PCI bridge resources
   - Make pnv_ioda_deconfigure_pe() visible
   - Dynamically release PE
   - Update bridge windows on PCI plug
   - Delay populating pdn
   - Support PCI slot ID
   - Use PCI slot reset infrastructure
   - Introduce pnv_pci_get_slot_id()
   - Functions to get/set PCI slot state
   - PCI/hotplug: PowerPC PowerNV PCI hotplug driver
   - Print correct PHB type names

  Power9 idle support from Shreyas B. Prabhu:
   - set power_save func after the idle states are initialized
   - Use PNV_THREAD_WINKLE macro while requesting for winkle
   - make hypervisor state restore a function
   - Rename idle_power7.S to idle_book3s.S
   - Rename reusable idle functions to hardware agnostic names
   - Make pnv_powersave_common more generic
   - abstraction for saving SPRs before entering deep idle states
   - Add platform support for stop instruction
   - cpuidle/powernv: Use CPUIDLE_STATE_MAX instead of MAX_POWERNV_IDLE_STATES
   - cpuidle/powernv: cleanup cpuidle-powernv.c
   - cpuidle/powernv: Add support for POWER ISA v3 idle states
   - Use deepest stop state when cpu is offlined

  Power9 PMU from Madhavan Srinivasan:
   - factor out power8 pmu macros and defines
   - factor out power8 pmu functions
   - factor out power8 __init_pmu code
   - Add power9 event list macros for generic and cache events
   - Power9 PMU support
   - Export Power9 generic and cache events to sysfs

  Power9 preliminary interrupt & PCI support from Benjamin Herrenschmidt:
   - Add XICS emulation APIs
   - Move a few exception common handlers to make room
   - Add support for HV virtualization interrupts
   - Add mechanism to force a replay of interrupts
   - Add ICP OPAL backend
   - Discover IODA3 PHBs
   - pci: Remove obsolete SW invalidate
   - opal: Add real mode call wrappers
   - Rename TCE invalidation calls
   - Remove SWINV constants and obsolete TCE code
   - Rework accessing the TCE invalidate register
   - Fallback to OPAL for TCE invalidations
   - Use the device-tree to get available range of M64's
   - Check status of a PHB before using it
   - pci: Don't try to allocate resources that will be reassigned

  Other Power9:
   - Send SIGBUS on unaligned copy and paste from Chris Smart
   - Large Decrementer support from Oliver O'Halloran
   - Load Monitor Register Support from Jack Miller

  Performance improvements from Anton Blanchard:
   - Avoid load hit store in __giveup_fpu() and __giveup_altivec()
   - Avoid load hit store in setup_sigcontext()
   - Remove assembly versions of strcpy, strcat, strlen and strcmp
   - Align hot loops of some string functions

  eBPF JIT from Naveen N. Rao:
   - Fix/enhance 32-bit Load Immediate implementation
   - Optimize 64-bit Immediate loads
   - Introduce rotate immediate instructions
   - A few cleanups
   - Isolate classic BPF JIT specifics into a separate header
   - Implement JIT compiler for extended BPF

  Operator Panel driver from Suraj Jitindar Singh:
   - devicetree/bindings: Add binding for operator panel on FSP machines
   - Add inline function to get rc from an ASYNC_COMP opal_msg
   - Add driver for operator panel on FSP machines

  Sparse fixes from Daniel Axtens:
   - make some things static
   - Introduce asm-prototypes.h
   - Include headers containing prototypes
   - Use #ifdef __BIG_ENDIAN__ #else for REG_BYTE
   - kvm: Clarify __user annotations
   - Pass endianness to sparse
   - Make ppc_md.{halt, restart} __noreturn

  MM fixes & cleanups from Aneesh Kumar K.V:
   - radix: Update LPCR HR bit as per ISA
   - use _raw variant of page table accessors
   - Compile out radix related functions if RADIX_MMU is disabled
   - Clear top 16 bits of va only on older cpus
   - Print formation regarding the the MMU mode
   - hash: Update SDR1 size encoding as documented in ISA 3.0
   - radix: Update PID switch sequence
   - radix: Update machine call back to support new HCALL.
   - radix: Add LPID based tlb flush helpers
   - radix: Add a kernel command line to disable radix
   - Cleanup LPCR defines

  Boot code consolidation from Benjamin Herrenschmidt:
   - Move epapr_paravirt_early_init() to early_init_devtree()
   - cell: Don't use flat device-tree after boot
   - ge_imp3a: Don't use the flat device-tree after boot
   - mpc85xx_ds: Don't use the flat device-tree after boot
   - mpc85xx_rdb: Don't use the flat device-tree after boot
   - Don't test for machine type in rtas_initialize()
   - Don't test for machine type in smp_setup_cpu_maps()
   - dt: Add of_device_compatible_match()
   - Factor do_feature_fixup calls
   - Move 64-bit feature fixup earlier
   - Move 64-bit memory reserves to setup_arch()
   - Use a cachable DART
   - Move FW feature probing out of pseries probe()
   - Put exception configuration in a common place
   - Remove early allocation of the SMU command buffer
   - Move MMU backend selection out of platform code
   - pasemi: Remove IOBMAP allocation from platform probe()
   - mm/hash: Don't use machine_is() early during boot
   - Don't test for machine type to detect HEA special case
   - pmac: Remove spurrious machine type test
   - Move hash table ops to a separate structure
   - Ensure that ppc_md is empty before probing for machine type
   - Move 64-bit probe_machine() to later in the boot process
   - Move 32-bit probe() machine to later in the boot process
   - Get rid of ppc_md.init_early()
   - Move the boot time info banner to a separate function
   - Move setting of {i,d}cache_bsize to initialize_cache_info()
   - Move the content of setup_system() to setup_arch()
   - Move cache info inits to a separate function
   - Re-order the call to smp_setup_cpu_maps()
   - Re-order setup_panic()
   - Make a few boot functions __init
   - Merge 32-bit and 64-bit setup_arch()

  Other new features:
   - tty/hvc: Use IRQF_SHARED for OPAL hvc consoles from Sam Mendoza-Jonas
   - tty/hvc: Use opal irqchip interface if available from Sam Mendoza-Jonas
   - powerpc: Add module autoloading based on CPU features from Alastair D'Silva
   - crypto: vmx - Convert to CPU feature based module autoloading from Alastair D'Silva
   - Wake up kopald polling thread before waiting for events from Benjamin Herrenschmidt
   - xmon: Dump ISA 2.06 SPRs from Michael Ellerman
   - xmon: Dump ISA 2.07 SPRs from Michael Ellerman
   - Add a parameter to disable 1TB segs from Oliver O'Halloran
   - powerpc/boot: Add OPAL console to epapr wrappers from Oliver O'Halloran
   - Assign fixed PHB number based on device-tree properties from Guilherme G. Piccoli
   - pseries: Add pseries hotplug workqueue from John Allen
   - pseries: Add support for hotplug interrupt source from John Allen
   - pseries: Use kernel hotplug queue for PowerVM hotplug events from John Allen
   - pseries: Move property cloning into its own routine from Nathan Fontenot
   - pseries: Dynamic add entires to associativity lookup array from Nathan Fontenot
   - pseries: Auto-online hotplugged memory from Nathan Fontenot
   - pseries: Remove call to memblock_add() from Nathan Fontenot

  cxl:
   - Add set and get private data to context struct from Michael Neuling
   - make base more explicitly non-modular from Paul Gortmaker
   - Use for_each_compatible_node() macro from Wei Yongjun
   - Frederic Barrat
   - Abstract the differences between the PSL and XSL
   - Make vPHB device node match adapter's
   - Philippe Bergheaud
   - Add mechanism for delivering AFU driver specific events
   - Ignore CAPI adapters misplaced in switched slots
   - Refine slice error debug messages
   - Andrew Donnellan
   - static-ify variables to fix sparse warnings
   - PCI/hotplug: pnv_php: export symbols and move struct types needed by cxl
   - PCI/hotplug: pnv_php: handle OPAL_PCI_SLOT_OFFLINE power state
   - Add cxl_check_and_switch_mode() API to switch bi-modal cards
   - remove dead Kconfig options
   - fix potential NULL dereference in free_adapter()
   - Ian Munsie
   - Update process element after allocating interrupts
   - Add support for CAPP DMA mode
   - Fix allowing bogus AFU descriptors with 0 maximum processes
   - Fix allocating a minimum of 2 pages for the SPA
   - Fix bug where AFU disable operation had no effect
   - Workaround XSL bug that does not clear the RA bit after a reset
   - Fix NULL pointer dereference on kernel contexts with no AFU interrupts
   - powerpc/powernv: Split cxl code out into a separate file
   - Add cxl_slot_is_supported API
   - Enable bus mastering for devices using CAPP DMA mode
   - Move cxl_afu_get / cxl_afu_put to base
   - Allow a default context to be associated with an external pci_dev
   - Do not create vPHB if there are no AFU configuration records
   - powerpc/powernv: Add support for the cxl kernel api on the real phb
   - Add support for using the kernel API with a real PHB
   - Add kernel APIs to get & set the max irqs per context
   - Add preliminary workaround for CX4 interrupt limitation
   - Add support for interrupts on the Mellanox CX4
   - Workaround PE=0 hardware limitation in Mellanox CX4
   - powerpc/powernv: Fix pci-cxl.c build when CONFIG_MODULES=n

  selftests:
   - Test unaligned copy and paste from Chris Smart
   - Load Monitor Register Tests from Jack Miller
   - Cyril Bur
   - exec() with suspended transaction
   - Use signed long to read perf_event_paranoid
   - Fix usage message in context_switch
   - Fix generation of vector instructions/types in context_switch
   - Michael Ellerman
   - Use "Delta" rather than "Error" in normal output
   - Import Anton's mmap & futex micro benchmarks
   - Add a test for PROT_SAO"

* tag 'powerpc-4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (263 commits)
  powerpc/mm: Parenthesise IS_ENABLED() in if condition
  tty/hvc: Use opal irqchip interface if available
  tty/hvc: Use IRQF_SHARED for OPAL hvc consoles
  selftests/powerpc: exec() with suspended transaction
  powerpc: Improve comment explaining why we modify VRSAVE
  powerpc/mm: Drop unused externs for hpte_init_beat[_v3]()
  powerpc/mm: Rename hpte_init_lpar() and move the fallback to a header
  powerpc/mm: Fix build break when PPC_NATIVE=n
  crypto: vmx - Convert to CPU feature based module autoloading
  powerpc: Add module autoloading based on CPU features
  powerpc/powernv/ioda: Fix endianness when reading TCEs
  powerpc/mm: Add memory barrier in __hugepte_alloc()
  powerpc/modules: Never restore r2 for a mprofile-kernel style mcount() call
  powerpc/ftrace: Separate the heuristics for checking call sites
  powerpc: Merge 32-bit and 64-bit setup_arch()
  powerpc/64: Make a few boot functions __init
  powerpc: Re-order setup_panic()
  powerpc: Re-order the call to smp_setup_cpu_maps()
  powerpc/32: Move cache info inits to a separate function
  powerpc/64: Move the content of setup_system() to setup_arch()
  ...
2016-07-30 21:01:36 -07:00
Linus Torvalds
f64d6e2aaa DeviceTree update for 4.8:
- Removal of most of_platform_populate() calls in arch code. Now the DT
 core code calls it in the default case and platforms only need to call
 it if they have special needs.
 
 - Use pr_fmt on all the DT core print statements.
 
 - CoreSight binding doc improvements to block name descriptions.
 
 - Add dt_to_config script which can parse dts files and list
 corresponding kernel config options.
 
 - Fix memory leak hit with a PowerMac DT.
 
 - Correct a bunch of STMicro compatible strings to use the correct
 vendor prefix.
 
 - Fix DA9052 PMIC binding doc to match what is actually used in dts
 files.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXm9KcAAoJEPr7XbWNvGHDRT4QAIIIOSB4AWHardnMLROgGge9
 aOQKZ/05O9feOcxYKe8FkQbcH+IujJjrUL+yrRD36yGQPAyBP21gtcmmfrkCcwFM
 kH915f/JbGvXpfwEf8dcarHhzYH6FFJiQGduPpWfwSSWynx+xq5EKPwCqYzMg8bN
 SExxt7vUx1MKFOExZ0K8BNCo8VMVLUWQoJ1DNeJDuL25Op4EU3i2l1HQNYV/3XDk
 BSA3x7Lw3GjrWEH20VWYn2Azq1OFLY+E2FC2lnG4nbkk5X8dZbUH9PR1Sk7uTQDj
 uxTjWe59NBpliCxKSAbMbTAU/WwSB1pJ0I+zDJBiQsdFT+nb5F4zOrs3qSKHa/A9
 Rv6AC8k5gdSMrDB1dOspfF2vWvOOInXgNV4/Kza0D92mbCpwyUuF+vhE6rfcMrZU
 OiD7rj2/fvO7Y9fUAhrp6zrfrOfH9B1Z9vS+940AlK96YwPE2+J0SA2vBxR/wg8H
 7fj4Ud5X+SFisXWQhh5Wlv0W9o6e7C7fsi8vpkQ7gufmezLFWVnJKsUfQaxGEwhG
 Hkhm9kuSHHMd+6dEnn2756DnNfJAtQv6rSR0/QR4Lf9y5L4dvR3kAQIci8X/nx4P
 sIk+IJWGZG6wziZq59hh+SO6HEqdSNuvh+5sbR0iUimdE/1HsDBdPiocXf/r8iwK
 NY9nGeZPRrXmFgdpoZfm
 =wLMr
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull DeviceTree updates from Rob Herring:

 - remove most of_platform_populate() calls in arch code.  Now the DT
   core code calls it in the default case and platforms only need to
   call it if they have special needs

 - use pr_fmt on all the DT core print statements

 - CoreSight binding doc improvements to block name descriptions

 - add dt_to_config script which can parse dts files and list
   corresponding kernel config options

 - fix memory leak hit with a PowerMac DT

 - correct a bunch of STMicro compatible strings to use the correct
   vendor prefix

 - fix DA9052 PMIC binding doc to match what is actually used in dts
   files

* tag 'devicetree-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (35 commits)
  documentation: da9052: Update regulator bindings names to match DA9052/53 DTS expectations
  xtensa: Partially Revert "xtensa: Remove unnecessary of_platform_populate with default match table"
  xtensa: Fix build error due to missing include file
  MIPS: ath79: Add missing include file
  Fix spelling errors in Documentation/devicetree
  ARM: dts: fix STMicroelectronics compatible strings
  powerpc/dts: fix STMicroelectronics compatible strings
  Documentation: dt: i2c: use correct STMicroelectronics vendor prefix
  scripts/dtc: dt_to_config - kernel config options for a devicetree
  of: fdt: mark unflattened tree as detached
  of: overlay: add resolver error prints
  coresight: document binding acronyms
  Documentation/devicetree: document cavium-pip rx-delay/tx-delay properties
  of: use pr_fmt prefix for all console printing
  of/irq: Mark initialised interrupt controllers as populated
  of: fix memory leak related to safe_name()
  Revert "of/platform: export of_default_bus_match_table"
  of: unittest: use of_platform_default_populate() to populate default bus
  memory: omap-gpmc: use of_platform_default_populate() to populate default bus
  bus: uniphier-system-bus: use of_platform_default_populate() to populate default bus
  ...
2016-07-30 11:32:01 -07:00
Michael Ellerman
719dbb2df7 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into next
Freescale updates from Scott:

"Highlights include more 8xx optimizations, device tree updates,
and MVME7100 support."
2016-07-30 13:43:19 +10:00
Linus Torvalds
7a1e8b80fb Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
 "Highlights:

   - TPM core and driver updates/fixes
   - IPv6 security labeling (CALIPSO)
   - Lots of Apparmor fixes
   - Seccomp: remove 2-phase API, close hole where ptrace can change
     syscall #"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (156 commits)
  apparmor: fix SECURITY_APPARMOR_HASH_DEFAULT parameter handling
  tpm: Add TPM 2.0 support to the Nuvoton i2c driver (NPCT6xx family)
  tpm: Factor out common startup code
  tpm: use devm_add_action_or_reset
  tpm2_i2c_nuvoton: add irq validity check
  tpm: read burstcount from TPM_STS in one 32-bit transaction
  tpm: fix byte-order for the value read by tpm2_get_tpm_pt
  tpm_tis_core: convert max timeouts from msec to jiffies
  apparmor: fix arg_size computation for when setprocattr is null terminated
  apparmor: fix oops, validate buffer size in apparmor_setprocattr()
  apparmor: do not expose kernel stack
  apparmor: fix module parameters can be changed after policy is locked
  apparmor: fix oops in profile_unpack() when policy_db is not present
  apparmor: don't check for vmalloc_addr if kvzalloc() failed
  apparmor: add missing id bounds check on dfa verification
  apparmor: allow SYS_CAP_RESOURCE to be sufficient to prlimit another task
  apparmor: use list_next_entry instead of list_entry_next
  apparmor: fix refcount race when finding a child profile
  apparmor: fix ref count leak when profile sha1 hash is read
  apparmor: check that xindex is in trans_table bounds
  ...
2016-07-29 17:38:46 -07:00
Linus Torvalds
a6408f6cb6 Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull smp hotplug updates from Thomas Gleixner:
 "This is the next part of the hotplug rework.

   - Convert all notifiers with a priority assigned

   - Convert all CPU_STARTING/DYING notifiers

     The final removal of the STARTING/DYING infrastructure will happen
     when the merge window closes.

  Another 700 hundred line of unpenetrable maze gone :)"

* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits)
  timers/core: Correct callback order during CPU hot plug
  leds/trigger/cpu: Move from CPU_STARTING to ONLINE level
  powerpc/numa: Convert to hotplug state machine
  arm/perf: Fix hotplug state machine conversion
  irqchip/armada: Avoid unused function warnings
  ARC/time: Convert to hotplug state machine
  clocksource/atlas7: Convert to hotplug state machine
  clocksource/armada-370-xp: Convert to hotplug state machine
  clocksource/exynos_mct: Convert to hotplug state machine
  clocksource/arm_global_timer: Convert to hotplug state machine
  rcu: Convert rcutree to hotplug state machine
  KVM/arm/arm64/vgic-new: Convert to hotplug state machine
  smp/cfd: Convert core to hotplug state machine
  x86/x2apic: Convert to CPU hotplug state machine
  profile: Convert to hotplug state machine
  timers/core: Convert to hotplug state machine
  hrtimer: Convert to hotplug state machine
  x86/tboot: Convert to hotplug state machine
  arm64/armv8 deprecated: Convert to hotplug state machine
  hwtracing/coresight-etm4x: Convert to hotplug state machine
  ...
2016-07-29 13:55:30 -07:00
Linus Torvalds
f0c98ebc57 libnvdimm for 4.8
1/ Replace pcommit with ADR / directed-flushing:
    The pcommit instruction, which has not shipped on any product, is
    deprecated. Instead, the requirement is that platforms implement either
    ADR, or provide one or more flush addresses per nvdimm. ADR
    (Asynchronous DRAM Refresh) flushes data in posted write buffers to the
    memory controller on a power-fail event. Flush addresses are defined in
    ACPI 6.x as an NVDIMM Firmware Interface Table (NFIT) sub-structure:
    "Flush Hint Address Structure". A flush hint is an mmio address that
    when written and fenced assures that all previous posted writes
    targeting a given dimm have been flushed to media.
 
 2/ On-demand ARS (address range scrub):
    Linux uses the results of the ACPI ARS commands to track bad blocks
    in pmem devices.  When latent errors are detected we re-scrub the media
    to refresh the bad block list, userspace can also request a re-scrub at
    any time.
 
 3/ Support for the Microsoft DSM (device specific method) command format.
 
 4/ Support for EDK2/OVMF virtual disk device memory ranges.
 
 5/ Various fixes and cleanups across the subsystem.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXmXBsAAoJEB7SkWpmfYgCEwwP/1IOt9ocP+iHLMDH9KE7VaTZ
 NmUDR+Zy6g5cRQM7SgcuU5BXUcx+OsSrSrUTVF1cW994o9Gbz1mFotkv0ZAsPcYY
 ZVRQxo2oqHrssyOcg+PsgKWiXn68rJOCgmpEyzaJywl5qTMst7pzsT1s1f7rSh6h
 trCf4VaJJwxZR8fARGtlHUnnhPe2Orp99EZRKEWprAsIv2kPuWpPHSjRjuEgN1JG
 KW8AYwWqFTtiLRUk86I4KBB0wcDrfctsjgN9Ogd6+aHyQBRnVSr2U+vDCFkC8KLu
 qiDCpYp+yyxBjclnljz7tRRT3GtzfCUWd4v2KVWqgg2IaobUc0Lbukp/rmikUXQP
 WLikT2OCQ994eFK5OX3Q3cIU/4j459TQnof8q14yVSpjAKrNUXVSR5puN7Hxa+V7
 41wKrAsnsyY1oq+Yd/rMR8VfH7PHx3bFkrmRCGZCufLX1UQm4aYj+sWagDKiV3yA
 DiudghbOnhfurfGsnXUVw7y7GKs+gNWNBmB6ndAD6ZEHmKoGUhAEbJDLCc3DnANl
 b/2mv1MIdIcC1DlCmnbbcn6fv6bICe/r8poK3VrCK3UgOq/EOvKIWl7giP+k1JuC
 6DdVYhlNYIVFXUNSLFAwz8OkLu8byx7WDm36iEqrKHtPw+8qa/2bWVgOU6OBgpjV
 cN3edFVIdxvZeMgM5Ubq
 =xCBG
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dan Williams:

 - Replace pcommit with ADR / directed-flushing.

   The pcommit instruction, which has not shipped on any product, is
   deprecated.  Instead, the requirement is that platforms implement
   either ADR, or provide one or more flush addresses per nvdimm.

   ADR (Asynchronous DRAM Refresh) flushes data in posted write buffers
   to the memory controller on a power-fail event.

   Flush addresses are defined in ACPI 6.x as an NVDIMM Firmware
   Interface Table (NFIT) sub-structure: "Flush Hint Address Structure".
   A flush hint is an mmio address that when written and fenced assures
   that all previous posted writes targeting a given dimm have been
   flushed to media.

 - On-demand ARS (address range scrub).

   Linux uses the results of the ACPI ARS commands to track bad blocks
   in pmem devices.  When latent errors are detected we re-scrub the
   media to refresh the bad block list, userspace can also request a
   re-scrub at any time.

 - Support for the Microsoft DSM (device specific method) command
   format.

 - Support for EDK2/OVMF virtual disk device memory ranges.

 - Various fixes and cleanups across the subsystem.

* tag 'libnvdimm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (41 commits)
  libnvdimm-btt: Delete an unnecessary check before the function call "__nd_device_register"
  nfit: do an ARS scrub on hitting a latent media error
  nfit: move to nfit/ sub-directory
  nfit, libnvdimm: allow an ARS scrub to be triggered on demand
  libnvdimm: register nvdimm_bus devices with an nd_bus driver
  pmem: clarify a debug print in pmem_clear_poison
  x86/insn: remove pcommit
  Revert "KVM: x86: add pcommit support"
  nfit, tools/testing/nvdimm/: unify shutdown paths
  libnvdimm: move ->module to struct nvdimm_bus_descriptor
  nfit: cleanup acpi_nfit_init calling convention
  nfit: fix _FIT evaluation memory leak + use after free
  tools/testing/nvdimm: add manufacturing_{date|location} dimm properties
  tools/testing/nvdimm: add virtual ramdisk range
  acpi, nfit: treat virtual ramdisk SPA as pmem region
  pmem: kill __pmem address space
  pmem: kill wmb_pmem()
  libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes
  fs/dax: remove wmb_pmem()
  libnvdimm, pmem: flush posted-write queues on shutdown
  ...
2016-07-28 17:38:16 -07:00
Paul Mackerras
93d17397e4 KVM: PPC: Book3S HV: Save/restore TM state in H_CEDE
It turns out that if the guest does a H_CEDE while the CPU is in
a transactional state, and the H_CEDE does a nap, and the nap
loses the architected state of the CPU (which is is allowed to do),
then we lose the checkpointed state of the virtual CPU.  In addition,
the transactional-memory state recorded in the MSR gets reset back
to non-transactional, and when we try to return to the guest, we take
a TM bad thing type of program interrupt because we are trying to
transition from non-transactional to transactional with a hrfid
instruction, which is not permitted.

The result of the program interrupt occurring at that point is that
the host CPU will hang in an infinite loop with interrupts disabled.
Thus this is a denial of service vulnerability in the host which can
be triggered by any guest (and depending on the guest kernel, it can
potentially triggered by unprivileged userspace in the guest).

This vulnerability has been assigned the ID CVE-2016-5412.

To fix this, we save the TM state before napping and restore it
on exit from the nap, when handling a H_CEDE in real mode.  The
case where H_CEDE exits to host virtual mode is already OK (as are
other hcalls which exit to host virtual mode) because the exit
path saves the TM state.

Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-07-28 16:10:07 +10:00
Paul Mackerras
f024ee0984 KVM: PPC: Book3S HV: Pull out TM state save/restore into separate procedures
This moves the transactional memory state save and restore sequences
out of the guest entry/exit paths into separate procedures.  This is
so that these sequences can be used in going into and out of nap
in a subsequent patch.

The only code changes here are (a) saving and restore LR on the
stack, since these new procedures get called with a bl instruction,
(b) explicitly saving r1 into the PACA instead of assuming that
HSTATE_HOST_R1(r13) is already set, and (c) removing an unnecessary
and redundant setting of MSR[TM] that should have been removed by
commit 9d4d0bdd9e0a ("KVM: PPC: Book3S HV: Add transactional memory
support", 2013-09-24) but wasn't.

Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-07-28 16:09:34 +10:00
Stephen Rothwell
fbef66f0ad powerpc/mm: Parenthesise IS_ENABLED() in if condition
Currently IS_ENABLED() produces an expression surrounded by parentheses,
which allows this code to compile, generating eg:

    else if (1 || 0)
        hpte_init_native();

However a change to the macro in the kbuild tree will break this in
future by removing the parentheses.

Fixes: 7353644fa9 ("powerpc/mm: Fix build break when PPC_NATIVE=n")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-28 12:39:15 +10:00
Linus Torvalds
607e11ab66 New LED class driver:
- LED driver for TI LP3952 6-Channel Color LED
 
 LED core improvements:
 - Only descend into leds directory when CONFIG_NEW_LEDS is set
 - Add no-op gpio_led_register_device when LED subsystem is disabled
 - MAINTAINERS: Add file patterns for led device tree bindings
 
 LED Trigger core improvements:
 - return error if invalid trigger name is provided via sysfs
 
 LED class drivers improvements
 - is31fl32xx: define complete i2c_device_id table
 - is31fl32xx: fix typo in id and match table names
 - leds-gpio: Set of_node for created LED devices
 - pca9532: Add device tree support
 
 Conversion of IDE trigger to common disk trigger:
 - leds: convert IDE trigger to common disk trigger
 - leds: documentation: 'ide-disk' to 'disk-activity'
 - unicore32: use the new LED disk activity trigger
 - parisc: use the new LED disk activity trigger
 - mips: use the new LED disk activity trigger
 - arm: use the new LED disk activity trigger
 - powerpc: use the new LED disk activity trigger
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.17 (GNU/Linux)
 
 iQIcBAABAgAGBQJXlbvcAAoJEL1qUBy3i3wm1NoP/1ujmu9eljBO/0VyyrZNSWvM
 F367twUAmfyrC+xos/lyxn8u8yknyLNfbUStU0L/cpz2GQRNwuIS4h+xrkyqsDCj
 JBCKrZQCqi/Hsj6jBuIUW+s0OFjxhBJpEkeYW7FM6i2+O9EUxZvCuuG2WPQBNayx
 73qG4RdHmB+qFWMlIp//iF8OviSSm2sd5+UEAJH8mYwrbCcv2gT7UjIaIRczpo4H
 JTkc/onR6y/qliUUrbLpMr40bJLA4+p9ZkqEGPS4lsKiUKf5SF7bLg59RNebIF75
 FZ52yhAw55KLiF0yokthlay8cBUjsd52lROjANkIe1iEHBiKoJRKLOA9WozY37qu
 3/9fKcddiPHxVF3ItcwhrldVru2ZxfWFRz/44rCPDTjKEfsuNN/2EkY8aVYCiHpO
 8gSXw/HjLPCSQIp6PVcqGSVZd1VWUYKcJY/xvCa3zQi0r7JdAmYUp76KCkmq9FgB
 DuQTkxM57ndKiz5WqlGU70mzht43/raqzPOQwLMwz5mGKj/nXvNMKFLQ1xhnyfeY
 vw7D2f/wqm74cTMhMS40IDlpJ81EoCYpsnzvtIHti3FPeo8wmVNz30H51JvFRdJm
 hzeju1tdSLwazGBIIAtjLSCLpnkaOq7rMwwGndGO5QCO/+A6gAC3iAKe4YFl+AXV
 CrS9BAdZSslYg+ddJYg6
 =8lmX
 -----END PGP SIGNATURE-----

Merge tag 'leds_for_4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds

Pull LED updates from Jacek Anaszewski:
 "New LED class driver:
   - LED driver for TI LP3952 6-Channel Color LED

  LED core improvements:
   - Only descend into leds directory when CONFIG_NEW_LEDS is set
   - Add no-op gpio_led_register_device when LED subsystem is disabled
   - MAINTAINERS: Add file patterns for led device tree bindings

  LED Trigger core improvements:
   - return error if invalid trigger name is provided via sysfs

  LED class drivers improvements
   - is31fl32xx: define complete i2c_device_id table
   - is31fl32xx: fix typo in id and match table names
   - leds-gpio: Set of_node for created LED devices
   - pca9532: Add device tree support

  Conversion of IDE trigger to common disk trigger:
   - leds: convert IDE trigger to common disk trigger
   - leds: documentation: 'ide-disk' to 'disk-activity'
   - unicore32: use the new LED disk activity trigger
   - parisc: use the new LED disk activity trigger
   - mips: use the new LED disk activity trigger
   - arm: use the new LED disk activity trigger
   - powerpc: use the new LED disk activity trigger"

* tag 'leds_for_4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
  leds: is31fl32xx: define complete i2c_device_id table
  leds: is31fl32xx: fix typo in id and match table names
  leds: LED driver for TI LP3952 6-Channel Color LED
  leds: leds-gpio: Set of_node for created LED devices
  leds: triggers: return error if invalid trigger name is provided via sysfs
  leds: Only descend into leds directory when CONFIG_NEW_LEDS is set
  leds: Add no-op gpio_led_register_device when LED subsystem is disabled
  unicore32: use the new LED disk activity trigger
  parisc: use the new LED disk activity trigger
  mips: use the new LED disk activity trigger
  arm: use the new LED disk activity trigger
  powerpc: use the new LED disk activity trigger
  leds: documentation: 'ide-disk' to 'disk-activity'
  leds: convert IDE trigger to common disk trigger
  leds: pca9532: Add device tree support
  MAINTAINERS: Add file patterns for led device tree bindings
2016-07-27 14:03:52 -07:00
Linus Torvalds
0e06f5c0de Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - a few misc bits

 - ocfs2

 - most(?) of MM

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (125 commits)
  thp: fix comments of __pmd_trans_huge_lock()
  cgroup: remove unnecessary 0 check from css_from_id()
  cgroup: fix idr leak for the first cgroup root
  mm: memcontrol: fix documentation for compound parameter
  mm: memcontrol: remove BUG_ON in uncharge_list
  mm: fix build warnings in <linux/compaction.h>
  mm, thp: convert from optimistic swapin collapsing to conservative
  mm, thp: fix comment inconsistency for swapin readahead functions
  thp: update Documentation/{vm/transhuge,filesystems/proc}.txt
  shmem: split huge pages beyond i_size under memory pressure
  thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE
  khugepaged: add support of collapse for tmpfs/shmem pages
  shmem: make shmem_inode_info::lock irq-safe
  khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page()
  thp: extract khugepaged from mm/huge_memory.c
  shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings
  shmem: add huge pages support
  shmem: get_unmapped_area align huge page
  shmem: prepare huge= mount option and sysfs knob
  mm, rmap: account shmem thp pages
  ...
2016-07-26 19:55:54 -07:00
Linus Torvalds
1cd04d293c This is the bulk of GPIO changes for the v4.8 kernel cycle.
Core changes:
 
 - The big item is of course the completion of the character
   device ABI. It has now replaced and surpassed the former
   unmaintainable sysfs ABI: we can now hammer (bitbang)
   individual lines or sets of lines and read individual lines
   or sets of lines from userspace, and we can also register
   to listen to GPIO events from userspace. As a tie-in we
   have two new tools in tools/gpio: gpio-hammer and
   gpio-event-mon that illustrate the proper use of the new
   ABI. As someone said: the wild west days of GPIO are now
   over.
 
 - Continued to remove the pointless
   ARCH_[WANT_OPTIONAL|REQUIRE]_GPIOLIB Kconfig symbols.
   I'm patching hexagon, openrisc, powerpc, sh, unicore,
   ia64 and microblaze. These are either ACKed by their
   maintainers or patched anyways after a grace period and
   no response from maintainers. Some archs (ARM) come in from
   their trees, and others (x86) are still not fixed, so I
   might send a second pull request to root it out later in
   this merge window, or just defer to v4.9.
 
 - The GPIO tools are moved to the tools build system.
 
 New drivers:
 
 - New driver for the MAX77620/MAX20024.
 
 - New driver for the Intel Merrifield.
 
 - Enabled PCA953x for the TI PCA9536.
 
 - Enabled PCA953x for the Intel Edison.
 
 - Enabled R8A7792 in the RCAR driver.
 
 Driver improvements:
 
 - The STMPE and F7188x now supports the .get_direction()
   callback.
 
 - The Xilinx driver supports setting multiple lines at
   once.
 
 - ACPI support for the Vulcan GPIO controller.
 
 - The MMIO GPIO driver supports device tree probing.
 
 - The Acer One 10 is supported through the _DEP ACPI
   attribute.
 
 Cleanups:
 
 - A major cleanup of the OF/DT support code. It is way
   easier to read and understand now, probably this improves
   performance too.
 
 - Drop a few redundant .owner assignments.
 
 - Remove CLPS711x boardfile support: we are 100% DT.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXlcT4AAoJEEEQszewGV1zACwQAK5SZr0F5c3QvYbJSiJBCGA7
 MZKUYHnYoBpZaPKcFKoOXEM1WOvlABlh9U0y0xkL8gQ6giyKup1wYJJCuYgW29gL
 ny4r7Z8rs2Wm1ujL+FLAwuxIwCY3BnhUucp8YiSaHPBuKRfsHorFPvXiAgLZjNYC
 Qk3Q48xYW4inw9sy2BbMfsU3CZnkvgy5euooyy1ezwachRhuHdBy/MVCG012PC4s
 0d6LGdByEx1uK4NeV7ssPys444M8unep2EWgy6Rvc1U+FmGA487EvL+X8nxTQTj3
 uTMxA8nddmZTEeEIqhpRw/dPiFlWxPFwfWmNEre05gKLb/LUK2tgsUOnmIFgVUw/
 t41IzdQNLQQZxmiXplZn6s5mAr2VNuTxkRq1CIl4SwQW+Uy4TU3q8aDPkKzsyhiR
 yw6o6ul0pQs8UZEggnht8ie6JiSnJ55ehI/nlRxpK/797Ff6Yp4FARs3ZtFnQDDu
 SWewnbRatZQ89lvy4BA7QCWeV4Scjk4k/e2HjUAFnkfMDaYqpi4vTdzwnWdVjd+F
 hMgu6VnkN3oSE7ZMrKJMh7b7h1uMnIwKBFWbkrlOEuhT1X0ZDsEOBv5juSBPYomN
 EOIJUyWqxn0ZfxeONbdbCPteYlfJF+TW/rE9LQMxS1nNwsqw2IQW6NCmrM9Nx6Fv
 FP++26nYMTSh82gwOYw3
 =NwcK
 -----END PGP SIGNATURE-----

Merge tag 'gpio-v4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Pull GPIO updates from Linus Walleij:
 "This is the bulk of GPIO changes for the v4.8 kernel cycle.  The big
  news is the completion of the chardev ABI which I'm very happy about
  and apart from that it's an ordinary, quite busy cycle.  The details
  are below.

  The patches are tested in linux-next for some time, patches to other
  subsystem mostly have ACKs.

  I got overly ambitious with configureing lines as input for IRQ lines
  but it turns out that some controllers have their interrupt-enable and
  input-enabling in orthogonal settings so the assumption that all IRQ
  lines are input lines does not hold.  Oh well, revert and back to the
  drawing board with that.

  Core changes:

   - The big item is of course the completion of the character device
     ABI.  It has now replaced and surpassed the former unmaintainable
     sysfs ABI: we can now hammer (bitbang) individual lines or sets of
     lines and read individual lines or sets of lines from userspace,
     and we can also register to listen to GPIO events from userspace.

     As a tie-in we have two new tools in tools/gpio: gpio-hammer and
     gpio-event-mon that illustrate the proper use of the new ABI.  As
     someone said: the wild west days of GPIO are now over.

   - Continued to remove the pointless ARCH_[WANT_OPTIONAL|REQUIRE]_GPIOLIB
     Kconfig symbols.  I'm patching hexagon, openrisc, powerpc, sh,
     unicore, ia64 and microblaze.  These are either ACKed by their
     maintainers or patched anyways after a grace period and no response
     from maintainers.

     Some archs (ARM) come in from their trees, and others (x86) are
     still not fixed, so I might send a second pull request to root it
     out later in this merge window, or just defer to v4.9.

   - The GPIO tools are moved to the tools build system.

  New drivers:

   - New driver for the MAX77620/MAX20024.

   - New driver for the Intel Merrifield.

   - Enabled PCA953x for the TI PCA9536.

   - Enabled PCA953x for the Intel Edison.

   - Enabled R8A7792 in the RCAR driver.

  Driver improvements:

   - The STMPE and F7188x now supports the .get_direction() callback.

   - The Xilinx driver supports setting multiple lines at once.

   - ACPI support for the Vulcan GPIO controller.

   - The MMIO GPIO driver supports device tree probing.

   - The Acer One 10 is supported through the _DEP ACPI attribute.

  Cleanups:

   - A major cleanup of the OF/DT support code.  It is way easier to
     read and understand now, probably this improves performance too.

   - Drop a few redundant .owner assignments.

   - Remove CLPS711x boardfile support: we are 100% DT"

* tag 'gpio-v4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (67 commits)
  MAINTAINERS: Add INTEL MERRIFIELD GPIO entry
  gpio: dwapb: add missing fwnode_handle_put() in dwapb_gpio_get_pdata()
  gpio: merrifield: Protect irq_ack() and gpio_set() by lock
  gpio: merrifield: Introduce GPIO driver to support Merrifield
  gpio: intel-mid: Make it depend to X86_INTEL_MID
  gpio: intel-mid: Sort header block alphabetically
  gpio: intel-mid: Remove potentially harmful code
  gpio: rcar: add R8A7792 support
  gpiolib: remove duplicated include from gpiolib.c
  Revert "gpio: convince line to become input in irq helper"
  gpiolib: of_find_gpio(): Don't discard errors
  gpio: of: Allow overriding the device node
  gpio: free handles in fringe cases
  gpio: tps65218: Add platform_device_id table
  gpio: max77620: get gpio value based on direction
  gpio: lynxpoint: avoid potential warning on error path
  tools/gpio: add install section
  tools/gpio: move to tools buildsystem
  gpio: intel-mid: switch to devm_gpiochip_add_data()
  gpio: 74x164: Use spi_write() helper instead of open coding
  ...
2016-07-26 19:16:01 -07:00
Linus Torvalds
1b3fc0bef8 pstore subsystem updates for v4.8
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 Comment: Kees Cook <kees@outflux.net>
 
 iQIcBAABCgAGBQJXloQYAAoJEIly9N/cbcAmGC0QAIpyoiqEuDiJq/XpRg1ux+PC
 Vyr15Pub9yQYwcrWffMH1Zr0GhlFmXb1iP9rp36zdtMhjBEfq7wegvblLVMlzl6G
 7nYt8hJjDh/h8iw1lElgDL2kwUbTym43HoczJNvY/lOmFuUMK8AoDIRYjFTLAKfQ
 S4KA9MFJe3kDh4OUoQVfQNrC2VReLD4uvXk4EUF0wDYoqjVKyU3WBHOMgEmggKTR
 cb+fwhg3Lj4cuMMtZqy8wCqZ/hqhaH8giHC9YbIZQyre3ylncH9xUZyfiqS6nQGc
 eLc03qxqDNsmZvcY6cJgXldLQ3tXM4o96Moakzn2n4sQcW9vh/3oZzDPd7gC8Ei1
 GfIXmRBXFhj5JaeHNGJxL6oCywK+JaqxG8nqD7cEcXTzJiHzjn5kKKSFlr3GmI7w
 47htXv9t07SMgQW0IlBws5yApfeB62dQXmhZc1kMtbonhGdAZCCUg2Nrv34VxrjX
 Dp+LCmD5bg/fBrnAt8f+IIQEd3pElngay+SmSEB9XFUejf3pKw8SvcoDbmE3LD7M
 zGh5bEkptHll2GMInVAt4b4tiC44e7u+0H1Rsi/ttA1cktXZ9hOtFewhXNvfOS2I
 hAMGSngOdpzR5v9Mof5hJAWrr1CLkoh757UoYMsb8u2V9aQw7oVZ5JRMzjZBU6iC
 qXqHm4h5P1bpAeit3DLF
 =fiRI
 -----END PGP SIGNATURE-----

Merge tag 'pstore-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull pstore subsystem updates from Kees Cook:
 "This expands the supported compressors, fixes some bugs, and finally
  adds DT bindings"

* tag 'pstore-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  pstore/ram: add Device Tree bindings
  efi-pstore: implement efivars_pstore_exit()
  pstore: drop file opened reference count
  pstore: add lzo/lz4 compression support
  pstore: Cleanup pstore_dump()
  pstore: Enable compression on normal path (again)
  ramoops: Only unregister when registered
2016-07-26 18:48:23 -07:00
Linus Torvalds
6453dbdda3 Power management material for v4.8-rc1
- Rework the cpufreq governor interface to make it more straightforward
    and modify the conservative governor to avoid using transition
    notifications (Rafael Wysocki).
 
  - Rework the handling of frequency tables by the cpufreq core to make
    it more efficient (Viresh Kumar).
 
  - Modify the schedutil governor to reduce the number of wakeups it
    causes to occur in cases when the CPU frequency doesn't need to be
    changed (Steve Muckle, Viresh Kumar).
 
  - Fix some minor issues and clean up code in the cpufreq core and
    governors (Rafael Wysocki, Viresh Kumar).
 
  - Add Intel Broxton support to the intel_pstate driver (Srinivas
    Pandruvada).
 
  - Fix problems related to the config TDP feature and to the validity
    of the MSR_HWP_INTERRUPT register in intel_pstate (Jan Kiszka,
    Srinivas Pandruvada).
 
  - Make intel_pstate update the cpu_frequency tracepoint even if
    the frequency doesn't change to avoid confusing powertop (Rafael
    Wysocki).
 
  - Clean up the usage of __init/__initdata in intel_pstate, mark some
    of its internal variables as __read_mostly and drop an unused
    structure element from it (Jisheng Zhang, Carsten Emde).
 
  - Clean up the usage of some duplicate MSR symbols in intel_pstate
    and turbostat (Srinivas Pandruvada).
 
  - Update/fix the powernv, s3c24xx and mvebu cpufreq drivers (Akshay
    Adiga, Viresh Kumar, Ben Dooks).
 
  - Fix a regression (introduced during the 4.5 cycle) in the
    pcc-cpufreq driver by reverting the problematic commit (Andreas
    Herrmann).
 
  - Add support for Intel Denverton to intel_idle, clean up Broxton
    support in it and make it explicitly non-modular (Jacob Pan,
    Jan Beulich, Paul Gortmaker).
 
  - Add support for Denverton and Ivy Bridge server to the Intel RAPL
    power capping driver and make it more careful about the handing
    of MSRs that may not be present (Jacob Pan, Xiaolong Wang).
 
  - Fix resume from hibernation on x86-64 by making the CPU offline
    during resume avoid using MONITOR/MWAIT in the "play dead" loop
    which may lead to an inadvertent "revival" of a "dead" CPU and
    a page fault leading to a kernel crash from it (Rafael Wysocki).
 
  - Make memory management during resume from hibernation more
    straightforward (Rafael Wysocki).
 
  - Add debug features that should help to detect problems related
    to hibernation and resume from it (Rafael Wysocki, Chen Yu).
 
  - Clean up hibernation core somewhat (Rafael Wysocki).
 
  - Prevent KASAN from instrumenting the hibernation core which leads
    to large numbers of false-positives from it (James Morse).
 
  - Prevent PM (hibernate and suspend) notifiers from being called
    during the cleanup phase if they have not been called during the
    corresponding preparation phase which is possible if one of the
    other notifiers returns an error at that time (Lianwei Wang).
 
  - Improve suspend-related debug printout in the tasks freezer and
    clean up suspend-related console handling (Roger Lu, Borislav
    Petkov).
 
  - Update the AnalyzeSuspend script in the kernel sources to
    version 4.2 (Todd Brandt).
 
  - Modify the generic power domains framework to make it handle
    system suspend/resume better (Ulf Hansson).
 
  - Make the runtime PM framework avoid resuming devices synchronously
    when user space changes the runtime PM settings for them and
    improve its error reporting (Rafael Wysocki, Linus Walleij).
 
  - Fix error paths in devfreq drivers (exynos, exynos-ppmu, exynos-bus)
    and in the core, make some devfreq code explicitly non-modular and
    change some of it into tristate (Bartlomiej Zolnierkiewicz,
    Peter Chen, Paul Gortmaker).
 
  - Add DT support to the generic PM clocks management code and make
    it export some more symbols (Jon Hunter, Paul Gortmaker).
 
  - Make the PCI PM core code slightly more robust against possible
    driver errors (Andy Shevchenko).
 
  - Make it possible to change DESTDIR and PREFIX in turbostat
    (Andy Shevchenko).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJXl7/dAAoJEILEb/54YlRx+VgQAIQJOWvxKew3Yl02c/sdj9OT
 5VNnFrzGzdcAPofvvG9qGq8B0Es1vYehJpwwOB21ri8EvYv0riIiU1yrqslObojQ
 oaZOkSBpbIoKjGR4CpYA/A+feE+8EqIBdPGd+lx5a6oRdUi7tRVHBG9lyLO3FB/i
 jan1q8dMpZsmu+Y+rVVHGnCVuIlIEqr2ZnZfCwDAulO2Arp/QFAh4kH08ELATvrl
 bkPa25vq7/VMP/vCDzrfZKD5mUuKogIRu/J5wx4py1nE+FB35cKKyqBOgklLwAeY
 UI8vjDhr/myNUs54AZlktOkq47TCYvjvhX9kmOxBjuWqFbRusU012IRek1fYPRIV
 ZqbkqNX7UEVQwunAEg9AyFwyzEtOht93dQDT5RLEd4QzKuM76gmHpLeTGGMzE+nu
 FnmF9JGl4DVwqpZl9yU2+hR2Mt3bP8OF8qYmNiGUB3KO4emPslhSd+6y8liA5Bx2
 SJf0Gb//vaHCh3/uMnwAonYPqRkZvBLOMwuL1VUjNQfRMnQtDdgHMYB1aT/EglPA
 8ww6j4J8rVRLAxvYQ3UEmNA/vBNclKXblRR18+JddEZP9/oX0ATfwnCCUpr839uk
 xxyQhrm4/AI60+PHWCX4GG80YrKdOGTkF7LXCQZanVWjjuyF17rufegZ2YWLT07v
 JU1Cmumfdy2jJluT8xsR
 =uVGz
 -----END PGP SIGNATURE-----

Merge tag 'pm-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael  Wysocki:
 "Again, the majority of changes go into the cpufreq subsystem, but
  there are no big features this time.  The cpufreq changes that stand
  out somewhat are the governor interface rework and improvements
  related to the handling of frequency tables.  Apart from those, there
  are fixes and new device/CPU IDs in drivers, cleanups and an
  improvement of the new schedutil governor.

  Next, there are some changes in the hibernation core, including a fix
  for a nasty problem related to the MONITOR/MWAIT usage by CPU offline
  during resume from hibernation, a few core improvements related to
  memory management during resume, a couple of additional debug features
  and cleanups.

  Finally, we have some fixes and cleanups in the devfreq subsystem,
  generic power domains framework improvements related to system
  suspend/resume, support for some new chips in intel_idle and in the
  power capping RAPL driver, a new version of the AnalyzeSuspend utility
  and some assorted fixes and cleanups.

  Specifics:

   - Rework the cpufreq governor interface to make it more
     straightforward and modify the conservative governor to avoid using
     transition notifications (Rafael Wysocki).

   - Rework the handling of frequency tables by the cpufreq core to make
     it more efficient (Viresh Kumar).

   - Modify the schedutil governor to reduce the number of wakeups it
     causes to occur in cases when the CPU frequency doesn't need to be
     changed (Steve Muckle, Viresh Kumar).

   - Fix some minor issues and clean up code in the cpufreq core and
     governors (Rafael Wysocki, Viresh Kumar).

   - Add Intel Broxton support to the intel_pstate driver (Srinivas
     Pandruvada).

   - Fix problems related to the config TDP feature and to the validity
     of the MSR_HWP_INTERRUPT register in intel_pstate (Jan Kiszka,
     Srinivas Pandruvada).

   - Make intel_pstate update the cpu_frequency tracepoint even if the
     frequency doesn't change to avoid confusing powertop (Rafael
     Wysocki).

   - Clean up the usage of __init/__initdata in intel_pstate, mark some
     of its internal variables as __read_mostly and drop an unused
     structure element from it (Jisheng Zhang, Carsten Emde).

   - Clean up the usage of some duplicate MSR symbols in intel_pstate
     and turbostat (Srinivas Pandruvada).

   - Update/fix the powernv, s3c24xx and mvebu cpufreq drivers (Akshay
     Adiga, Viresh Kumar, Ben Dooks).

   - Fix a regression (introduced during the 4.5 cycle) in the
     pcc-cpufreq driver by reverting the problematic commit (Andreas
     Herrmann).

   - Add support for Intel Denverton to intel_idle, clean up Broxton
     support in it and make it explicitly non-modular (Jacob Pan, Jan
     Beulich, Paul Gortmaker).

   - Add support for Denverton and Ivy Bridge server to the Intel RAPL
     power capping driver and make it more careful about the handing of
     MSRs that may not be present (Jacob Pan, Xiaolong Wang).

   - Fix resume from hibernation on x86-64 by making the CPU offline
     during resume avoid using MONITOR/MWAIT in the "play dead" loop
     which may lead to an inadvertent "revival" of a "dead" CPU and a
     page fault leading to a kernel crash from it (Rafael Wysocki).

   - Make memory management during resume from hibernation more
     straightforward (Rafael Wysocki).

   - Add debug features that should help to detect problems related to
     hibernation and resume from it (Rafael Wysocki, Chen Yu).

   - Clean up hibernation core somewhat (Rafael Wysocki).

   - Prevent KASAN from instrumenting the hibernation core which leads
     to large numbers of false-positives from it (James Morse).

   - Prevent PM (hibernate and suspend) notifiers from being called
     during the cleanup phase if they have not been called during the
     corresponding preparation phase which is possible if one of the
     other notifiers returns an error at that time (Lianwei Wang).

   - Improve suspend-related debug printout in the tasks freezer and
     clean up suspend-related console handling (Roger Lu, Borislav
     Petkov).

   - Update the AnalyzeSuspend script in the kernel sources to version
     4.2 (Todd Brandt).

   - Modify the generic power domains framework to make it handle system
     suspend/resume better (Ulf Hansson).

   - Make the runtime PM framework avoid resuming devices synchronously
     when user space changes the runtime PM settings for them and
     improve its error reporting (Rafael Wysocki, Linus Walleij).

   - Fix error paths in devfreq drivers (exynos, exynos-ppmu,
     exynos-bus) and in the core, make some devfreq code explicitly
     non-modular and change some of it into tristate (Bartlomiej
     Zolnierkiewicz, Peter Chen, Paul Gortmaker).

   - Add DT support to the generic PM clocks management code and make it
     export some more symbols (Jon Hunter, Paul Gortmaker).

   - Make the PCI PM core code slightly more robust against possible
     driver errors (Andy Shevchenko).

   - Make it possible to change DESTDIR and PREFIX in turbostat (Andy
     Shevchenko)"

* tag 'pm-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (89 commits)
  Revert "cpufreq: pcc-cpufreq: update default value of cpuinfo_transition_latency"
  PM / hibernate: Introduce test_resume mode for hibernation
  cpufreq: export cpufreq_driver_resolve_freq()
  cpufreq: Disallow ->resolve_freq() for drivers providing ->target_index()
  PCI / PM: check all fields in pci_set_platform_pm()
  cpufreq: acpi-cpufreq: use cached frequency mapping when possible
  cpufreq: schedutil: map raw required frequency to driver frequency
  cpufreq: add cpufreq_driver_resolve_freq()
  cpufreq: intel_pstate: Check cpuid for MSR_HWP_INTERRUPT
  intel_pstate: Update cpu_frequency tracepoint every time
  cpufreq: intel_pstate: clean remnant struct element
  PM / tools: scripts: AnalyzeSuspend v4.2
  x86 / hibernate: Use hlt_play_dead() when resuming from hibernation
  cpufreq: powernv: Replacing pstate_id with frequency table index
  intel_pstate: Fix MSR_CONFIG_TDP_x addressing in core_get_max_pstate()
  PM / hibernate: Image data protection during restoration
  PM / hibernate: Add missing braces in __register_nosave_region()
  PM / hibernate: Clean up comments in snapshot.c
  PM / hibernate: Clean up function headers in snapshot.c
  PM / hibernate: Add missing braces in hibernate_setup()
  ...
2016-07-26 17:29:07 -07:00
Kirill A. Shutemov
dcddffd41d mm: do not pass mm_struct into handle_mm_fault
We always have vma->vm_mm around.

Link: http://lkml.kernel.org/r/1466021202-61880-8-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26 16:19:19 -07:00
Aneesh Kumar K.V
9af3f56ba1 powerpc/mm: check for irq disabled() only if DEBUG_VM is enabled
We don't need to check this always.  The idea here is to capture the
wrong usage of find_linux_pte_or_hugepte and we can do that by
occasionally running with DEBUG_VM enabled.

Link: http://lkml.kernel.org/r/1464692688-6612-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26 16:19:19 -07:00
Linus Torvalds
3fc9d69093 Merge branch 'for-4.8/drivers' of git://git.kernel.dk/linux-block
Pull block driver updates from Jens Axboe:
 "This branch also contains core changes.  I've come to the conclusion
  that from 4.9 and forward, I'll be doing just a single branch.  We
  often have dependencies between core and drivers, and it's hard to
  always split them up appropriately without pulling core into drivers
  when that happens.

  That said, this contains:

   - separate secure erase type for the core block layer, from
     Christoph.

   - set of discard fixes, from Christoph.

   - bio shrinking fixes from Christoph, as a followup up to the
     op/flags change in the core branch.

   - map and append request fixes from Christoph.

   - NVMeF (NVMe over Fabrics) code from Christoph.  This is pretty
     exciting!

   - nvme-loop fixes from Arnd.

   - removal of ->driverfs_dev from Dan, after providing a
     device_add_disk() helper.

   - bcache fixes from Bhaktipriya and Yijing.

   - cdrom subchannel read fix from Vchannaiah.

   - set of lightnvm updates from Wenwei, Matias, Johannes, and Javier.

   - set of drbd updates and fixes from Fabian, Lars, and Philipp.

   - mg_disk error path fix from Bart.

   - user notification for failed device add for loop, from Minfei.

   - NVMe in general:
        + NVMe delay quirk from Guilherme.
        + SR-IOV support and command retry limits from Keith.
        + fix for memory-less NUMA node from Masayoshi.
        + use UINT_MAX for discard sectors, from Minfei.
        + cancel IO fixes from Ming.
        + don't allocate unused major, from Neil.
        + error code fixup from Dan.
        + use constants for PSDT/FUSE from James.
        + variable init fix from Jay.
        + fabrics fixes from Ming, Sagi, and Wei.
        + various fixes"

* 'for-4.8/drivers' of git://git.kernel.dk/linux-block: (115 commits)
  nvme/pci: Provide SR-IOV support
  nvme: initialize variable before logical OR'ing it
  block: unexport various bio mapping helpers
  scsi/osd: open code blk_make_request
  target: stop using blk_make_request
  block: simplify and export blk_rq_append_bio
  block: ensure bios return from blk_get_request are properly initialized
  virtio_blk: use blk_rq_map_kern
  memstick: don't allow REQ_TYPE_BLOCK_PC requests
  block: shrink bio size again
  block: simplify and cleanup bvec pool handling
  block: get rid of bio_rw and READA
  block: don't ignore -EOPNOTSUPP blkdev_issue_write_same
  block: introduce BLKDEV_DISCARD_ZERO to fix zeroout
  NVMe: don't allocate unused nvme_major
  nvme: avoid crashes when node 0 is memoryless node.
  nvme: Limit command retries
  loop: Make user notify for adding loop device failed
  nvme-loop: fix nvme-loop Kconfig dependencies
  nvmet: fix return value check in nvmet_subsys_alloc()
  ...
2016-07-26 15:37:51 -07:00
Kees Cook
1d3c132474 powerpc/uaccess: Enable hardened usercopy
Enables CONFIG_HARDENED_USERCOPY checks on powerpc.

Based on code from PaX and grsecurity.

Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-26 14:41:51 -07:00
Linus Torvalds
bbce2ad2d7 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
 "Here is the crypto update for 4.8:

  API:
   - first part of skcipher low-level conversions
   - add KPP (Key-agreement Protocol Primitives) interface.

  Algorithms:
   - fix IPsec/cryptd reordering issues that affects aesni
   - RSA no longer does explicit leading zero removal
   - add SHA3
   - add DH
   - add ECDH
   - improve DRBG performance by not doing CTR by hand

  Drivers:
   - add x86 AVX2 multibuffer SHA256/512
   - add POWER8 optimised crc32c
   - add xts support to vmx
   - add DH support to qat
   - add RSA support to caam
   - add Layerscape support to caam
   - add SEC1 AEAD support to talitos
   - improve performance by chaining requests in marvell/cesa
   - add support for Araneus Alea I USB RNG
   - add support for Broadcom BCM5301 RNG
   - add support for Amlogic Meson RNG
   - add support Broadcom NSP SoC RNG"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (180 commits)
  crypto: vmx - Fix aes_p8_xts_decrypt build failure
  crypto: vmx - Ignore generated files
  crypto: vmx - Adding support for XTS
  crypto: vmx - Adding asm subroutines for XTS
  crypto: skcipher - add comment for skcipher_alg->base
  crypto: testmgr - Print akcipher algorithm name
  crypto: marvell - Fix wrong flag used for GFP in mv_cesa_dma_add_iv_op
  crypto: nx - off by one bug in nx_of_update_msc()
  crypto: rsa-pkcs1pad - fix rsa-pkcs1pad request struct
  crypto: scatterwalk - Inline start/map/done
  crypto: scatterwalk - Remove unnecessary BUG in scatterwalk_start
  crypto: scatterwalk - Remove unnecessary advance in scatterwalk_pagedone
  crypto: scatterwalk - Fix test in scatterwalk_done
  crypto: api - Optimise away crypto_yield when hard preemption is on
  crypto: scatterwalk - add no-copy support to copychunks
  crypto: scatterwalk - Remove scatterwalk_bytes_sglen
  crypto: omap - Stop using crypto scatterwalk_bytes_sglen
  crypto: skcipher - Remove top-level givcipher interface
  crypto: user - Remove crypto_lookup_skcipher call
  crypto: cts - Convert to skcipher
  ...
2016-07-26 13:40:17 -07:00
Anton Blanchard
dd57023747 powerpc: Improve comment explaining why we modify VRSAVE
The comment explaining why we modify VRSAVE is misleading, glibc
does rely on the behaviour. Update the comment.

Signed-off-by: Anton Blanchard <anton@samba.org>
Reviewed-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-26 14:16:19 +10:00
Michael Ellerman
1a1cee843c powerpc/mm: Drop unused externs for hpte_init_beat[_v3]()
We removed the BEAT support in 2015 in commit bf4981a006 ("powerpc:
Remove the celleb support"). These externs are unused since then.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-26 14:16:18 +10:00
Michael Ellerman
6364e84e85 powerpc/mm: Rename hpte_init_lpar() and move the fallback to a header
hpte_init_lpar() is part of the pseries platform, so name it as such.

Move the fallback implementation for when PSERIES=n into the header,
dropping the weak implementation. The panic() is now handled by the
calling code.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-26 14:16:18 +10:00
Michael Ellerman
7353644fa9 powerpc/mm: Fix build break when PPC_NATIVE=n
The recent commit to rework the hash MMU setup broke the build when
CONFIG_PPC_NATIVE=n. Fix it by adding an IS_ENABLED() check before
calling hpte_init_native().

Removing the else clause opens the possibility that we don't set any
ops, which would probably lead to a strange crash later. So add a check
that we correctly initialised at least one member of the struct.

Fixes: 166dd7d3fb ("powerpc/64: Move MMU backend selection out of platform code")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-26 14:16:08 +10:00
Kees Cook
74e630a758 Linux 4.7
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXlRXSAAoJEHm+PkMAQRiGG/gH/0Z8O4zWOsrwO+X1mRToRDBH
 joFOjAmCVe83T1VpF5LYNB+9+owL/dEDt6+ZIswnhH7AfQPjs4RqwS4PcuMbCDVO
 +mDm0PmfcKaYcQZrB2Z2OwIzRNnfCTVcsDPhIHwuIHk0m4z/xuGZonD8KoAj0+tO
 3yJF6sbE1KubDVjOb+lmZZSP3cXA0pDXrNhkYhE4Tsr8fiihGjeXSNJ8t2zPLjxo
 W3MPqo0rzDvQsOwoF4TWHHagVaFSJlhLBBgqu33fI7uO3jtfQD2G8wG68JCND1j3
 qbMoBfTLFV/yQmSIJUt0Wv1axaCcwnjpweEB35A/GEeZ0mNB1rDdoBeI1eKEQkc=
 =DGFC
 -----END PGP SIGNATURE-----

Merge tag 'v4.7' into for-linus/pstore

Linux 4.7
2016-07-25 13:50:36 -07:00
Linus Torvalds
c86ad14d30 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "The locking tree was busier in this cycle than the usual pattern - a
  couple of major projects happened to coincide.

  The main changes are:

   - implement the atomic_fetch_{add,sub,and,or,xor}() API natively
     across all SMP architectures (Peter Zijlstra)

   - add atomic_fetch_{inc/dec}() as well, using the generic primitives
     (Davidlohr Bueso)

   - optimize various aspects of rwsems (Jason Low, Davidlohr Bueso,
     Waiman Long)

   - optimize smp_cond_load_acquire() on arm64 and implement LSE based
     atomic{,64}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
     on arm64 (Will Deacon)

   - introduce smp_acquire__after_ctrl_dep() and fix various barrier
     mis-uses and bugs (Peter Zijlstra)

   - after discovering ancient spin_unlock_wait() barrier bugs in its
     implementation and usage, strengthen its semantics and update/fix
     usage sites (Peter Zijlstra)

   - optimize mutex_trylock() fastpath (Peter Zijlstra)

   - ... misc fixes and cleanups"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits)
  locking/atomic: Introduce inc/dec variants for the atomic_fetch_$op() API
  locking/barriers, arch/arm64: Implement LDXR+WFE based smp_cond_load_acquire()
  locking/static_keys: Fix non static symbol Sparse warning
  locking/qspinlock: Use __this_cpu_dec() instead of full-blown this_cpu_dec()
  locking/atomic, arch/tile: Fix tilepro build
  locking/atomic, arch/m68k: Remove comment
  locking/atomic, arch/arc: Fix build
  locking/Documentation: Clarify limited control-dependency scope
  locking/atomic, arch/rwsem: Employ atomic_long_fetch_add()
  locking/atomic, arch/qrwlock: Employ atomic_fetch_add_acquire()
  locking/atomic, arch/mips: Convert to _relaxed atomics
  locking/atomic, arch/alpha: Convert to _relaxed atomics
  locking/atomic: Remove the deprecated atomic_{set,clear}_mask() functions
  locking/atomic: Remove linux/atomic.h:atomic_fetch_or()
  locking/atomic: Implement atomic{,64,_long}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
  locking/atomic: Fix atomic64_relaxed() bits
  locking/atomic, arch/xtensa: Implement atomic_fetch_{add,sub,and,or,xor}()
  locking/atomic, arch/x86: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
  locking/atomic, arch/tile: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
  locking/atomic, arch/sparc: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
  ...
2016-07-25 12:41:29 -07:00
Rafael J. Wysocki
9def970ead Merge branch 'pm-cpufreq'
* pm-cpufreq: (41 commits)
  Revert "cpufreq: pcc-cpufreq: update default value of cpuinfo_transition_latency"
  cpufreq: export cpufreq_driver_resolve_freq()
  cpufreq: Disallow ->resolve_freq() for drivers providing ->target_index()
  cpufreq: acpi-cpufreq: use cached frequency mapping when possible
  cpufreq: schedutil: map raw required frequency to driver frequency
  cpufreq: add cpufreq_driver_resolve_freq()
  cpufreq: intel_pstate: Check cpuid for MSR_HWP_INTERRUPT
  intel_pstate: Update cpu_frequency tracepoint every time
  cpufreq: intel_pstate: clean remnant struct element
  cpufreq: powernv: Replacing pstate_id with frequency table index
  intel_pstate: Fix MSR_CONFIG_TDP_x addressing in core_get_max_pstate()
  cpufreq: Reuse new freq-table helpers
  cpufreq: Handle sorted frequency tables more efficiently
  cpufreq: Drop redundant check from cpufreq_update_current_freq()
  intel_pstate: Declare pid_params/pstate_funcs/hwp_active __read_mostly
  intel_pstate: add __init/__initdata marker to some functions/variables
  intel_pstate: Fix incorrect placement of __initdata
  cpufreq: mvebu: fix integer to pointer cast
  cpufreq: intel_pstate: Broxton support
  cpufreq: conservative: Do not use transition notifications
  ...
2016-07-25 13:46:08 +02:00
Dan Williams
0606263f24 Merge branch 'for-4.8/libnvdimm' into libnvdimm-for-next 2016-07-24 08:05:44 -07:00
Sebastian Andrzej Siewior
bdab88e006 powerpc/numa: Convert to hotplug state machine
Install the callbacks via the state machine. On the boot cpu the callback is
invoked manually because cpuhp is not up yet and everything must be
preinitialized before additional CPUs are up.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christophe Jaillet <christophe.jaillet@wanadoo.fr>
Cc: Anton Blanchard <anton@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160718140727.GA13132@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-07-22 21:53:17 +02:00
Stefan Agner
5edc2aae16 powerpc/dts: fix STMicroelectronics compatible strings
Replace the non-standard vendor prefix stm and st-micro with st for
STMicroelectronics. The drivers do not specify the vendor prefixes
since the I2C Core strips them away from the DT provided compatible
string. Therefore, changing existing device trees does not have any
impact on device detection.

Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Rob Herring <robh@kernel.org>
2016-07-22 14:53:05 -05:00
Alastair D'Silva
4a1202765d powerpc: Add module autoloading based on CPU features
This patch provides the necessary infrastructure to allow drivers
to be automatically loaded via udev. It implements the minimum
required to be able to use module_cpu_feature_match() to trigger
the GENERIC_CPU_AUTOPROBE mechanisms.

The features exposed are a mirror of the cpu_user_features
(converted to an offset from a mask). This decision was made to
ensure that the behavior between features for module loading and
userspace are consistent.

Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
[mpe: Only define the bits we currently need]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21 20:33:57 +10:00
Alexey Kardashevskiy
802a345183 powerpc/powernv/ioda: Fix endianness when reading TCEs
The iommu_table_ops::exchange() callback writes new TCE to the table and
returns old value and permission mask. The old TCE value is correctly
converted from BE to CPU endian; however permission mask was calculated
from BE value and therefore always returned DMA_NONE which could cause
memory leak on LE systems using VFIO SPAPR TCE IOMMU v1 driver.

This fixes pnv_tce_xchg() to have @oldtce a CPU endian.

Fixes: 05c6cfb9dc ("powerpc/iommu/powernv: Release replaced TCE")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21 20:21:06 +10:00
Sukadev Bhattiprolu
0eab46be21 powerpc/mm: Add memory barrier in __hugepte_alloc()
__hugepte_alloc() uses kmem_cache_zalloc() to allocate a zeroed PTE
and proceeds to use the newly allocated PTE. Add a memory barrier to
make sure that the other CPUs see a properly initialized PTE.

Based on a fix suggested by James Dykman.

Reported-by: James Dykman <jdykman@us.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Tested-by: James Dykman <jdykman@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21 20:13:28 +10:00
Michael Ellerman
31278b17a0 powerpc/modules: Never restore r2 for a mprofile-kernel style mcount() call
In the module loader we process relocations, and for long jumps we
generate trampolines (aka stubs). At the call site for one of these
trampolines we usually need to generate a load instruction to restore
the TOC pointer into r2.

There is one exception however, which is calls to mcount() using the
mprofile-kernel ABI, they handle the TOC inside the stub, and so for
them we do not generate a TOC load.

The bug is in how the code in restore_r2() decides if it needs to
generate the TOC load. It does so by looking for a nop following the
branch, and if it sees a nop, it replaces it with the load. In general
the compiler has no reason to generate a nop following the mcount()
call and so that check works OK.

However if we combine a jump label at the start of a function, with an
early return, such that GCC applies the shrink-wrapping optimisation, we
can then end up with an mcount call followed immediately by a nop.
However the nop is not there for a TOC load, it is for the jump label.

That confuses restore_r2() into replacing the jump label nop with a TOC
load, which in turn confuses ftrace into replacing the mcount call with
a b +8 (fixed in the previous commit). The end result is we jump over
the jump label, which if it was supposed to return means we incorrectly
run the body of the function.

We have seen this in practice with some yet-to-be-merged patches that
use jump labels more extensively.

The fix is relatively simple, in restore_r2() we check for an
mprofile-kernel style mcount() call first, before looking for the
presence of a nop.

Fixes: 153086644f ("powerpc/ftrace: Add support for -mprofile-kernel ftrace ABI")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21 20:10:42 +10:00
Michael Ellerman
9d63610951 powerpc/ftrace: Separate the heuristics for checking call sites
In __ftrace_make_nop() (the 64-bit version), we have code to deal with
two ftrace ABIs. There is the original ABI, which looks mostly like a
function call, and then the mprofile-kernel ABI which is just a branch.

The code tries to handle both cases, by looking for the presence of a
load to restore the TOC pointer (PPC_INST_LD_TOC). If we detect the TOC
load, we assume the call site is for an mcount() call using the old ABI.
That means we patch the mcount() call with a b +8, to branch over the
TOC load.

However if the kernel was built with mprofile-kernel, then there will
never be a call site using the original ftrace ABI. If for some reason
we do see a TOC load, then it's there for a good reason, and we should
not jump over it.

So split the code, using the existing CC_USING_MPROFILE_KERNEL. Kernels
built with mprofile-kernel will only look for, and expect, the new ABI,
and similarly for the original ABI.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21 20:10:37 +10:00
Benjamin Herrenschmidt
b1923caa6e powerpc: Merge 32-bit and 64-bit setup_arch()
There is little enough differences now.

mpe: Add a/p/k/setup.h to contain the prototypes and empty versions of
functions we need, rather than using weak functions. Add a few other
empty versions to avoid as many #ifdefs as possible in the code.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21 19:17:46 +10:00
Benjamin Herrenschmidt
009776baa1 powerpc/64: Make a few boot functions __init
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21 19:17:25 +10:00
Benjamin Herrenschmidt
f7b9ebb79e powerpc: Re-order setup_panic()
Do it right after probe_machine() since it's about testing ppc_md,
and put the test in the common code.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21 19:17:23 +10:00
Benjamin Herrenschmidt
e39afba3aa powerpc: Re-order the call to smp_setup_cpu_maps()
It makes more sense to do it before intializing xmon() as xmon might
use the info in there. We do want to register the console early
though in case we want some functioning printk's in the cpu map setup.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21 19:14:32 +10:00
Benjamin Herrenschmidt
8f212cb26f powerpc/32: Move cache info inits to a separate function
Matches 64-bit. Also move the call to the same spot as ppc64

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21 19:14:32 +10:00
Benjamin Herrenschmidt
fa745a129c powerpc/64: Move the content of setup_system() to setup_arch()
And kill setup_system().

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21 19:14:29 +10:00
Benjamin Herrenschmidt
9df549afea powerpc/64: Move setting of {i,d}cache_bsize to initialize_cache_info()
Also remove the completely osbolete comment. We *do* look in the
device-tree.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21 19:08:06 +10:00
Benjamin Herrenschmidt
bf1b61fb57 powerpc/64: Move the boot time info banner to a separate function
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21 19:08:05 +10:00
Benjamin Herrenschmidt
f2d576948d powerpc: Get rid of ppc_md.init_early()
It is now called right after platform probe, so the probe function
can just do the job.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21 19:07:26 +10:00
Benjamin Herrenschmidt
5657138404 powerpc: Move 32-bit probe() machine to later in the boot process
This converts all the 32-bit platforms to use the expanded device-tree
which is a pretty mechanical change. Unlike 64-bit, the 32-bit kernel
didn't rely on platform initializations to setup the MMU since it
sets it up entirely before probe_machine() so the move has comparatively
less consequences though it's a bigger patch.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21 19:06:42 +10:00
Benjamin Herrenschmidt
406b0b6ae3 powerpc/64: Move 64-bit probe_machine() to later in the boot process
We no long need the machine type that early, so we can move probe_machine()
to after the device-tree has been expanded. This will allow further
consolidation.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21 18:59:22 +10:00
Benjamin Herrenschmidt
84b62c72fa powerpc: Ensure that ppc_md is empty before probing for machine type
Anything in there will be overwritten, so it helps catching nasty
bugs if we check that it's indeed full of NULL's before we do so.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21 18:59:21 +10:00
Benjamin Herrenschmidt
7025776ed1 powerpc/mm: Move hash table ops to a separate structure
Moving probe_machine() to after mmu init will cause the ppc_md
fields relative to the hash table management to be overwritten.

Since we have essentially disconnected the machine type from
the hash backend ops, finish the job by moving them to a different
structure.

The only callback that didn't quite fix is update_partition_table
since this is not specific to hash, so I moved it to a standalone
variable for now. We can revisit later if needed.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Fix ppc64e build failure in kexec]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21 18:59:09 +10:00
Benjamin Herrenschmidt
b521f576df powerpc/pmac: Remove spurrious machine type test
pmac_declare_of_platform_devices() is already a machine initcall, thus
it won't be called on a non-powermac machine. Testing for chrp there
is pointless.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21 18:58:26 +10:00
Benjamin Herrenschmidt
2b4e3ad8f5 powerpc/mm/hash64: Don't test for machine type to detect HEA special case
Instead, check for FW_FEATURE_SPLPAR. This should be roughtly equivalent
as all pseries machiens that can have an HEA also support SPLPAR and
no other machine type does.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21 18:58:25 +10:00
Benjamin Herrenschmidt
5556ecf5e9 powerpc/mm/hash: Don't use machine_is() early during boot
Use the device-tree instead as we'll be moving probe_machine()
out of early_setup

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21 18:58:21 +10:00
Benjamin Herrenschmidt
388dc1c3f0 powerpc/pasemi: Remove IOBMAP allocation from platform probe()
These days, memblocks is available later, so we can just allocate it
as part of iob_init.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21 18:56:38 +10:00
Benjamin Herrenschmidt
166dd7d3fb powerpc/64: Move MMU backend selection out of platform code
We move it into early_mmu_init() based on firmware features. For PS3,
we have to move the setting of these into early_init_devtree().

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21 18:56:38 +10:00
Benjamin Herrenschmidt
91b6fad5cf powerpc/pmac: Remove early allocation of the SMU command buffer
The SMU command buffer needs to be allocated below 2G using memblock.

In the past, this had to be done very early from the arch code as
memblock wasn't available past that point. That is no longer the
case though, smu_init() is called from setup_arch() when memblock
is still functional these days. So move the allocation to the
SMU driver itself.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21 18:56:38 +10:00
Benjamin Herrenschmidt
d3cbff1b5a powerpc: Put exception configuration in a common place
The various calls to establish exception endianness and AIL are
now done from a single point using already established CPU and FW
feature bits to decide what to do.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21 18:56:31 +10:00
Benjamin Herrenschmidt
3808a88985 powerpc: Move FW feature probing out of pseries probe()
We move the function itself to pseries/firmware.c and call it along
with almost all other flat device-tree parsers from early_init_devtree()

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Move #ifdefs into the header by providing pseries_probe_fw_features()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21 18:56:13 +10:00
Benjamin Herrenschmidt
c40785ad30 powerpc/dart: Use a cachable DART
Instead of punching a hole in the linear mapping, just use normal
cachable memory, and apply the flush sequence documented in the
CPC625 (aka U3) user manual.

This allows us to remove quite a bit of code related to the early
allocation of the DART and the hole in the linear mapping. We can
also get rid of the copy of the DART for suspend/resume as the
original memory can just be saved/restored now, as long as we
properly sync the caches.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Integrate dart_init() fix to return ENODEV when DART disabled]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21 18:55:54 +10:00
Benjamin Herrenschmidt
de4cf3de59 powerpc: Move 64-bit memory reserves to setup_arch()
There is really no need to do them that early, early_setup() runs
before MMU is on, we should do the strict minimum there to get the
MMU going.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21 18:54:55 +10:00
Benjamin Herrenschmidt
c4bd6cb87c powerpc: Move 64-bit feature fixup earlier
Make it part of early_setup() as we really want the feature fixups
to be applied before we turn on the MMU since they can have an impact
on the various assembly path related to MMU management and interrupts.

This makes 64-bit match what 32-bit does.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21 18:54:55 +10:00
Benjamin Herrenschmidt
9402c68461 powerpc: Factor do_feature_fixup calls
32 and 64-bit do a similar set of calls early on, we move it all to
a single common function to make the boot code more readable.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21 18:51:42 +10:00
Kevin Hao
27d1149667 powerpc/32: Remove RELOCATABLE_PPC32
It is seldom used in the kernel code and can be easily replaced by
either RELOCATABLE or PPC32. So there is no reason to keep a separate
kernel option for this.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-19 20:17:07 +10:00
Kevin Hao
4c91bd6eea powerpc: Merge the RELOCATABLE config entries for ppc32 and ppc64
It makes no sense to keep two separate RELOCATABLE config entries for
ppc32 and ppc64 respectively. Merge them into one and move it to a
common place.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-19 20:16:59 +10:00
Kevin Hao
da42307146 powerpc/32/booke: Fix the build error when CRASH_DUMP is enabled
In the current code, the RELOCATABLE will be forcedly enabled when
enabling CRASH_DUMP. But for ppc32, the RELOCABLE also depend on
ADVANCED_OPTIONS and select NONSTATIC_KERNEL. This will cause the
following build error when CRASH_DUMP=y && ADVANCED_OPTIONS=n because
the select of NONSTATIC_KERNEL doesn't take effect.

  arch/powerpc/include/asm/io.h: In function 'virt_to_phys':
  arch/powerpc/include/asm/page.h:113:26: error: 'virt_phys_offset' undeclared (first use in this function)
   #define VIRT_PHYS_OFFSET virt_phys_offset
                          ^
It doesn't have any strong reasons to make the RELOCATABLE depend on
ADVANCED_OPTIONS. So remove this dependency to fix this issue.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-19 20:16:36 +10:00
John Allen
1dc7595666 powerpc/pseries: Use kernel hotplug queue for PowerVM hotplug events
The sysfs interface used to handle PowerVM hotplug events should use the
hotplug queue as well. PRRN events will soon be placing many hotplug
events on the queue at once and we will need ordinary hotplug events to
use the queue as well in order to ensure these events will still be handled
and that proper serialization is maintained during the PRRN event.

Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-19 20:12:31 +10:00
John Allen
b7d9eb397b powerpc/pseries: Add support for hotplug interrupt source
Add handler for new hotplug interrupt. For memory and CPU hotplug events,
we will add the hotplug errorlog to the hotplug workqueue. Since PCI
hotplug is not currently supported in the kernel, PCI hotplug events are
written to the rtas_log_bug and are handled by rtas_errd.

Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-19 20:12:30 +10:00
John Allen
9054619ef5 powerpc/pseries: Add pseries hotplug workqueue
In support of PAPR changes to add a new hotplug interrupt, introduce a
hotplug workqueue to avoid processing hotplug events in interrupt context.
We will also take advantage of the queue on PowerVM to ensure hotplug
events initiated from different sources (HMC and PRRN events) are handled
and serialized properly.

Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-19 20:12:30 +10:00
Ian Munsie
c2ca9f6b4c powerpc/powernv: Fix pci-cxl.c build when CONFIG_MODULES=n
pnv_cxl_enable_phb_kernel_api() grabs a reference to the cxl module to
prevent it from being unloaded after the PHB has been switched to CX4
mode. This breaks the build when CONFIG_MODULES=n as module_mutex
doesn't exist.

However, if we don't have modules, we don't need to protect against the
case of the cxl module being unloaded. As such, split the relevant code
out into a function surrounded with #if IS_MODULE(CXL) so we don't try
to compile it if cxl isn't being compiled as a module.

Fixes: 5918dbc9b4ec ("powerpc/powernv: Add support for the cxl kernel api on the real phb")
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-19 20:12:28 +10:00
Aneesh Kumar K.V
a4b349540a powerpc/mm: Cleanup LPCR defines
This makes it easy to verify we are not overloading the bits.
No functionality change by this patch.

mpe: Cleanup more. Completely fixup whitespace, convert all UL values to
ASM_CONST(), and replace all occurrences of 63-x with the actual shift.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-19 20:12:28 +10:00
Arnd Bergmann
58ab5e0c2c Kbuild: arch: look for generated headers in obtree
There are very few files that need add an -I$(obj) gcc for the preprocessor
or the assembler. For C files, we add always these for both the objtree and
srctree, but for the other ones we require the Makefile to add them, and
Kbuild then adds it for both trees.

As a preparation for changing the meaning of the -I$(obj) directive to
only refer to the srctree, this changes the two instances in arch/x86 to use
an explictit $(objtree) prefix where needed, otherwise we won't find the
headers any more, as reported by the kbuild 0day builder.

arch/x86/realmode/rm/realmode.lds.S:75:20: fatal error: pasyms.h: No such file or directory

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michal Marek <mmarek@suse.com>
2016-07-18 21:31:35 +02:00
Aneesh Kumar K.V
b275bfb269 powerpc/mm/radix: Add a kernel command line to disable radix
This patch adds the kernel command line disable_radix which disable
the radix MMU mode even if firmware indicates radix support via
ibm,pa-features device tree node.

This helps in testing different MMU mode easily.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17 16:42:55 +10:00
Aneesh Kumar K.V
912cc87a65 powerpc/mm/radix: Add LPID based tlb flush helpers
We add a tlb flush variant, to flush LPID mappings.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17 16:42:55 +10:00
Aneesh Kumar K.V
83209bc861 powerpc/mm/radix: Update machine call back to support new HCALL.
This update the machine dep callback such that we can use the same
callback to register process table. The interface is updated such that
we can easily call H_REGISTER_PROC_TBL hcall. The HCALL itself is
introduced in a later patch.

No functionality change introduced by this patch.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17 16:42:54 +10:00
Aneesh Kumar K.V
09cf5bcb0c powerpc/mm/radix: Update PID switch sequence
Update the PID switch as per ISA doc. slbia is needed in radix to
invalidate any implementation specific lookaside information.
We use the .long format due to build errors with the below compiler
version.

gcc (Ubuntu 5.3.1-14ubuntu2.1) 5.3.1 20160413
GNU assembler (GNU Binutils for Ubuntu) 2.26

CC      arch/powerpc/mm//mmu_context_book3s64.o
{standard input}: Assembler messages:
{standard input}:506: Error: junk at end of line: `0x7'
scripts/Makefile.build:291: recipe for target 'arch/powerpc/mm//mmu_context_book3s64.o' failed
make[1]: *** [arch/powerpc/mm//mmu_context_book3s64.o] Error 1
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17 16:42:53 +10:00
Aneesh Kumar K.V
4b7a350480 powerpc/mm/hash: Update SDR1 size encoding as documented in ISA 3.0
ISA 3.0 document hash table size in bytes = 2^(HTABSIZE + 18)

No functionality change by this patch.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17 16:42:53 +10:00
Aneesh Kumar K.V
56547411a0 powerpc/mm: Print formation regarding the the MMU mode
This helps in easily identifying the MMU mode with which the kernel
is operating.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17 16:42:52 +10:00
Aneesh Kumar K.V
accfad7d0a powerpc/mm: Clear top 16 bits of va only on older cpus
As per ISA, we need to do this only for architecture version 2.02 and
earlier. This continued to work even for 2.07. But let's not do this for
anything after 2.02. ISA 3.0 requires these top bits to be not cleared.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17 16:42:52 +10:00
Aneesh Kumar K.V
e21fc93b70 powerpc/mm: Compile out radix related functions if RADIX_MMU is disabled
Currently we depend on mmu_has_feature to evalute to zero based on
MMU_FTRS_POSSIBLE mask. In a later patch, we want to update
radix_enabled() to runtime update the conditional operation to a jump
instruction. This implies we cannot depend on MMU_FTRS_POSSIBLE mask.
Instead define radix_enabled to return 0 if RADIX_MMU is not enabled.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17 16:42:51 +10:00
Aneesh Kumar K.V
66c570f545 powerpc/mm: use _raw variant of page table accessors
This switch few of the page table accessor to use the __raw variant
and does the cpu to big endian conversion of constants. This helps in
generating better code.

For ex: a pgd_none(pgd) check with and without fix is listed below

Without fix:
------------
   2240:	20 00 61 eb 	ld      r27,32(r1)
/* PGD level */
typedef struct { __be64 pgd; } pgd_t;
static inline unsigned long pgd_val(pgd_t x)
{
	return be64_to_cpu(x.pgd);

    2244:	22 00 66 78 	rldicl  r6,r3,32,32
    2248:	3e 40 7d 54 	rotlwi  r29,r3,8
    224c:	0e c0 7d 50 	rlwimi  r29,r3,24,0,7
    2250:	3e 40 c5 54 	rotlwi  r5,r6,8
    2254:	2e c4 7d 50 	rlwimi  r29,r3,24,16,23
    2258:	0e c0 c5 50 	rlwimi  r5,r6,24,0,7
    225c:	2e c4 c5 50 	rlwimi  r5,r6,24,16,23
    2260:	c6 07 bd 7b 	rldicr  r29,r29,32,31
    2264:	78 2b bd 7f 	or      r29,r29,r5
		if (pgd_none(pgd))
    2268:	00 00 bd 2f 	cmpdi   cr7,r29,0
    226c:	54 03 9e 41 	beq     cr7,25c0 <__get_user_pages_fast+0x500>

With fix:
---------
    2370:	20 00 61 eb 	ld      r27,32(r1)
		if (pgd_none(pgd))
    2374:	00 00 bd 2f 	cmpdi   cr7,r29,0
    2378:	a8 03 9e 41 	beq     cr7,2720 <__get_user_pages_fast+0x530>
			break;
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17 16:42:51 +10:00
Aneesh Kumar K.V
bf16cdf48a powerpc/mm/radix: Update LPCR HR bit as per ISA
PowerISA 3.0 requires the MMU mode (radix vs. hash) of the hypervisor
to be mirrored in the LPCR register, in addition to the partition table.
This is done to avoid fetching from the table when deciding, among other
things, how to perform transitions to HV mode on some interrupts.
So let's set it up appropriately

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17 16:42:50 +10:00
Balbir Singh
8cd6d3c23e powerpc/mm: Fix .long's in tlb-radix.c to more meaningful
The .longs with the shifts are harder to read, use more meaningful names
for the opcodes. PPC_TLBIE_5 is introduced for the 5 opcode variation of
the instruction due to an existing op-code for the 2 opcode variant.

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17 16:42:50 +10:00
Benjamin Herrenschmidt
9a1a70ae15 powerpc/pci: Don't try to allocate resources that will be reassigned
When we know we will reassign all resources, trying (and failing)
to allocate them initially is fairly pointless and leads to a lot
of scary messages in the kernel log

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17 16:42:49 +10:00
Benjamin Herrenschmidt
08a45b320a powerpc/powernv/pci: Check status of a PHB before using it
If the firmware encounters an error (internal or HW) during initialization
of a PHB, it might leave the device-node in the tree but mark it disabled
using the "status" property. We should check it.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17 16:42:49 +10:00
Benjamin Herrenschmidt
a1339faf72 powerpc/powernv/pci: Use the device-tree to get available range of M64's
M64's are the configurable 64-bit windows that cover the 64-bit MMIO
space. We used to hard code 16 windows. Newer chips might have a
variable number and might need to reserve some as well (for example
on PHB4/POWER9, M32 and M64 are actually unified and we use M64#0
to map the 32-bit space).

So newer OPALs will provide a property we can use to know what range
of windows is available. The property is named so that it can
eventually support multiple ranges but we only use the first one for
now.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17 16:42:48 +10:00
Benjamin Herrenschmidt
f0228c4130 powerpc/powernv/pci: Fallback to OPAL for TCE invalidations
If we don't find registers for the PHB or don't know the model
specific invalidation method, use OPAL calls instead.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17 16:42:48 +10:00
Benjamin Herrenschmidt
fd141d1a99 powerpc/powernv/pci: Rework accessing the TCE invalidate register
It's architected, always in a known place, so there is no need
to keep a separate pointer to it, we use the existing "regs",
and we complement it with a real mode variant.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

# Conflicts:
#	arch/powerpc/platforms/powernv/pci-ioda.c
#	arch/powerpc/platforms/powernv/pci.h
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17 16:42:47 +10:00
Benjamin Herrenschmidt
08acce1cab powerpc/powernv/pci: Remove SWINV constants and obsolete TCE code
We have some obsolete code in pnv_pci_p7ioc_tce_invalidate()
to handle some internal lab tools that have stopped being
useful a long time ago. Remove that along with the definition
and test for the TCE_PCI_SWINV_* flags whose value is basically
always the same.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17 16:42:47 +10:00
Benjamin Herrenschmidt
a34ab7c328 powerpc/powernv/pci: Rename TCE invalidation calls
The TCE invalidation functions are fairly implementation specific,
and while the IODA specs more/less describe the register, in practice
various implementation workarounds may be required. So name the
functions after the target PHB.

Note today and for the foreseeable future, there's a 1:1 relationship
between an IODA version and a PHB implementation. There exist another
variant of IODA1 (Torrent) but we never supported in with OPAL and
never will.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17 16:42:46 +10:00
Benjamin Herrenschmidt
69c592ed40 powerpc/opal: Add real mode call wrappers
Replace the old generic opal_call_realmode() with proper per-call
wrappers similar to the normal ones and convert callers.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17 16:42:46 +10:00
Benjamin Herrenschmidt
b7d6bf4fdd powerpc/pseries/pci: Remove obsolete SW invalidate
That was used by some old IBM internal bringup tools and is
no longer relevant.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17 16:42:45 +10:00
Benjamin Herrenschmidt
fb111334e4 powerpc/powernv: Discover IODA3 PHBs
We instanciate them as IODA2. We also change the MSI EOI hack
to only kick on PHB3 since it will not be needed on any new
implementation.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17 16:42:45 +10:00
Benjamin Herrenschmidt
d74361881f powerpc/xics: Add ICP OPAL backend
This adds a new XICS backend that uses OPAL calls, which can be
used when we don't have native support for the platform interrupt
controller.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17 16:42:45 +10:00
Benjamin Herrenschmidt
1d607bb3bd powerpc/irq: Add mechanism to force a replay of interrupts
Calling this function with interrupts soft-disabled will cause
a replay of the external interrupt vector when they are re-enabled.

This will be used by the OPAL XICS backend (and latter by the native
XIVE code) to handle EOI signaling that there are more interrupts to
fetch from the hardware since the hardware won't issue another HW
interrupt in that case.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17 16:42:44 +10:00
Benjamin Herrenschmidt
9baaef0a22 powerpc/irq: Add support for HV virtualization interrupts
This will be delivering external interrupts from the XIVE to the
Hypervisor. We treat it as a normal external interrupt for the
lazy irq disable code (so it will be replayed as a 0x500) and
route it to do_IRQ.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17 16:42:44 +10:00
Benjamin Herrenschmidt
b88d4bce2b powerpc/book64s: Move a few exception common handlers to make room
This moves the CBE RAS and facility unavailable "common" handlers
down to after the FWNMI page.

This frees up some space in the very demanded spaces before the
relocation-on vectors and before the FWNMI page. They are still
within 64K of __start, so CONFIG_RELOCATABLE should still work.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17 16:42:34 +10:00
Benjamin Herrenschmidt
9fedd3f880 powerpc/powernv: Add XICS emulation APIs
OPAL provides an emulated XICS interrupt controller to
use as a fallback on newer processors that don't have a
XICS. It's meant as a way to provide backward compatibility
with future processors. Add the corresponding interfaces.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15 20:18:43 +10:00
Shreyas B. Prabhu
c0691f9dd2 powerpc/powernv: Use deepest stop state when cpu is offlined
If hardware supports stop state, use the deepest stop state when
the cpu is offlined.

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15 20:18:42 +10:00
Shreyas B. Prabhu
bcef83a00d powerpc/powernv: Add platform support for stop instruction
POWER ISA v3 defines a new idle processor core mechanism. In summary,
 a) new instruction named stop is added. This instruction replaces
	instructions like nap, sleep, rvwinkle.
 b) new per thread SPR named Processor Stop Status and Control Register
	(PSSCR) is added which controls the behavior of stop instruction.

PSSCR layout:
----------------------------------------------------------
| PLS | /// | SD | ESL | EC | PSLL | /// | TR | MTL | RL |
----------------------------------------------------------
0      4     41   42    43   44     48    54   56    60

PSSCR key fields:
	Bits 0:3  - Power-Saving Level Status. This field indicates the lowest
	power-saving state the thread entered since stop instruction was last
	executed.

	Bit 42 - Enable State Loss
	0 - No state is lost irrespective of other fields
	1 - Allows state loss

	Bits 44:47 - Power-Saving Level Limit
	This limits the power-saving level that can be entered into.

	Bits 60:63 - Requested Level
	Used to specify which power-saving level must be entered on executing
	stop instruction

This patch adds support for stop instruction and PSSCR handling.

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15 20:18:41 +10:00
Shreyas B. Prabhu
0dfffb48ce powerpc/powernv: abstraction for saving SPRs before entering deep idle states
Create a function for saving SPRs before entering deep idle states.
This function can be reused for POWER9 deep idle states.

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15 20:18:40 +10:00
Shreyas B. Prabhu
4eae2c9ae5 powerpc/powernv: Make pnv_powersave_common more generic
pnv_powersave_common does common steps needed before entering idle
state and eventually changes MSR to MSR_IDLE and does rfid to
pnv_enter_arch207_idle_mode.

Move the updation of HSTATE_HWTHREAD_STATE to pnv_powersave_common
from pnv_enter_arch207_idle_mode and make it more generic by passing the
rfid address as a function parameter.

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15 20:18:40 +10:00
Shreyas B. Prabhu
5fa6b6bd7a powerpc/powernv: Rename reusable idle functions to hardware agnostic names
Functions like power7_wakeup_loss, power7_wakeup_noloss,
power7_wakeup_tb_loss are used by POWER7 and POWER8 hardware. They can
also be used by POWER9. Hence rename these functions hardware agnostic
names.

Suggested-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15 20:18:39 +10:00
Shreyas B. Prabhu
83289f909a powerpc/powernv: Rename idle_power7.S to idle_book3s.S
idle_power7.S handles idle entry/exit for POWER7, POWER8 and in next
patch for POWER9. Rename the file to a non-hardware specific
name.

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15 20:18:39 +10:00
Shreyas B. Prabhu
1706567117 powerpc/kvm: make hypervisor state restore a function
In the current code, when the thread wakes up in reset vector, some
of the state restore code and check for whether a thread needs to
branch to kvm is duplicated. Reorder the code such that this
duplication is avoided.

At a higher level this is what the change looks like-

Before this patch -
power7_wakeup_tb_loss:
	restore hypervisor state
	if (thread needed by kvm)
		goto kvm_start_guest
	restore nvgprs, cr, pc
	rfid to process context

power7_wakeup_loss:
	restore nvgprs, cr, pc
	rfid to process context

reset vector:
	if (waking from deep idle states)
		goto power7_wakeup_tb_loss
	else
		if (thread needed by kvm)
			goto kvm_start_guest
		goto power7_wakeup_loss

After this patch -
power7_wakeup_tb_loss:
	restore hypervisor state
	return

power7_restore_hyp_resource():
	if (waking from deep idle states)
		goto power7_wakeup_tb_loss
	return

power7_wakeup_loss:
	restore nvgprs, cr, pc
	rfid to process context

reset vector:
	power7_restore_hyp_resource()
	if (thread needed by kvm)
                goto kvm_start_guest
	goto power7_wakeup_loss

Reviewed-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15 20:18:38 +10:00
Shreyas B. Prabhu
bfd1b7ae5e powerpc/powernv: Use PNV_THREAD_WINKLE macro while requesting for winkle
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15 20:18:38 +10:00
Stewart Smith
ec5619fdba powerpc/lib: Clarify that adde is an instruction and we mean plural
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15 20:18:37 +10:00
Nathan Fontenot
fdb4f6e99f powerpc/pseries: Remove call to memblock_add()
The call to memblock_add is not needed, this is already done by
memory_add(). This patch removes this call which shrinks
dlpar_add_lmb_memory() enough that it can be merged into dlpar_add_lmb().

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15 20:18:37 +10:00
Nathan Fontenot
ec99907244 powerpc/pseries: Auto-online hotplugged memory
A recent update (commit id 31bc3858ea) allows for automatically
onlining memory that is added. This patch sets the config option
CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y for pseries and updates the
pseries memory hotplug code so that DLPAR added memory can be
automatically onlined instead of explicitly onlining the memory.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15 20:18:37 +10:00
Nathan Fontenot
c05a5a4096 powerpc/pseries: Dynamic add entires to associativity lookup array
Dynamically add entries to the associativity lookup array

The ibm,associativity-lookup-arrays property may only contain
associativity arrays for LMBs present at boot time. When hotplug
adding a LMB its associativity array may not be in the associativity
lookup array, this patch adds the ability to add new entries to the
associativity lookup array.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15 20:18:26 +10:00
Nathan Fontenot
c2101c9039 powerpc/pseries: Move property cloning into its own routine
Move property cloning code into its own routine

Split the pieces of dlpar_clone_drconf_property() that create a copy of
the property struct into its own routine. This allows for creating
clones of more than just the ibm,dynamic-memory property used in memory
hotplug.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15 15:02:26 +10:00
Michael Ellerman
236977609a powerpc/pseries: HVC early debug options should depend on HVC_CONSOLE
The pseries HVC early debug options, CONFIG_PPC_EARLY_DEBUG_LPAR and
CONFIG_PPC_EARLY_DEBUG_LPAR_HVSI both require code that is part of the
hvc driver. If we turn them on but not CONFIG_HVC_CONSOLE then we get:

  arch/powerpc/kernel/built-in.o: In function `.udbg_early_init':
  arch/powerpc/kernel/built-in.o:(.debug_addr+0x9a00): undefined reference to `udbg_init_debug_lpar'

Similarly for HVSI. So make them both depend on CONFIG_HVC_CONSOLE.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15 15:02:26 +10:00
Michael Neuling
6bcb80143e powerpc/tm: Fix stack pointer corruption in __tm_recheckpoint()
At the start of __tm_recheckpoint() we save the kernel stack pointer
(r1) in SPRG SCRATCH0 (SPRG2) so that we can restore it after the
trecheckpoint.

Unfortunately, the same SPRG is used in the SLB miss handler.  If an
SLB miss is taken between the save and restore of r1 to the SPRG, the
SPRG is changed and hence r1 is also corrupted.  We can end up with
the following crash when we start using r1 again after the restore
from the SPRG:

  Oops: Bad kernel stack pointer, sig: 6 [#1]
  SMP NR_CPUS=2048 NUMA pSeries
  CPU: 658 PID: 143777 Comm: htm_demo Tainted: G            EL   X 4.4.13-0-default #1
  task: c0000b56993a7810 ti: c00000000cfec000 task.ti: c0000b56993bc000
  NIP: c00000000004f188 LR: 00000000100040b8 CTR: 0000000010002570
  REGS: c00000000cfefd40 TRAP: 0300   Tainted: G            EL   X  (4.4.13-0-default)
  MSR: 8000000300001033 <SF,ME,IR,DR,RI,LE>  CR: 02000424  XER: 20000000
  CFAR: c000000000008468 DAR: 00003ffd84e66880 DSISR: 40000000 SOFTE: 0
  PACATMSCRATCH: 00003ffbc865e680
  GPR00: fffffffcfabc4268 00003ffd84e667a0 00000000100d8c38 000000030544bb80
  GPR04: 0000000000000002 00000000100cf200 0000000000000449 00000000100cf100
  GPR08: 000000000000c350 0000000000002569 0000000000002569 00000000100d6c30
  GPR12: 00000000100d6c28 c00000000e6a6b00 00003ffd84660000 0000000000000000
  GPR16: 0000000000000003 0000000000000449 0000000010002570 0000010009684f20
  GPR20: 0000000000800000 00003ffd84e5f110 00003ffd84e5f7a0 00000000100d0f40
  GPR24: 0000000000000000 0000000000000000 0000000000000000 00003ffff0673f50
  GPR28: 00003ffd84e5e960 00000000003d0f00 00003ffd84e667a0 00003ffd84e5e680
  NIP [c00000000004f188] restore_gprs+0x110/0x17c
  LR [00000000100040b8] 0x100040b8
  Call Trace:
  Instruction dump:
  f8a1fff0 e8e700a8 38a00000 7ca10164 e8a1fff8 e821fff0 7c0007dd 7c421378
  7db142a6 7c3242a6 38800002 7c810164 <e9c100e0> e9e100e8 ea0100f0 ea2100f8

We hit this on large memory machines (> 2TB) but it can also be hit on
smaller machines when 1TB segments are disabled.

To hit this, you also need to be virtualised to ensure SLBs are
periodically removed by the hypervisor.

This patches moves the saving of r1 to the SPRG to the region where we
are guaranteed not to take any further SLB misses.

Fixes: 98ae22e15b ("powerpc: Add helper functions for transactional memory context switching")
Cc: stable@vger.kernel.org # v3.9+
Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15 15:00:18 +10:00
Michael Ellerman
b5f1bf48f2 powerpc fixes for 4.7 #5
- tm: Always reclaim in start_thread() for exec() class syscalls from Cyril Bur
  - tm: Avoid SLB faults in treclaim/trecheckpoint when RI=0 from Michael Neuling
  - eeh: Fix wrong argument passed to eeh_rmv_device() from Gavin Shan
  - Initialise pci_io_base as early as possible from Darren Stevens
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXeEmsAAoJEFHr6jzI4aWAwMMQAKs/u9rwB3gpOkNJSHajN1Dd
 kdqDufzLxLDwbWnMfqM1+bcO2EOjPhKbtpbzhG6oeiET8undRRoLsjHS5rZeYK5h
 cviRPEJ/Yz8ZWaIgFGI8+02gXwU0MJhuTY8NPexXmsh4FRdKYwEuCIJShl30lg22
 P7UrJ2SCNM+H/uZyS07B7thiwBeAKSp6VkLTpuW/QDz2j1ra/F22dTh7c0Agdahd
 INAMAnh9nYeuMVYn4XjOOlQ07JnBTuf1/W5Wxlw4i/86rVq+Hy8zh5r1X52oysR5
 lZl63B9q3agKG9cc9lSN2ibTDVerlFMwB2QysX2a6Uy7+y2SB3hS7VS1RTXCh3hg
 /omApGGVW3Hh+E2CuKfFLQySU55NRpLAoTGravGr/KsH4wZP/n/fkrctldCrqm7P
 sTPT52+t+iJQk4fiskRY3yQ7DTTnt3rTC8MJRGqvLuCheolLll4NQaWOF75AJP+7
 WFWtC4QHOTPERMkhqLnZDG2vNuDg1H8chuZ2+PxtIs6G1vuOEun+MTZAYh4u6XWE
 bAIT9rV3xBdE17bzYOQz7lU1y7yNVtP7xkm0HIOAHlU4gUrjQp5u8F3TnPW3/M0m
 8GeaZdrPjhsaNg31YZODAeM8Ddf+N9d2a2VPIr/fzytURhMe0ss3Z/MdMoYRATab
 Lh1o+G3gDo9MVaphoJ3w
 =oEAY
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.7-5' into next

Pull in the fixes we sent during 4.7, we have code we want to merge into
next that depends on some of them.
2016-07-15 14:57:47 +10:00
Daniel Axtens
95ec77c06e powerpc: Make ppc_md.{halt, restart} __noreturn
powernv marks it's halt and restart calls as __noreturn. However,
ppc_md does not have this annotation. Add the annotation to ppc_md,
and then to every halt/restart function that is missing it.

Additionally, I have verified that all of these functions do not
return. Occasionally I have added a spin loop to be sure.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14 21:12:06 +10:00
Daniel Axtens
62c2c5cf38 powerpc/sparse: Pass endianness to sparse
Explicitly give sparse an endianness in the Makefile, so that it
doesn't get confused.

Normally we have #ifdef one and #else the other, so it doesn't usually
matter, but we have been bitten by it before, and indeed this patch
fixes a number of sparse errors.

Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14 20:44:03 +10:00
Daniel Axtens
f8750513b7 powerpc/kvm: Clarify __user annotations
kvmppc_h_put_tce_indirect labels a u64 pointer as __user. It also
labelled the u64 where get_user puts the result as __user. This isn't
a pointer and so doesn't need to be labelled __user.

Split the u64 value definition onto a new line to make it clear that
it doesn't get the annotation.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14 20:43:50 +10:00
Anna-Maria Gleixner
c011926fcb powerpc/pmac/smp: Add missing FROZEN hotplug notifier transitions
The FROZEN transitions are used when a CPU suspends/resumes. In case
of a suspend/resume, only the up prepare (CPU_UP_PREPARE_FROZEN) is
handled. The error handling transition CPU_UP_CANCELED_FROZEN as well
as the CPU_ONLINE_FROZEN transition are not handled.

Masking the switch case action argument with ~CPU_TASKS_FROZEN, to
handle all FROZEN tasks the same way than the corresponding non frozen
tasks.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14 20:39:58 +10:00
Andrew Donnellan
89379f165a PCI/hotplug: pnv_php: export symbols and move struct types needed by cxl
The cxl driver will use infrastructure from pnv_php to handle device tree
updates when switching bi-modal CAPI cards into CAPI mode.

To enable this, export pnv_php_find_slot() and
pnv_php_set_slot_power_state(), and add corresponding declarations, as well
as the definition of struct pnv_php_slot, to asm/pnv-pci.h.

Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
Cc: linux-pci@vger.kernel.org
Cc: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14 20:28:08 +10:00
Ian Munsie
a2f67d5ee8 cxl: Add support for interrupts on the Mellanox CX4
The Mellanox CX4 in cxl mode uses a hybrid interrupt model, where
interrupts are routed from the networking hardware to the XSL using the
MSIX table, and from there will be transformed back into an MSIX
interrupt using the cxl style interrupts (i.e. using IVTE entries and
ranges to map a PE and AFU interrupt number to an MSIX address).

We want to hide the implementation details of cxl interrupts as much as
possible. To this end, we use a special version of the MSI setup &
teardown routines in the PHB while in cxl mode to allocate the cxl
interrupts and configure the IVTE entries in the process element.

This function does not configure the MSIX table - the CX4 card uses a
custom format in that table and it would not be appropriate to fill that
out in generic code. The rest of the functionality is similar to the
"Full MSI-X mode" described in the CAIA, and this could be easily
extended to support other adapters that use that mode in the future.

The interrupts will be associated with the default context. If the
maximum number of interrupts per context has been limited (e.g. by the
mlx5 driver), it will automatically allocate additional kernel contexts
to associate extra interrupts as required. These contexts will be
started using the same WED that was used to start the default context.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14 20:27:08 +10:00
Ian Munsie
4361b03430 powerpc/powernv: Add support for the cxl kernel api on the real phb
This adds support for the peer model of the cxl kernel api to the
PowerNV PHB, in which physical function 0 represents the cxl function on
the card (an XSL in the case of the CX4), which other physical functions
will use for memory access and interrupt services. It is referred to as
the peer model as these functions are peers of one another, as opposed
to the Virtual PHB model which forms a hierarchy.

This patch exports APIs to enable the peer mode, check if a PCI device
is attached to a PHB in this mode, and to set and get the peer AFU for
this mode.

The cxl driver will enable this mode for supported cards by calling
pnv_cxl_enable_phb_kernel_api(). This will set a flag in the PHB to note
that this mode is enabled, and switch out it's controller_ops for the
cxl version.

The cxl version of the controller_ops struct implements it's own
versions of the enable_device_hook and release_device to handle
refcounting on the peer AFU and to allocate a default context for the
device.

Once enabled, the cxl kernel API may not be disabled on a PHB. Currently
there is no safe way to disable cxl mode short of a reboot, so until
that changes there is no reason to support the disable path.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14 20:26:54 +10:00
Ian Munsie
f456834a6c powerpc/powernv: Split cxl code out into a separate file
The support for using the Mellanox CX4 in cxl mode will require
additions to the PHB code. In preparation for this, move the existing
cxl code out of pci-ioda.c into a separate pci-cxl.c file to keep things
more organised.

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14 20:26:31 +10:00
Michael Ellerman
e0ddf7a245 powerpc/xmon: Dump ISA 2.07 SPRs
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14 20:26:24 +10:00
Michael Ellerman
1846193b17 powerpc/xmon: Dump ISA 2.06 SPRs
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14 20:26:24 +10:00
Michael Ellerman
56346ad88d powerpc/xmon: Adjust spacing of existing SPRs to make room for more
Purely to make it pleasing to the eye.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14 20:26:23 +10:00
Michael Ellerman
13629dad1e powerpc/xmon: Move static regno into its only user
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14 20:26:23 +10:00
Michael Ellerman
5b71eff782 powerpc/xmon: Remove unused externs
None of these are used, or have been since we merged ppc & ppc64.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14 20:26:22 +10:00
Suraj Jitindar Singh
a7d6392866 powerpc/crash: Rearrange loop condition to avoid out of bounds array access
The array crash_shutdown_handles[] has size CRASH_HANDLER_MAX, thus when
we loop over the elements of the list we check crash_shutdown_handles[i]
&& i < CRASH_HANDLER_MAX. However this means that when we increment i to
CRASH_HANDLER_MAX we will perform an out of bound array access checking
the first condition before exiting on the second condition.

To avoid the out of bounds access, simply reorder the loop conditions.

Fixes: 1d1451655b ("powerpc: Add array bounds checking to crash_shutdown_handlers")
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14 20:26:22 +10:00
Thomas Gleixner
57ecde42cc powerpc/perf: Convert book3s notifier to state machine callbacks
Install the callbacks via the state machine and let the core invoke
the callbacks on the already online CPUs.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160713153334.345786236@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-07-14 09:34:37 +02:00
Radim Krčmář
c63cf538eb KVM: pass struct kvm to kvm_set_routing_entry
Arch-specific code will use it.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-14 09:03:56 +02:00
Benjamin Herrenschmidt
0f2b3442fb powerpc: Don't test for machine type in smp_setup_cpu_maps()
The subsequent test for RTAS along with the LPAR test are sufficient

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-13 18:15:38 +10:00
Benjamin Herrenschmidt
484cc1ed3c powerpc/rtas: Don't test for machine type in rtas_initialize()
The test is unnecessary, the FW_FEATURE_LPAR is sufficient as there
exist no other LPAR type that has RTAS.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-13 18:15:38 +10:00
Benjamin Herrenschmidt
acd3578ed9 powerpc/85xx/mpc85xx_rdb: Don't use the flat device-tree after boot
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-13 18:15:37 +10:00
Benjamin Herrenschmidt
5b0f9f8368 powerpc/85xx/mpc85xx_ds: Don't use the flat device-tree after boot
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-13 18:15:37 +10:00
Benjamin Herrenschmidt
b282788341 powerpc/85xx/ge_imp3a: Don't use the flat device-tree after boot
ge_imp3a_pic_init() is called way beyond the unflattening of
the tree, it shouldn't be using of_flat_dt_*

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-13 18:15:36 +10:00
Benjamin Herrenschmidt
69a94d84c7 powerpc/cell: Don't use flat device-tree after boot
Some bit of SPU code was using the FDT rather than the expanded
device-tree. Fix it.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-13 18:15:36 +10:00
Dan Williams
7a9eb20666 pmem: kill __pmem address space
The __pmem address space was meant to annotate codepaths that touch
persistent memory and need to coordinate a call to wmb_pmem().  Now that
wmb_pmem() is gone, there is little need to keep this annotation.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-07-12 19:25:38 -07:00
Paolo Bonzini
6d5315b3a6 Merge branch 'kvm-ppc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD 2016-07-11 18:10:06 +02:00
Benjamin Herrenschmidt
da6a97bf12 powerpc: Move epapr_paravirt_early_init() to early_init_devtree()
The function is called by both 32-bit and 64-bit early setup right
after early_init_devtree(). All it does is run yet another early
DT parser which is precisely what early_init_devtree() is about,
so move it in there.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-11 20:09:40 +10:00
Benjamin Herrenschmidt
63c254a501 powerpc: Add comment explaining the purpose of setup_kdump_trampoline()
Anything in early_setup() needs to be justified to be there, in
this case, we need the trampolines before we can take exceptions
and thus before we turn on the MMU.

Also remove a pretty meaningless and misplaced debug message

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Fix comment formatting]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-11 20:09:40 +10:00
Benjamin Herrenschmidt
bd7c93cca3 powerpc: Update obsolete comments in setup_32.c about entry conditions
early_init() is called in-place before kernel relocation and using
whatever MMU setup exists at the point the kernel is entered.

machine_init() is called after relocation and after some initial
mapping of PAGE_OFFSET has been established (typically using BATs
on 6xx/7xx/7xxx processors or some form of bolted TLB on others).

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-11 20:09:39 +10:00
Scott Wood
9f595fd8b5 powerpc/8xx: Force VIRT_IMMR_BASE to be a positive number
The asm-offsets mechanism generates signed numbers, even if the
input value is explicitly unsigned.  This causes a problem with
older binutils (e.g. 2.23), which sign-extend a negative number
when @h is applied.  Thus, this instruction:

	cmpli   cr0, r11, VIRT_IMMR_BASE@h

resulted in this:

Error: operand out of range (0xfffffff0 is not between 0x00000000 and
0x0000ffff)

By casting to a larger type, we can force the output to be expressed
as a positive number.

Signed-off-by: Scott Wood <oss@buserror.net>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
2016-07-09 03:26:53 -05:00
Christophe Leroy
62f64b49d0 powerpc/8xx: add CONFIG_PIN_TLB_IMMR
CONFIG_PIN_TLB maps IMMR area and the first 24 Mbytes of memory.
In some circunstances it might be more interesting to not map
IMMR but map 32 Mbytes of memory instead.

Therefore we add config option CONFIG_PIN_TLB_IMMR to select if
IMMR shall be pinned or not, hence whether we pin 24 or 32 Mbytes of RAM

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
2016-07-09 02:02:48 -05:00
Christophe Leroy
4ad274502f powerpc/8xx: Rework CONFIG_PIN_TLB handling
On recent kernels, with some debug options like for instance
CONFIG_LOCKDEP, the BSS requires more than 8M memory, allthough
the kernel code fits in the first 8M.
Today, it is necessary to activate CONFIG_PIN_TLB to get more than 8M
at startup, allthough pinning TLB is not necessary for that.

We could have inconditionaly mapped 16 or 24M bytes at startup
but some old hardware only have 8M and mapping non-existing RAM
would be an issue due to speculative accesses.

With the preceding patch however, the TLB entries are populated on
demand. By setting up the TLB miss handler to handle up to 24M until
the handler is patched for the entire memory space, it is possible
to allow access up to more memory without mapping non-existing RAM.

It is therefore not needed anymore to map memory data at all
at startup. It will be handled by the TLB miss handler.

One might still want to PIN the IMMR and the first 24M of RAM.
It is now possible to do it in the C memory initialisation
functions. In addition, we now know how much memory we have
when we do it, so we are able to adapt the pining to the
real amount of memory available. So boards with less than 24M
can now also benefit from PIN_TLB.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
2016-07-09 02:02:48 -05:00
Christophe Leroy
bb7f380849 powerpc/8xx: Don't use page table for linear memory space
Instead of using the first level page table to define mappings for
the linear memory space, we can use direct mapping from the TLB
handling routines. This has several advantages:
* No need to read the tables at each TLB miss
* No issue in 16k pages mode where the 1st level table maps 64 Mbytes

The size of the available linear space is known at system startup.
In order to avoid data access at each TLB miss to know the memory
size, the TLB routine is patched at startup with the proper size

This patch provides a 10%-15% improvment of TLB miss handling for
kernel addresses

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
2016-07-09 02:02:48 -05:00
Christophe Leroy
6264dbb98f powerpc/8xx: unpin all TLBs before flushing
Bootloader may have pinned some TLB entries so the kernel must
unpin them before flushing TLBs with tlbia otherwise pinned TLB
entries won't get flushed

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
2016-07-09 02:02:48 -05:00
Christophe Leroy
567a16d152 powerpc/8xx: CONFIG_PIN_TLB unneeded for CONFIG_PPC_EARLY_DEBUG_CPM
IMMR is now mapped by a fixed 512k page managed by the TLB miss
handler so it is not anymore necessary to PIN TLBs

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
2016-07-09 02:02:48 -05:00
Christophe Leroy
4badd43ae4 powerpc/8xx: Map IMMR area with 512k page at a fixed address
Once the linear memory space has been mapped with 8Mb pages, as
seen in the related commit, we get 11 millions DTLB missed during
the reference 600s period. 77% of the misses are on user addresses
and 23% are on kernel addresses (1 fourth for linear address space
and 3 fourth for virtual address space)

Traditionaly, each driver manages one computer board which has its
own components with its own memory maps.
But on embedded chips like the MPC8xx, the SOC has all registers
located in the same IO area.

When looking at ioremaps done during startup, we see that
many drivers are re-mapping small parts of the IMMR for their own use
and all those small pieces gets their own 4k page, amplifying the
number of TLB misses: in our system we get 0xff000000 mapped 31 times
and 0xff003000 mapped 9 times.

Even if each part of IMMR was mapped only once with 4k pages, it would
still be several small mappings towards linear area.

This patch maps the IMMR with a single 512k page.

With this patch applied, the number of DTLB misses during the 10 min
period is reduced to 11.8 millions for a duration of 5.8s, which
represents 2% of the non-idle time hence yet another 10% reduction.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
2016-07-09 02:02:48 -05:00
Christophe Leroy
f86ef74ed9 powerpc/8xx: Fix vaddr for IMMR early remap
Memory: 124428K/131072K available (3748K kernel code, 188K rwdata,
648K rodata, 508K init, 290K bss, 6644K reserved)
Kernel virtual memory layout:
  * 0xfffdf000..0xfffff000  : fixmap
  * 0xfde00000..0xfe000000  : consistent mem
  * 0xfddf6000..0xfde00000  : early ioremap
  * 0xc9000000..0xfddf6000  : vmalloc & ioremap
SLUB: HWalign=16, Order=0-3, MinObjects=0, CPUs=1, Nodes=1

Today, IMMR is mapped 1:1 at startup

Mapping IMMR 1:1 is just wrong because it may overlap with another
area. On most mpc8xx boards it is OK as IMMR is set to 0xff000000
but for instance on EP88xC board, IMMR is at 0xfa200000 which
overlaps with VM ioremap area

This patch fixes the virtual address for remapping IMMR with the fixmap
regardless of the value of IMMR.

The size of IMMR area is 256kbytes (CPM at offset 0, security engine
at offset 128k) so a 512k page is enough

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
2016-07-09 02:02:48 -05:00
Christophe Leroy
c223c90386 powerpc32: provide VIRT_CPU_ACCOUNTING
This patch provides VIRT_CPU_ACCOUTING to PPC32 architecture.
PPC32 doesn't have the PACA structure, so we use the task_info
structure to store the accounting data.

In order to reuse on PPC32 the PPC64 functions, all u64 data has
been replaced by 'unsigned long' so that it is u32 on PPC32 and
u64 on PPC64

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
2016-07-09 01:43:50 -05:00
Zhao Qiang
1afbf61750 T104xQDS: Add qe node to t104xqds
add qe node to t104xqds.dtsi

Signed-off-by: Zhao Qiang <qiang.zhao@nxp.com>
Signed-off-by: Scott Wood <oss@buserror.net>
2016-07-09 01:12:04 -05:00
Zhao Qiang
df02087d27 T104xRDB: Add qe node to t104xrdb
add qe node to t104xrdb.dtsi

Signed-off-by: Zhao Qiang <qiang.zhao@nxp.com>
Signed-off-by: Scott Wood <oss@buserror.net>
2016-07-09 01:12:04 -05:00
Zhao Qiang
b7a7085204 T104xD4RDB: Add qe node to t104xd4rdb
add qe node to t104xd4rdb.dtsi and t1040si-post.dtsi.

Signed-off-by: Zhao Qiang <qiang.zhao@nxp.com>
Signed-off-by: Scott Wood <oss@buserror.net>
2016-07-09 01:12:04 -05:00
Valentin Longchamp
2dc32d6d7f powerpc: define the fman node for the kmcoge4 DTS
Now that the FMAN mac driver has been merged the fman node is relevant.

The kmcoge4 board implements 3 ethernet interfaces, 1 with a RGMII phy
and 2 with fixed 1 Giga SGMII links.

Signed-off-by: Valentin Longchamp <valentin.longchamp@keymile.com>
Signed-off-by: Scott Wood <oss@buserror.net>
2016-07-08 20:19:25 -05:00
Bartlomiej Zolnierkiewicz
b58dfa6d88 powerpc: disable IDE subsystem in pq2fads_defconfig
This patch disables deprecated IDE subsystem in pq2fads_defconfig
(no IDE host drivers are selected in this config so there is no valid
reason to enable IDE subsystem itself).

Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Scott Wood <oss@buserror.net>
2016-07-08 20:14:20 -05:00
Alessio Igor Bogani
97493e2e9e powerpc/86xx: Add support for Emerson/Artesyn MVME7100
Add support for the Artesyn MVME7100 Single Board Computer.

The MVME7100 is a 6U form factor VME64 computer with:

    - A two e600 cores Freescale MPC8641D CPU
    - 2 GB of DDR2 onboard memory
    - Four Gigabit Ethernets
    - Five 16550 compatible UARTs
    - One USB 2.0 port
    - Two PCI/PCI eXpress Mezzanine Card (PMC/XMC) Slots
    - A DS1375 Real Time Clock (RTC)
    - 512 KB of Non-Volatile Memory (NVRAM)
    - Two 64 KB EEPROMs
    - 128 MB NOR and 4/8 GB NAND Flash

This patch is based on linux-4.7-rc1 and has been only boot tested.

Limitations:
    This patch covers only models 171 and 173
    No plans to support CPLD timers

Know issues:
    All four PHYs work in polling mode

Configuration is missing for:
    PCI IDSEL and PCI Interrupt definition

Support is missing for:
    Cache and memory controllers (which are very similar to the 85xx ones
        but right now I don't know if we can re-use their support)
    Watchdog, USB, NVRAM, NOR, NAND, EEPROMs, VME, PMC/XMC and RTC

Signed-off-by: Alessio Igor Bogani <alessio.bogani@elettra.eu>
Signed-off-by: Scott Wood <oss@buserror.net>
2016-07-08 20:01:27 -05:00
Sriram Dash
ae9ac1d329 powerpc/85xx: add aliases for usb nodes on t4240, b4860, and b4420
Add usb aliases for consistency with the other platforms.

Signed-off-by: Laurentiu Tudor <Laurentiu.Tudor@freescale.com>
Signed-off-by: Sriram Dash <sriram.dash@nxp.com>
Signed-off-by: Scott Wood <oss@buserror.net>
2016-07-08 19:58:06 -05:00
Sriram Dash
3dde317654 powerpc/85xx: Change T1040si USB controller version
Change USB controller version name to 2.5 in compatible string for T1040

Signed-off-by: Sriram Dash <sriram.dash@nxp.com>
Signed-off-by: Scott Wood <oss@buserror.net>
2016-07-08 19:57:36 -05:00
Claudiu Manoil
8ebf506ab2 powerpc/85xx: Don't report SRAM to L2 cache fallback as error
If the SRAM region parameters are missing the SRAM driver
probing exits and the L2 region is configured as L2 cache
entirely.  This is the expected default behaviour, so it
makes no sense to report it as an error.

Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: Scott Wood <oss@buserror.net>
2016-07-08 19:55:34 -05:00
Andrew Donnellan
bdd910017c powerpc/configs: Remove old symbols from defconfigs
Update defconfigs to remove old symbols and comments referencing old
symbols.

Dropped:

* AVERAGE
* INET_LRO
* EXT3_DEFAULTS_TO_ORDERED
* EXT3_FS_XATTR
* I2O
* INFINIBAND_AMSO1100
* INFINIBAND_EHCA
* IP1000

Replaced:

* BLK_DEV_XIP -> BLK_DEV_RAM_DAX
* CLK_PPC_CORENET -> CLK_QORIQ
* EXT2_FS_XIP -> FS_DAX
* EXT3_FS* -> EXT4_FS*

Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-08 22:10:07 +10:00
Andrew Donnellan
fa2cff3f54 powerpc: Fix typo in comment reference to CONFIG_TRACE_IRQFLAGS
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-08 22:10:03 +10:00
Andrew Donnellan
53775c43fe powerpc/ps3: Fix typo in comment reference to CONFIG_PS3_REPOSITORY_WRITE
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-08 22:09:57 +10:00
Andrew Donnellan
91dc068202 powerpc/eeh: Fix pr_debug()s in eeh_cache.c
eeh_cache.c doesn't build cleanly with -DDEBUG when
CONFIG_PHYS_ADDR_T_64BIT is set, as a couple of pr_debug()s use "%lx" for
resource_size_t parameters.

Use "%pap" instead, as it's the correct format specifier for types deriving
from phys_addr_t.

Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-08 22:09:50 +10:00
Benjamin Herrenschmidt
a203658b5e powerpc/opal: Wake up kopald polling thread before waiting for events
On some environments (prototype machines, some simulators, etc...)
there is no functional interrupt source to signal completion, so
we rely on the fairly slow OPAL heartbeat.

In a number of cases, the calls complete very quickly or even
immediately. We've observed that it helps a lot to wakeup the OPAL
heartbeat thread before waiting for event in those cases, it will
call OPAL immediately to collect completions for anything that
finished fast enough.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-By: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-08 19:53:26 +10:00
Michael Neuling
68a2d80c80 powerpc: Add MTD_BLOCK to powernv_defconfig
This is so we can use the powernv_flash mtd driver as an block
device.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-08 19:24:41 +10:00
Greg Kurz
8c6a0a1f40 powerpc/pseries: start rtasd before PCI probing
A strange behaviour is observed when comparing PCI hotplug in QEMU, between
x86 and pseries. If you consider the following steps:
- start a VM
- add a PCI device via the QEMU monitor before the rtasd has started (for
  example starting the VM in paused state, or hotplug during FW or boot
  loader)
- resume the VM execution

The x86 kernel detects the PCI device, but the pseries one does not.

This happens because the rtasd kernel worker is currently started under
device_initcall, while PCI probing happens earlier under subsys_initcall.

As a consequence, if we have a pending RTAS event at boot time, a message
is printed and the event is dropped.

This patch moves all the initialization of rtasd to arch_initcall, which is
run before subsys_call: this way, logging_enabled is true when the RTAS
event pops up and it is not lost anymore.

The proc fs bits stay at device_initcall because they cannot be run before
fs_initcall.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-08 19:22:15 +10:00
Guilherme G. Piccoli
63a72284b1 powerpc/pci: Assign fixed PHB number based on device-tree properties
The domain/PHB field of PCI addresses has its value obtained from a
global variable, incremented each time a new domain (represented by
struct pci_controller) is added on the system. The domain addition
process happens during boot or due to PHB hotplug add.

As recent kernels are using predictable naming for network interfaces,
the network stack is more tied to PCI naming. This can be a problem in
hotplug scenarios, because PCI addresses will change if devices are
removed and then re-added. This situation seems unusual, but it can
happen if a user wants to replace a NIC without rebooting the machine,
for example.

This patch changes the way PCI domain values are generated: now, we use
device-tree properties to assign fixed PHB numbers to PCI addresses
when available (meaning pSeries and PowerNV cases). We also use a bitmap
to allow dynamic PHB numbering when device-tree properties are not
used. This bitmap keeps track of used PHB numbers and if a PHB is
released (by hotplug operations for example), it allows the reuse of
this PHB number, avoiding PCI address to change in case of device remove
and re-add soon after. No functional changes were introduced.

Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Ian Munsie <imunsie@au1.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
[mpe: Drop unnecessary machine_is(pseries) test]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-07 22:06:55 +10:00
Michael Ellerman
fc022fdf41 powerpc/kernel: Drop unused extern for current_set
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-07 22:03:10 +10:00
Michael Ellerman
d468fcafb7 powerpc/pci: Fix build with PCI_IOV=y and EEH=n
Despite attempting to fix this in commit fb36e90736 ("powerpc/pci: Fix
SRIOV not building without EEH enabled"), the build is still broken when
PCI_IOV=y and EEH=n (eg. g5_defconfig with PCI_IOV=y):

  arch/powerpc/kernel/pci_dn.c: In function ‘remove_dev_pci_data’:
  arch/powerpc/kernel/pci_dn.c:230:18: error: unused variable ‘edev’

Incorporate Ben's idea of using __maybe_unused to avoid so many #ifdefs.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-07 16:33:27 +10:00
Benjamin Herrenschmidt
fecbfabe1d powerpc: Fix build with CONFIG_MEMORY_HOTPLUG on some configs
For memory hotplug to work, the MMU code needs to provide the functions
create_section_mapping() and remove_section_mapping() to respectively
map and unmap portions of the linear mapping.

At the moment only hash64 provides these, so we provide weak stubs that
just error out. This fixes the build with configurations such as 64-bit
BookE with CONFIG_MEMORY_HOTPLUG enabled.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-07 16:33:27 +10:00
Benjamin Herrenschmidt
e93d8e6773 powerpc/mm: Fix build of Book3E/64 with 64K pages
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-07 16:33:26 +10:00
Anton Blanchard
6dd7a82cc5 crypto: powerpc - Add POWER8 optimised crc32c
Use the vector polynomial multiply-sum instructions in POWER8 to
speed up crc32c.

This is just over 41x faster than the slice-by-8 method that it
replaces. Measurements on a 4.1 GHz POWER8 show it sustaining
52 GiB/sec.

A simple btrfs write performance test:

    dd if=/dev/zero of=/mnt/tmpfile bs=1M count=4096
    sync

is over 3.7x faster.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-07-05 23:05:19 +08:00
Anton Blanchard
151f25112f powerpc: define FUNC_START/FUNC_END
gcc provides FUNC_START/FUNC_END macros to help with creating
assembly functions. Mirror these in the kernel so we can more easily
share code between userspace and the kernel. FUNC_END is just a
stub since we don't currently annotate the end of kernel functions.

It might make sense to do a wholesale search and replace, but for
now just create a couple of defines.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-07-05 23:05:16 +08:00
Oliver O'Halloran
656ad58ef1 powerpc/boot: Add OPAL console to epapr wrappers
This patch adds an OPAL console backend to the powerpc boot wrapper so
that decompression failures inside the wrapper can be reported to the
user. This is important since it typically indicates data corruption in
the firmware and other nasty things.

Currently this only works when building a little endian kernel. When
compiling a 64 bit BE kernel the wrapper is always build 32 bit to be
compatible with some 32 bit firmwares. BE support will be added at a
later date. Another limitation of this is that only the "raw" type of
OPAL console is supported, however machines that provide a hvsi console
also provide a raw console so this is not an issue in practice.

Actually-written-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
[mpe: Move #ifdef __powerpc64__ to avoid warnings on 32-bit]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-05 23:58:54 +10:00
Oliver O'Halloran
faf7882962 powerpc/mm: Add a parameter to disable 1TB segs
This patch adds the kernel command line parameter "no_tb_segs" which
forces the kernel to use 256MB rather than 1TB segments. Forcing the use
of 256MB segments makes it considerably easier to test code that depends
on an SLB miss occurring.

Suggested-by: Michael Neuling <mikey@neuling.org>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-05 23:58:54 +10:00
Oliver O'Halloran
7990102446 powerpc/timer: Large Decrementer support
Power ISAv3 adds a large decrementer (LD) mode which increases the size
of the decrementer register. The size of the enlarged decrementer
register is between 32 and 64 bits with the exact size being dependent
on the implementation. When in LD mode, reads are sign extended to 64
bits and a decrementer exception is raised when the high bit is set (i.e
the value goes below zero). Writes however are truncated to the physical
register width so some care needs to be taken to ensure that the high
bit is not set when reloading the decrementer. This patch adds support
for using the LD inside the host kernel on processors that support it.

When LD mode is supported firmware will supply the ibm,dec-bits property
for CPU nodes to allow the kernel to determine the maximum decrementer
value. Enabling LD mode is a hypervisor privileged operation so the kernel
can only enable it manually when running in hypervisor mode. Guests that
support LD mode can request it using the "ibm,client-architecture-support"
firmware call (not implemented in this patch) or some other platform
specific method. If this property is not supplied then the traditional
decrementer width of 32 bit is assumed and LD mode will not be enabled.

This patch was based on initial work by Jack Miller.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-05 23:58:53 +10:00
Anton Blanchard
9ddf0075f9 powerpc: Avoid -maltivec when using clang integrated assembler
Check the assembler supports -maltivec by wrapping it with
call as-option.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-05 23:58:53 +10:00
Rasmus Villemoes
e2be23712a powerpc/pseries: Fix error return value in cmm_mem_going_offline()
cmm_mem_going_offline() is (only) called from cmm_memory_cb(), which
sends the return value through notifier_from_errno(). The latter
expects 0 or -errno (notifier_to_errno(notifier_from_errno(x)) is 0
for any x >= 0, so passing a positive value cannot make sense). Hence
negate ENOMEM.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-05 23:58:52 +10:00
Andrew Donnellan
a9862c7440 powerpc/rtas: Fix array overrun in ppc_rtas() syscall
If ppc_rtas() is called with args.nargs == 16 and args.nret == 0,
args.rets is set to point to &args.args[16], which is beyond the end of
the args.args array. This results in a minor read overrun of the array
when we check the first return code (which, per PAPR, is a required
output of all RTAS calls) to see if there's been a hardware error.

Change the nargs/nret check to ensure nargs is <= 15, allowing room for
the status code. Users shouldn't be calling with nret == 0, but there's
no real harm if they do, so we don't stop them.

Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-05 23:49:52 +10:00
Chris Smart
ae26b36f80 powerpc: Send SIGBUS on unaligned copy and paste
Calling ISA 3.0 instructions copy, copy_first, paste and paste_last
generates an alignment fault when copying or pasting unaligned
data (128 byte). We catch this and send SIGBUS to the userspace
process that caused it.

We do not emulate these because paste may contain additional metadata
when pasting to a co-processor and paste_last is the synchronisation
point for preceding copy/paste sequences.

Thanks to Michael Neuling <mikey@neuling.org> for his help.

Signed-off-by: Chris Smart <chris@distroguy.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-05 23:49:51 +10:00
Madhavan Srinivasan
f1fb60bfde powerpc/perf: Export Power9 generic and cache events to sysfs
Export the generic hardware and cache perf events for Power9 to sysfs,
so users can determine the PMU event monitored.

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-05 23:49:48 +10:00
Madhavan Srinivasan
8c002dbd05 powerpc/perf: Power9 PMU support
This patch adds base enablement for the power9 PMU.

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-05 23:49:48 +10:00
Madhavan Srinivasan
34922527a2 powerpc/perf: Add power9 event list macros for generic and cache events
Add macros for the generic and cache events on Power9

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-05 23:49:48 +10:00
Madhavan Srinivasan
393eb79ad3 powerpc/perf: factor out power8 __init_pmu code
Factor out the power8 pmu init functions to share with
power9. Monitor Mode Control Register S(MMCRS) and
Monitor Mode Control Register H(MMCRH) registers are
dropped in Power9. These registers are added to new
function which are included for power8 init.

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-05 23:49:47 +10:00
Madhavan Srinivasan
7ffd948fae powerpc/perf: factor out power8 pmu functions
Factor out some of the power8 pmu functions
to new file "isa207-common.c" to share with
power9 pmu code. Only code movement and no
logic change

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-05 23:49:47 +10:00
Madhavan Srinivasan
4d3576b207 powerpc/perf: factor out power8 pmu macros and defines
Factor out some of the power8 pmu macros to
new a header file to share with power9 pmu code.
Just code movement and no logic change.

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-05 23:49:46 +10:00
Michael Ellerman
b5b1cfc5d4 powerpc/fadump: Fix build error introduced by recent cleanup
We spent so much time bike-shedding the printk() we missed that the next
line was missing a semi-colon. And it seems none of our defconfigs turn
on CONFIG_FA_DUMP.

Fixes: 4a03749f14 ("powerpc/fadump: Trivial fix of spelling mistake, clean up message")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-05 23:49:46 +10:00
Rafael J. Wysocki
5d1191ab6c Merge back earlier cpufreq material for v4.8. 2016-07-04 13:21:43 +02:00
Linus Torvalds
70bd68d7f7 powerpc fixes for 4.7 #5
- tm: Always reclaim in start_thread() for exec() class syscalls from Cyril Bur
  - tm: Avoid SLB faults in treclaim/trecheckpoint when RI=0 from Michael Neuling
  - eeh: Fix wrong argument passed to eeh_rmv_device() from Gavin Shan
  - Initialise pci_io_base as early as possible from Darren Stevens
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXeEmsAAoJEFHr6jzI4aWAwMMQAKs/u9rwB3gpOkNJSHajN1Dd
 kdqDufzLxLDwbWnMfqM1+bcO2EOjPhKbtpbzhG6oeiET8undRRoLsjHS5rZeYK5h
 cviRPEJ/Yz8ZWaIgFGI8+02gXwU0MJhuTY8NPexXmsh4FRdKYwEuCIJShl30lg22
 P7UrJ2SCNM+H/uZyS07B7thiwBeAKSp6VkLTpuW/QDz2j1ra/F22dTh7c0Agdahd
 INAMAnh9nYeuMVYn4XjOOlQ07JnBTuf1/W5Wxlw4i/86rVq+Hy8zh5r1X52oysR5
 lZl63B9q3agKG9cc9lSN2ibTDVerlFMwB2QysX2a6Uy7+y2SB3hS7VS1RTXCh3hg
 /omApGGVW3Hh+E2CuKfFLQySU55NRpLAoTGravGr/KsH4wZP/n/fkrctldCrqm7P
 sTPT52+t+iJQk4fiskRY3yQ7DTTnt3rTC8MJRGqvLuCheolLll4NQaWOF75AJP+7
 WFWtC4QHOTPERMkhqLnZDG2vNuDg1H8chuZ2+PxtIs6G1vuOEun+MTZAYh4u6XWE
 bAIT9rV3xBdE17bzYOQz7lU1y7yNVtP7xkm0HIOAHlU4gUrjQp5u8F3TnPW3/M0m
 8GeaZdrPjhsaNg31YZODAeM8Ddf+N9d2a2VPIr/fzytURhMe0ss3Z/MdMoYRATab
 Lh1o+G3gDo9MVaphoJ3w
 =oEAY
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.7-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - tm: Always reclaim in start_thread() for exec() class syscalls from
   Cyril Bur

 - tm: Avoid SLB faults in treclaim/trecheckpoint when RI=0 from Michael
   Neuling

 - eeh: Fix wrong argument passed to eeh_rmv_device() from Gavin Shan

 - Initialise pci_io_base as early as possible from Darren Stevens

* tag 'powerpc-4.7-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc: Initialise pci_io_base as early as possible
  powerpc/tm: Avoid SLB faults in treclaim/trecheckpoint when RI=0
  powerpc/eeh: Fix wrong argument passed to eeh_rmv_device()
  powerpc/tm: Always reclaim in start_thread() for exec() class syscalls
2016-07-02 17:47:54 -07:00
Paolo Bonzini
6edaa5307f KVM: remove kvm_guest_enter/exit wrappers
Use the functions from context_tracking.h directly.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-01 11:03:21 +02:00
Darren Stevens
bfa37087aa powerpc: Initialise pci_io_base as early as possible
Commit d6a9996e84 ("powerpc/mm: vmalloc abstraction in preparation for
radix") turned kernel memory and IO addresses from #defined constants to
variables initialised at runtime.

On PA6T (pasemi) systems the setup_arch() machine call initialises the
onboard PCI-e root-ports, and uses pci_io_base to do this, which is now
before its value has been set, resulting in a panic early in boot before
console IO is initialised.

Move the pci_io_base initialisation to the same place as vmalloc ranges
are set (hash__early_init_mmu()/radix__early_init_mmu()) - this is the
earliest possible place we can initialise it.

Fixes: d6a9996e84 ("powerpc/mm: vmalloc abstraction in preparation for radix")
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Darren Stevens <darren@stevens-zone.net>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Add #ifdef CONFIG_PCI, massage change log slightly]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-30 16:52:29 +10:00
Suraj Jitindar Singh
43a1dd9b5f powerpc/powernv: Add driver for operator panel on FSP machines
Implement new character device driver to allow access from user space
to the operator panel display present on IBM Power Systems machines
with FSPs.

This will allow status information to be presented on the display which
is visible to a user.

The driver implements a character buffer which a user can read/write
by accessing the device (/dev/op_panel). This buffer is then displayed on
the operator panel display. Any attempt to write past the last character
position will have no effect and attempts to write more characters than
the size of the display will be truncated. The device may only be accessed
by a single process at a time.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-29 17:33:46 +10:00
Suraj Jitindar Singh
d0226d315d powerpc/opal: Add inline function to get rc from an ASYNC_COMP opal_msg
An opal_msg of type OPAL_MSG_ASYNC_COMP contains the return code in the
params[1] struct member. However this isn't intuitive or obvious when
reading the code and requires that a user look at the skiboot
documentation or opal-api.h to verify this.

Add an inline function to get the return code from an opal_msg and update
call sites accordingly.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-29 17:33:18 +10:00
Michael Neuling
190ce8693c powerpc/tm: Avoid SLB faults in treclaim/trecheckpoint when RI=0
Currently we have 2 segments that are bolted for the kernel linear
mapping (ie 0xc000... addresses). This is 0 to 1TB and also the kernel
stacks. Anything accessed outside of these regions may need to be
faulted in. (In practice machines with TM always have 1T segments)

If a machine has < 2TB of memory we never fault on the kernel linear
mapping as these two segments cover all physical memory. If a machine
has > 2TB of memory, there may be structures outside of these two
segments that need to be faulted in. This faulting can occur when
running as a guest as the hypervisor may remove any SLB that's not
bolted.

When we treclaim and trecheckpoint we have a window where we need to
run with the userspace GPRs. This means that we no longer have a valid
stack pointer in r1. For this window we therefore clear MSR RI to
indicate that any exceptions taken at this point won't be able to be
handled. This means that we can't take segment misses in this RI=0
window.

In this RI=0 region, we currently access the thread_struct for the
process being context switched to or from. This thread_struct access
may cause a segment fault since it's not guaranteed to be covered by
the two bolted segment entries described above.

We've seen this with a crash when running as a guest with > 2TB of
memory on PowerVM:

  Unrecoverable exception 4100 at c00000000004f138
  Oops: Unrecoverable exception, sig: 6 [#1]
  SMP NR_CPUS=2048 NUMA pSeries
  CPU: 1280 PID: 7755 Comm: kworker/1280:1 Tainted: G                 X 4.4.13-46-default #1
  task: c000189001df4210 ti: c000189001d5c000 task.ti: c000189001d5c000
  NIP: c00000000004f138 LR: 0000000010003a24 CTR: 0000000010001b20
  REGS: c000189001d5f730 TRAP: 4100   Tainted: G                 X  (4.4.13-46-default)
  MSR: 8000000100001031 <SF,ME,IR,DR,LE>  CR: 24000048  XER: 00000000
  CFAR: c00000000004ed18 SOFTE: 0
  GPR00: ffffffffc58d7b60 c000189001d5f9b0 00000000100d7d00 000000003a738288
  GPR04: 0000000000002781 0000000000000006 0000000000000000 c0000d1f4d889620
  GPR08: 000000000000c350 00000000000008ab 00000000000008ab 00000000100d7af0
  GPR12: 00000000100d7ae8 00003ffe787e67a0 0000000000000000 0000000000000211
  GPR16: 0000000010001b20 0000000000000000 0000000000800000 00003ffe787df110
  GPR20: 0000000000000001 00000000100d1e10 0000000000000000 00003ffe787df050
  GPR24: 0000000000000003 0000000000010000 0000000000000000 00003fffe79e2e30
  GPR28: 00003fffe79e2e68 00000000003d0f00 00003ffe787e67a0 00003ffe787de680
  NIP [c00000000004f138] restore_gprs+0xd0/0x16c
  LR [0000000010003a24] 0x10003a24
  Call Trace:
  [c000189001d5f9b0] [c000189001d5f9f0] 0xc000189001d5f9f0 (unreliable)
  [c000189001d5fb90] [c00000000001583c] tm_recheckpoint+0x6c/0xa0
  [c000189001d5fbd0] [c000000000015c40] __switch_to+0x2c0/0x350
  [c000189001d5fc30] [c0000000007e647c] __schedule+0x32c/0x9c0
  [c000189001d5fcb0] [c0000000007e6b58] schedule+0x48/0xc0
  [c000189001d5fce0] [c0000000000deabc] worker_thread+0x22c/0x5b0
  [c000189001d5fd80] [c0000000000e7000] kthread+0x110/0x130
  [c000189001d5fe30] [c000000000009538] ret_from_kernel_thread+0x5c/0xa4
  Instruction dump:
  7cb103a6 7cc0e3a6 7ca222a6 78a58402 38c00800 7cc62838 08860000 7cc000a6
  38a00006 78c60022 7cc62838 0b060000 <e8c701a0> 7ccff120 e8270078 e8a70098
  ---[ end trace 602126d0a1dedd54 ]---

This fixes this by copying the required data from the thread_struct to
the stack before we clear MSR RI. Then once we clear RI, we only access
the stack, guaranteeing there's no segment miss.

We also tighten the region over which we set RI=0 on the treclaim()
path. This may have a slight performance impact since we're adding an
mtmsr instruction.

Fixes: 090b9284d7 ("powerpc/tm: Clear MSR RI in non-recoverable TM code")
Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-29 16:19:25 +10:00
Gavin Shan
cca0e542e0 powerpc/eeh: Fix wrong argument passed to eeh_rmv_device()
When calling eeh_rmv_device() in eeh_reset_device() for partial hotplug
case, @rmv_data instead of its address is the proper argument.
Otherwise, the stack frame is corrupted when writing to
@rmv_data (actually its address) in eeh_rmv_device(). It results in
kernel crash as observed.

This fixes the issue by passing @rmv_data, not its address to
eeh_rmv_device() in eeh_reset_device().

Fixes: 67086e32b5 ("powerpc/eeh: powerpc/eeh: Support error recovery for VF PE")
Reported-by: Pridhiviraj Paidipeddi <ppaidipe@in.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-28 20:47:49 +10:00
Colin Ian King
6e8a9279a8 powerpc/powernv: Fix spelling mistake "Retrived" -> "Retrieved"
Trivial fix to spelling mistake in pr_debug() message.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-28 13:52:18 +10:00
Colin Ian King
4a03749f14 powerpc/fadump: Trivial fix of spelling mistake, clean up message
Fix trivial spelling mistake "rgistration". Also use pr_err()
instead of printk() and unsplit the string to keep it all on one
line.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
[mpe: Keep rc on the same line, splitting it doesn't help]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-28 13:50:47 +10:00
Dan Williams
0d52c756a6 block: convert to device_add_disk()
For block drivers that specify a parent device, convert them to use
device_add_disk().

This conversion was done with the following semantic patch:

    @@
    struct gendisk *disk;
    expression E;
    @@

    - disk->driverfs_dev = E;
    ...
    - add_disk(disk);
    + device_add_disk(E, disk);

    @@
    struct gendisk *disk;
    expression E1, E2;
    @@

    - disk->driverfs_dev = E1;
    ...
    E2 = disk;
    ...
    - add_disk(E2);
    + device_add_disk(E1, E2);

...plus some manual fixups for a few missed conversions.

Cc: Jens Axboe <axboe@fb.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-06-27 12:26:08 -07:00
Cyril Bur
8e96a87c54 powerpc/tm: Always reclaim in start_thread() for exec() class syscalls
Userspace can quite legitimately perform an exec() syscall with a
suspended transaction. exec() does not return to the old process, rather
it load a new one and starts that, the expectation therefore is that the
new process starts not in a transaction. Currently exec() is not treated
any differently to any other syscall which creates problems.

Firstly it could allow a new process to start with a suspended
transaction for a binary that no longer exists. This means that the
checkpointed state won't be valid and if the suspended transaction were
ever to be resumed and subsequently aborted (a possibility which is
exceedingly likely as exec()ing will likely doom the transaction) the
new process will jump to invalid state.

Secondly the incorrect attempt to keep the transactional state while
still zeroing state for the new process creates at least two TM Bad
Things. The first triggers on the rfid to return to userspace as
start_thread() has given the new process a 'clean' MSR but the suspend
will still be set in the hardware MSR. The second TM Bad Thing triggers
in __switch_to() as the processor is still transactionally suspended but
__switch_to() wants to zero the TM sprs for the new process.

This is an example of the outcome of calling exec() with a suspended
transaction. Note the first 700 is likely the first TM bad thing
decsribed earlier only the kernel can't report it as we've loaded
userspace registers. c000000000009980 is the rfid in
fast_exception_return()

  Bad kernel stack pointer 3fffcfa1a370 at c000000000009980
  Oops: Bad kernel stack pointer, sig: 6 [#1]
  CPU: 0 PID: 2006 Comm: tm-execed Not tainted
  NIP: c000000000009980 LR: 0000000000000000 CTR: 0000000000000000
  REGS: c00000003ffefd40 TRAP: 0700   Not tainted
  MSR: 8000000300201031 <SF,ME,IR,DR,LE,TM[SE]>  CR: 00000000  XER: 00000000
  CFAR: c0000000000098b4 SOFTE: 0
  PACATMSCRATCH: b00000010000d033
  GPR00: 0000000000000000 00003fffcfa1a370 0000000000000000 0000000000000000
  GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR12: 00003fff966611c0 0000000000000000 0000000000000000 0000000000000000
  NIP [c000000000009980] fast_exception_return+0xb0/0xb8
  LR [0000000000000000]           (null)
  Call Trace:
  Instruction dump:
  f84d0278 e9a100d8 7c7b03a6 e84101a0 7c4ff120 e8410170 7c5a03a6 e8010070
  e8410080 e8610088 e8810090 e8210078 <4c000024> 48000000 e8610178 88ed023b

  Kernel BUG at c000000000043e80 [verbose debug info unavailable]
  Unexpected TM Bad Thing exception at c000000000043e80 (msr 0x201033)
  Oops: Unrecoverable exception, sig: 6 [#2]
  CPU: 0 PID: 2006 Comm: tm-execed Tainted: G      D
  task: c0000000fbea6d80 ti: c00000003ffec000 task.ti: c0000000fb7ec000
  NIP: c000000000043e80 LR: c000000000015a24 CTR: 0000000000000000
  REGS: c00000003ffef7e0 TRAP: 0700   Tainted: G      D
  MSR: 8000000300201033 <SF,ME,IR,DR,RI,LE,TM[SE]>  CR: 28002828  XER: 00000000
  CFAR: c000000000015a20 SOFTE: 0
  PACATMSCRATCH: b00000010000d033
  GPR00: 0000000000000000 c00000003ffefa60 c000000000db5500 c0000000fbead000
  GPR04: 8000000300001033 2222222222222222 2222222222222222 00000000ff160000
  GPR08: 0000000000000000 800000010000d033 c0000000fb7e3ea0 c00000000fe00004
  GPR12: 0000000000002200 c00000000fe00000 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 c0000000fbea7410 00000000ff160000
  GPR24: c0000000ffe1f600 c0000000fbea8700 c0000000fbea8700 c0000000fbead000
  GPR28: c000000000e20198 c0000000fbea6d80 c0000000fbeab680 c0000000fbea6d80
  NIP [c000000000043e80] tm_restore_sprs+0xc/0x1c
  LR [c000000000015a24] __switch_to+0x1f4/0x420
  Call Trace:
  Instruction dump:
  7c800164 4e800020 7c0022a6 f80304a8 7c0222a6 f80304b0 7c0122a6 f80304b8
  4e800020 e80304a8 7c0023a6 e80304b0 <7c0223a6> e80304b8 7c0123a6 4e800020

This fixes CVE-2016-5828.

Fixes: bc2a9408fa ("powerpc: Hook in new transactional memory code")
Cc: stable@vger.kernel.org # v3.9+
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-27 20:35:17 +10:00
Stephan Linz
83e2c70e84 powerpc: use the new LED disk activity trigger
- dts: rename 'ide-disk' to 'disk-activity'
- defconfig: rename 'ADB_PMU_LED_IDE' to 'ADB_PMU_LED_DISK'

Cc: Joseph Jezak <josejx@gentoo.org>
Cc: Jörg Sommer <joerg@alea.gnuu.de>
Signed-off-by: Stephan Linz <linz@li-pro.net>
Signed-off-by: Jacek Anaszewski <j.anaszewski@samsung.com>
2016-06-27 08:58:40 +02:00
Linus Torvalds
2f6e97477b powerpc fixes for 4.7 #4
- mm/radix: Update to tlb functions ric argument from Aneesh Kumar K.V
  - mm/radix: Flush page walk cache when freeing page table from Aneesh Kumar K.V
  - mm/hash: Use the correct PPP mask when updating HPTE from Aneesh Kumar K.V
  - mm/hash: Don't add memory coherence if cache inhibited is set from Aneesh Kumar K.V
  - mm/radix: Update Radix tree size as per ISA 3.0 from Aneesh Kumar K.V
  - eeh: Fix invalid cached PE primary bus from Gavin Shan
  - Fix faults caused by radix patching of SLB miss handler from Michael Ellerman
  - bpf/jit: Disable classic BPF JIT on ppc64le from Naveen N. Rao
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXbmITAAoJEFHr6jzI4aWANXwQAIUzjKcLpyQWEKwOKfMqBT5T
 EfsWDqJA/J3mYKNcZyiB7qv1NVPPkU9DSBK0OaAKwYdg5YWKDBl6R3mW+j4di0bP
 SkFACCyE2WbLTCiz5fzd8l974RUh5jKQpIrObp4/8xp40d0vsyAzz4J7d4HVRsrr
 BnoTS/KmytsaDQls5kYArxhW6U+Shag586Au1hNt3SS/be8lCNEXLfa3ltCr7WLJ
 k+xM0KM5kpO9/OK40A64TH7xUZKQIgPMUR5Ct43IJhMeHNnQctLmGQQjRWTrajv1
 K/TfrYwCl66xzKaH5G3MKJgqJAJm1LTwGs+2aOn91x5hPrbmW+bLqr1Mm0ukjROz
 oaANO5fgEQjl0JRGCNAhLHvaoqJX6v5/7GbmFRoaigX4UKJ63nK1ABiwAgKDGnyj
 OchwwJywU5UIX/+9Qpig3CxQNhEV33Nnp8t+dsg8CPd9o/G0mIe0QP1eGdhD09mM
 X9eMfN08hLj5ERKvlpW0rrq1b/wizOGmUXbmt02HZi7iLNsyQMwShiOvwOaAvH6/
 SzEFBJdp11jNoe4GJDt5rH4HlnTnTAYwcLFMTDCCPdJXy7voI/J+MaAmG89S30dQ
 ph0+4v/8K2N0VDZ7kkgi0GL1gp9ULkgtimrN5Z0R8U7qEapEW6ybvv+0Ewln742f
 SCRNVMZgzcwe3CcCKzdn
 =fxml
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.7-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "mm/radix (Aneesh Kumar K.V):
   - Update to tlb functions ric argument
   - Flush page walk cache when freeing page table
   - Update Radix tree size as per ISA 3.0

  mm/hash (Aneesh Kumar K.V):
   - Use the correct PPP mask when updating HPTE
   - Don't add memory coherence if cache inhibited is set

  eeh (Gavin Shan):
   - Fix invalid cached PE primary bus

  bpf/jit (Naveen N. Rao):
   - Disable classic BPF JIT on ppc64le

  .. and fix faults caused by radix patching of SLB miss handler
  (Michael Ellerman)"

* tag 'powerpc-4.7-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/bpf/jit: Disable classic BPF JIT on ppc64le
  powerpc: Fix faults caused by radix patching of SLB miss handler
  powerpc/eeh: Fix invalid cached PE primary bus
  powerpc/mm/radix: Update Radix tree size as per ISA 3.0
  powerpc/mm/hash: Don't add memory coherence if cache inhibited is set
  powerpc/mm/hash: Use the correct PPP mask when updating HPTE
  powerpc/mm/radix: Flush page walk cache when freeing page table
  powerpc/mm/radix: Update to tlb functions ric argument
2016-06-25 06:01:48 -07:00
Michal Hocko
2379a23e34 powerpc: get rid of superfluous __GFP_REPEAT
__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.

{pud,pmd}_alloc_one are allocating from {PGT,PUD}_CACHE initialized in
pgtable_cache_init which doesn't have larger than sizeof(void *) << 12
size and that fits into !costly allocation request size.

PGALLOC_GFP is used only in radix__pgd_alloc which uses either order-0
or order-4 requests.  The first one doesn't need the flag while the
second does.  Drop __GFP_REPEAT from PGALLOC_GFP and add it for the
order-4 one.

This means that this flag has never been actually useful here because it
has always been used only for PAGE_ALLOC_COSTLY requests.

Link: http://lkml.kernel.org/r/1464599699-30131-12-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Michal Hocko
32d6bd9059 tree wide: get rid of __GFP_REPEAT for order-0 allocations part I
This is the third version of the patchset previously sent [1].  I have
basically only rebased it on top of 4.7-rc1 tree and dropped "dm: get
rid of superfluous gfp flags" which went through dm tree.  I am sending
it now because it is tree wide and chances for conflicts are reduced
considerably when we want to target rc2.  I plan to send the next step
and rename the flag and move to a better semantic later during this
release cycle so we will have a new semantic ready for 4.8 merge window
hopefully.

Motivation:

While working on something unrelated I've checked the current usage of
__GFP_REPEAT in the tree.  It seems that a majority of the usage is and
always has been bogus because __GFP_REPEAT has always been about costly
high order allocations while we are using it for order-0 or very small
orders very often.  It seems that a big pile of them is just a
copy&paste when a code has been adopted from one arch to another.

I think it makes some sense to get rid of them because they are just
making the semantic more unclear.  Please note that GFP_REPEAT is
documented as

* __GFP_REPEAT: Try hard to allocate the memory, but the allocation attempt

* _might_ fail.  This depends upon the particular VM implementation.
  while !costly requests have basically nofail semantic.  So one could
  reasonably expect that order-0 request with __GFP_REPEAT will not loop
  for ever.  This is not implemented right now though.

I would like to move on with __GFP_REPEAT and define a better semantic
for it.

  $ git grep __GFP_REPEAT origin/master | wc -l
  111
  $ git grep __GFP_REPEAT | wc -l
  36

So we are down to the third after this patch series.  The remaining
places really seem to be relying on __GFP_REPEAT due to large allocation
requests.  This still needs some double checking which I will do later
after all the simple ones are sorted out.

I am touching a lot of arch specific code here and I hope I got it right
but as a matter of fact I even didn't compile test for some archs as I
do not have cross compiler for them.  Patches should be quite trivial to
review for stupid compile mistakes though.  The tricky parts are usually
hidden by macro definitions and thats where I would appreciate help from
arch maintainers.

[1] http://lkml.kernel.org/r/1461849846-27209-1-git-send-email-mhocko@kernel.org

This patch (of 19):

__GFP_REPEAT has a rather weak semantic but since it has been introduced
around 2.6.12 it has been ignored for low order allocations.  Yet we
have the full kernel tree with its usage for apparently order-0
allocations.  This is really confusing because __GFP_REPEAT is
explicitly documented to allow allocation failures which is a weaker
semantic than the current order-0 has (basically nofail).

Let's simply drop __GFP_REPEAT from those places.  This would allow to
identify place which really need allocator to retry harder and formulate
a more specific semantic for what the flag is supposed to do actually.

Link: http://lkml.kernel.org/r/1464599699-30131-2-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com> [for tile]
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: John Crispin <blogic@openwrt.org>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24 17:23:52 -07:00
Benjamin Herrenschmidt
cdb1b3424d powerpc/pci: Reduce log level of PCI I/O space warning
If a PHB has no I/O space, there's no need to make it look like
something bad happened, a pr_debug() is plenty enough since this
is the case of all our modern POWER chips.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-24 15:26:31 +10:00
Naveen N. Rao
156d0e290e powerpc/ebpf/jit: Implement JIT compiler for extended BPF
PPC64 eBPF JIT compiler.

Enable with:
  echo 1 > /proc/sys/net/core/bpf_jit_enable
or
  echo 2 > /proc/sys/net/core/bpf_jit_enable

... to see the generated JIT code. This can further be processed with
tools/net/bpf_jit_disasm.

With CONFIG_TEST_BPF=m and 'modprobe test_bpf':

 test_bpf: Summary: 305 PASSED, 0 FAILED, [297/297 JIT'ed]

... on both ppc64 BE and LE.

The details of the approach are documented through various comments in
the code.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-24 15:17:57 +10:00
Naveen N. Rao
6ac0ba5a4f powerpc/bpf/jit: Isolate classic BPF JIT specifics into a separate header
Break out classic BPF JIT specifics into a separate header in
preparation for eBPF JIT implementation. Note that ppc32 will still need
the classic BPF JIT.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-24 15:15:51 +10:00
Naveen N. Rao
cef1e8cdcd powerpc/bpf/jit: A few cleanups
1. Per the ISA, ADDIS actually uses RT, rather than RS. Though
   the result is the same, make the usage clear.
2. The multiply instruction used is a 32-bit multiply. Rename PPC_MUL()
   to PPC_MULW() to make the same clear.
3. PPC_STW[U] take the entire 16-bit immediate value and do not require
   word-alignment, per the ISA. Change the macros to use IMM_L().
4. A few white-space cleanups to satisfy checkpatch.pl.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-24 15:15:37 +10:00
Naveen N. Rao
277285b854 powerpc/bpf/jit: Introduce rotate immediate instructions
Since we will be using the rotate immediate instructions for extended
BPF JIT, let's introduce macros for the same. And since the shift
immediate operations use the rotate immediate instructions, let's redo
those macros to use the newly introduced instructions.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-24 15:15:14 +10:00
Naveen N. Rao
b1a057879a powerpc/bpf/jit: Optimize 64-bit Immediate loads
Similar to the LI32() optimization, if the value can be represented
in 32-bits, use LI32(). Also handle loading a few specific forms of
immediate values in an optimum manner.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-24 15:15:04 +10:00
Naveen N. Rao
aaf2f7e099 powerpc/bpf/jit: Fix/enhance 32-bit Load Immediate implementation
The existing LI32() macro can sometimes result in a sign-extended 32-bit
load that does not clear the top 32-bits properly. As an example,
loading 0x7fffffff results in the register containing
0xffffffff7fffffff. While this does not impact classic BPF JIT
implementation (since that only uses the lower word for all operations),
we would like to share this macro between classic BPF JIT and extended
BPF JIT, wherein the entire 64-bit value in the register matters. Fix
this by first doing a shifted LI followed by ORI.

An additional optimization is with loading values between -32768 to -1,
where we now only need a single LI.

The new implementation now generates the same or less number of
instructions.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-24 15:14:45 +10:00
Shreyas B. Prabhu
5593e30327 powerpc/powernv: set power_save func after the idle states are initialized
pnv_init_idle_states() discovers supported idle states from the
device tree and does the required initialization. Set power_save
function pointer only after this initialization is done

Otherwise on machines which don't support nap, eg. Power9, the kernel
will crash when it tries to nap.

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-23 10:46:59 +10:00
Naveen N. Rao
844e3be476 powerpc/bpf/jit: Disable classic BPF JIT on ppc64le
Classic BPF JIT was never ported completely to work on little endian
powerpc. However, it can be enabled and will crash the system when used.
As such, disable use of BPF JIT on ppc64le.

Fixes: 7c105b63bd ("powerpc: Add CONFIG_CPU_LITTLE_ENDIAN kernel config option.")
Reported-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-23 10:35:31 +10:00
Michael Ellerman
6e914ee629 powerpc: Fix faults caused by radix patching of SLB miss handler
As part of the Radix MMU support we added some feature sections in the
SLB miss handler. These are intended to catch the case that we
incorrectly take an SLB miss when Radix is enabled, and instead of
crashing weirdly they bail out to a well defined exit path and trigger
an oops.

However the way they were written meant the bailout case was enabled by
default until we did CPU feature patching.

On powermacs the early debug prints in setup_system() can cause an SLB
miss, which happens before code patching, and so the SLB miss handler
would incorrectly bailout and crash during boot.

Fix it by inverting the sense of the feature section, so that the code
which is in place at boot is correct for the hash case. Once we
determine we are using Radix - which will never happen on a powermac -
only then do we patch in the bailout case which unconditionally jumps.

Fixes: caca285e5a ("powerpc/mm/radix: Use STD_MMU_64 to properly isolate hash related code")
Reported-by: Denis Kirjanov <kda@linux-powerpc.org>
Tested-by: Denis Kirjanov <kda@linux-powerpc.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-23 09:58:17 +10:00
Gavin Shan
9497a1c1c5 powerpc/powernv: Print correct PHB type names
We're initializing "IODA1" and "IODA2" PHBs though they are IODA2
and NPU PHBs as below kernel log indicates.

   Initializing IODA1 OPAL PHB /pciex@3fffe40700000
   Initializing IODA2 OPAL PHB /pciex@3fff000400000

This fixes the PHB names. After it's applied, we get:

   Initializing IODA2 PHB (/pciex@3fffe40700000)
   Initializing NPU PHB (/pciex@3fff000400000)

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 15:30:59 +10:00
Gavin Shan
ea0d856cb2 powerpc/powernv: Functions to get/set PCI slot state
This exports 4 functions, which base on the corresponding OPAL
APIs to get/set PCI slot status. Those functions are going to
be used by PowerNV PCI hotplug driver:

   pnv_pci_get_device_tree()    opal_get_device_tree()
   pnv_pci_get_presence_state() opal_pci_get_presence_state()
   pnv_pci_get_power_state()    opal_pci_get_power_state()
   pnv_pci_set_power_state()    opal_pci_set_power_state()

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 15:30:58 +10:00
Gavin Shan
7e19bf32c8 powerpc/powernv: Introduce pnv_pci_get_slot_id()
This introduces pnv_pci_get_slot_id() to get the hotpluggable PCI
slot ID from the corresponding device node. It will be used by
hotplug driver.

Requested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 15:30:58 +10:00
Gavin Shan
9c0e1ecbe1 powerpc/powernv: Use PCI slot reset infrastructure
The (OPAL) firmware might provide the PCI slot reset capability
which is identified by property "ibm,reset-by-firmware" on the
PCI slot associated device node.

This routes the reset request to firmware if "ibm,reset-by-firmware"
exists in the PCI slot device node. Otherwise, the reset is done
inside kernel as before.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 15:30:57 +10:00
Gavin Shan
ebe2253127 powerpc/powernv: Support PCI slot ID
The reset and poll functionality from (OPAL) firmware supports
PHB and PCI slot at same time. They are identified by ID. This
supports PCI slot ID by:

   * Rename the argument name for opal_pci_reset() and opal_pci_poll()
     accordingly
   * Rename pnv_eeh_phb_poll() to pnv_eeh_poll() and adjust its argument
     name.
   * One macro is added to produce PCI slot ID.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 15:30:57 +10:00
Gavin Shan
8cc7581cdb powerpc/pci: Delay populating pdn
The pdn (struct pci_dn) instances are allocated from memblock or
bootmem when creating PCI controller (hoses) in setup_arch(). PCI
hotplug, which will be supported by proceeding patches, releases
PCI device nodes and their corresponding pdn on unplugging event.
The memory chunks for pdn instances allocated from memblock or
bootmem are hard to reused after being released.

This delays creating pdn by pci_devs_phb_init() from setup_arch()
to core_initcall() so that they are allocated from slab. The memory
consumed by pdn can be released to system without problem during
PCI unplugging time. It indicates that pci_dn is unavailable in
setup_arch() and the the fixup on pdn (like AGP's) can't be carried
out that time. We have to do that in pcibios_root_bridge_prepare()
on maple/pasemi/powermac platforms where/when the pdn is available.
pcibios_root_bridge_prepare is called from subsys_initcall() which
is executed after core_initcall() so the code flow does not change.

At the mean while, the EEH device is created when pdn is populated,
meaning pdn and EEH device have same life cycle. In turn, we needn't
call eeh_dev_init() to create EEH device explicitly.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 15:30:56 +10:00
Gavin Shan
7415c14c56 powerpc/pci: Update bridge windows on PCI plug
On the PCI plugging event, PCI slot's subordinate devices are
scanned and their (IO and MMIO) resources are assigned. Platform
dependent resources (PE#, IO/MMIO/DMA windows) are allocated or
created on updating windows of the slot's upstream bridge.

This updates the windows of the hot plugged slot's upstream bridge
in pcibios_finish_adding_to_bus() so that the platform resources
(PE#, IO/MMIO/DMA segments) are allocated or created accordingly.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 15:30:56 +10:00
Gavin Shan
c5f7700bbd powerpc/powernv: Dynamically release PE
This supports releasing PEs dynamically. A reference count is
introduced to PE representing number of PCI devices associated
with the PE. The reference count is increased when PCI device
joins the PE and decreased when PCI device leaves the PE in
pnv_pci_release_device(). When the count becomes zero, the PE
and its consumed resources are released. Note that the count
is accessed concurrently. So a counter with "int" type is enough
here.

In order to release the sources consumed by the PE, couple of
helper functions are introduced as below:

   * pnv_pci_ioda1_unset_window() - Unset IODA1 DMA32 window
   * pnv_pci_ioda1_release_dma_pe() - Release IODA1 DMA32 segments
   * pnv_pci_ioda2_release_dma_pe() - Release IODA2 DMA resource
   * pnv_ioda_release_pe_seg() - Unmap IO/M32/M64 segments

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 15:30:55 +10:00
Gavin Shan
93e01a5039 powerpc/powernv: Make pnv_ioda_deconfigure_pe() visible
pnv_ioda_deconfigure_pe() is visible only when CONFIG_PCI_IOV is
enabled. The function will be used to tear down PE's associated
mapping in PCI hotplug path that doesn't depend on CONFIG_PCI_IOV.

This makes pnv_ioda_deconfigure_pe() visible and not depend on
CONFIG_PCI_IOV.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 15:30:55 +10:00
Gavin Shan
40e2a47e62 powerpc/powernv: Extend PCI bridge resources
The PCI slots are associated with root port or downstream ports
of the PCIe switch connected to root port. When adapter is hot
added to the PCI slot, it usually requests more IO or memory
resource from the directly connected parent bridge (port) and
update the bridge's windows accordingly. The resource windows
of upstream bridges can't be updated automatically. It possibly
leads to unbalanced resource across the bridges: The window of
downstream bridge is overruning that of upstream bridge. The
IO or MMIO path won't work.

This resolves the above issue by extending bridge windows of
root port and upstream port of the PCIe switch connected to
the root port to PHB's windows.

The windows of root port and bridge behind that are extended to
the PHB's windows to accomodate the PCI hotplug happening in
future. The PHB's 64KB 32-bits MSI region is included in bridge's
M32 windows (in hardware) though it's excluded in the corresponding
resource, as the bridge's M32 windows have 1MB as their minimal
alignment. We observed EEH error during system boot when the MSI
region is included in bridge's M32 window.

This excludes top 1MB (including 64KB 32-bits MSI region) region
from bridge's M32 windows when extending them.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 15:30:55 +10:00
Gavin Shan
63803c39c8 powerpc/powernv: Setup PE for root bus
There is no parent bridge for root bus, meaning pcibios_setup_bridge()
isn't invoked for root bus. The PE for root bus is the ancestor of
other PEs in PELTV. It means we need PE for root bus populated before
all others.

This populates the PE for root bus in pcibios_setup_bridge() path
if it's not populated yet. The PE number next to the reserved one
is used as the PE# to avoid holes in continuous M64 space.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 15:30:54 +10:00
Gavin Shan
ccd1c1911a powerpc/powernv: Create PEs in pcibios_setup_bridge()
Currently, the PEs and their associated resources are assigned in
ppc_md.pcibios_fixup() except those used by SRIOV VFs. The function
is called for once after PCI probing and resources assignment is
completed. So it's obviously not hotplug friendly.

This creates PEs dynamically in pcibios_setup_bridge() that is
called for the event during system bootup and PCI hotplug: updating
PCI bridge's windows after resource assignment/reassignment are done.
In partial hotplug case, not all PCI devices included to one particular
PE are unplugged and plugged again, we just need unbinding/binding the
hot added PCI devices with the corresponding PE without creating new
one. The change is applied to IODA1 and IODA2 PHBs only. The behaviour
on NPU PHBs aren't changed. There are no PCI bridges on NPU PHBs,
meaning pcibios_setup_bridge() won't be invoked there. We have to use
old path (pnv_pci_ioda_fixup()) to setup PEs on NPU PHBs.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 15:30:54 +10:00
Gavin Shan
9fcd6f4a2b powerpc/powernv: Allocate PE# in reverse order
PE number for one particular PE can be allocated dynamically or
reserved according to the consumed M64 (64-bits prefetchable)
segments of the PE. The M64 segment can't be remapped to arbitrary
PE, meaning the PE number is determined according to the index
of the consumed M64 segment. As below figure shows, M64 resource
grows from low to high end, meaning the PE (number) reserved
according to M64 segment grows from low to high end as well,
so does the dynamically allocated PE number. It will lead to
conflict: PE number (M64 segment) reserved by dynamic allocation
is required by hot added PCI adapter at later point. It fails
the PCI hotplug because of the PE number can't be reserved
based on the index of the consumed M64 segment.

  +---+---+---+---+---+--------------------------------+-----+
  | 0 | 1 | 2 | 3 | 4 |      .......                   | 255 |
  +---+---+---+---+---+--------------------------------+-----+

  PE number for dynamic allocation          ----------------->
  PE number reserved for M64 segment        ----------------->

To resolve above conflicts, this forces the PE number to be
allocated dynamically in reverse order. With this patch applied,
the PE numbers are reserved in ascending order, but allocated
dynamically in reverse order.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 15:30:53 +10:00
Gavin Shan
c127562ae1 powerpc/powernv: Increase PE# capacity
Each PHB maintains an array helping to translate 2-bytes Request
ID (RID) to PE# with the assumption that PE# takes one byte, meaning
that we can't have more than 256 PEs. However, pci_dn->pe_number
already had 4-bytes for the PE#.

This extends the PE# capacity for every PHB. After that, the PE number
is represented by 4-bytes value. Then we can reuse IODA_INVALID_PE to
check the PE# in phb->pe_rmap[] is valid or not.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 15:30:53 +10:00
Gavin Shan
577c8c8868 powerpc/powernv: Move pnv_pci_ioda_setup_opal_tce_kill() around
pnv_pci_ioda_setup_opal_tce_kill() called by pnv_ioda_setup_dma()
to remap the TCE kill regiter. What's done in pnv_ioda_setup_dma()
will be covered in pcibios_setup_bridge() which is invoked on each
PCI bridge. It means we will possibly remap the TCE kill register
for multiple times and it's unnecessary.

This moves pnv_pci_ioda_setup_opal_tce_kill() to where the PHB is
initialized (pnv_pci_init_ioda_phb()) to avoid above issue.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 15:30:53 +10:00
Gavin Shan
e368e4ca9c powerpc/powernv: Remove PCI_RESET_DELAY_US
The macro defined in arch/powerpc/platforms/powernv/pci.c isn't
used by anyone. Just remove it.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 15:30:52 +10:00
Gavin Shan
c5fcb29a64 powerpc/pci: Override pcibios_setup_bridge()
This overrides pcibios_setup_bridge() that is called to update PCI
bridge windows when PCI resource assignment is completed, to assign
PE and setup various (resource) mapping for the PE in subsequent
patches.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 15:30:52 +10:00
Mauricio Faria de Oliveira
f8ab481066 powerpc: export cpu_to_core_id()
Export cpu_to_core_id(). This will be used by the lpfc driver.

This enables topology_core_id() from <linux/topology.h> (defined
to cpu_to_core_id() in arch/powerpc/include/asm/topology.h) to be
used by (non-builtin) modules.

That is arch-neutral, already used by eg, drivers/base/topology.c,
but it is builtin (obj-y in Makefile) thus didn't need the export.

Since the module uses topology_core_id() and this is defined to
cpu_to_core_id(), it needs the export, otherwise:

    ERROR: "cpu_to_core_id" [drivers/scsi/lpfc/lpfc.ko] undefined!

Tested on next-20160601.

Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 15:30:51 +10:00
Jack Miller
bd3ea317fd powerpc: Load Monitor Register Support
This enables new registers, LMRR and LMSER, that can trigger an EBB in
userspace code when a monitored load (via the new ldmx instruction)
loads memory from a monitored space. This facility is controlled by a
new FSCR bit, LM.

This patch disables the FSCR LM control bit on task init and enables
that bit when a load monitor facility unavailable exception is taken
for using it. On context switch, this bit is then used to determine
whether the two relevant registers are saved and restored. This is
done lazily for performance reasons.

Signed-off-by: Jack Miller <jack@codezen.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 15:30:50 +10:00
Michael Neuling
b57bd2de8c powerpc: Improve FSCR init and context switching
This fixes a few issues with FSCR init and switching.

In commit 152d523e63 ("powerpc: Create context switch helpers
save_sprs() and restore_sprs()") we moved the setting of the FSCR
register from inside an CPU_FTR_ARCH_207S section to inside just a
CPU_FTR_ARCH_DSCR section. Hence we are setting FSCR on POWER6/7 where
the FSCR doesn't exist. This is harmless but we shouldn't do it.

Also, we can simplify the FSCR context switch. We don't need to go
through the calculation involving dscr_inherit. We can just restore
what we saved last time.

We also set an initial value in INIT_THREAD, so that pid 1 which is
cloned from that gets a sane value.

Based on patch by Jack Miller.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 15:30:50 +10:00
Madhavan Srinivasan
103b7827d9 powerpc: Fix misleading comment in early_setup_secondary()
Current comment in the early_setup_secondary() for paca->soft_enabled
update is misleading. Comment should say to Mark interrupts "disabled"
instead of "enabled". Fix the typo.

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 15:30:49 +10:00
Thiago Jung Bauermann
61ed9cfb1b powerpc/kprobes: Remove kretprobe_trampoline_holder.
Fixes the following testsuite failure:

  $ sudo ./perf test -v kallsyms
   1: vmlinux symtab matches kallsyms                          :
  --- start ---
  test child forked, pid 12489
  Using /proc/kcore for kernel object code
  Looking at the vmlinux_path (8 entries long)
  Using /boot/vmlinux for symbols
  0xc00000000003d300: diff name v: .kretprobe_trampoline_holder k: kretprobe_trampoline
  Maps only in vmlinux:
   c00000000086ca38-c000000000879b6c 87ca38 [kernel].text.unlikely
   c000000000879b6c-c000000000bf0000 889b6c [kernel].meminit.text
   c000000000bf0000-c000000000c53264 c00000 [kernel].init.text
   c000000000c53264-d000000004250000 c63264 [kernel].exit.text
   d000000004250000-d000000004450000 0 [libcrc32c]
   d000000004450000-d000000004620000 0 [xfs]
   d000000004620000-d000000004680000 0 [autofs4]
   d000000004680000-d0000000046e0000 0 [x_tables]
   d0000000046e0000-d000000004780000 0 [ip_tables]
   d000000004780000-d0000000047e0000 0 [rng_core]
   d0000000047e0000-ffffffffffffffff 0 [pseries_rng]
  Maps in vmlinux with a different name in kallsyms:
  Maps only in kallsyms:
   d000000000000000-f000000000000000 1000000000010000 [kernel.kallsyms]
   f000000000000000-ffffffffffffffff 3000000000010000 [kernel.kallsyms]
  test child finished with -1
  ---- end ----
  vmlinux symtab matches kallsyms: FAILED!

The problem is that the kretprobe_trampoline symbol looks like this:

  $ eu-readelf -s /boot/vmlinux G kretprobe_trampoline
   2431: c000000001302368     24 NOTYPE  LOCAL  DEFAULT       37 kretprobe_trampoline_holder
   2432: c00000000003d300      8 FUNC    LOCAL  DEFAULT        1 .kretprobe_trampoline_holder
  97543: c00000000003d300      0 NOTYPE  GLOBAL DEFAULT        1 kretprobe_trampoline

Its type is NOTYPE, and its size is 0, and this is a problem because
symbol-elf.c:dso__load_sym skips function symbols that are not STT_FUNC
or STT_GNU_IFUNC (this is determined by elf_sym__is_function). Even
if the type is changed to STT_FUNC, when dso__load_sym calls
symbols__fixup_duplicate, the kretprobe_trampoline symbol is dropped in
favour of .kretprobe_trampoline_holder because the latter has non-zero
size (as determined by choose_best_symbol).

With this patch, all vmlinux symbols match /proc/kallsyms and the
testcase passes.

Commit c1c355ce14 ("x86/kprobes: Get rid of
kretprobe_trampoline_holder()") gets rid of kretprobe_trampoline_holder
altogether on x86. This commit does the same on powerpc. This change
introduces no regressions on the perf and ftracetest testsuite results.

Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21 15:30:49 +10:00
Mahesh Salgaonkar
fd7bacbca4 KVM: PPC: Book3S HV: Fix TB corruption in guest exit path on HMI interrupt
When a guest is assigned to a core it converts the host Timebase (TB)
into guest TB by adding guest timebase offset before entering into
guest. During guest exit it restores the guest TB to host TB. This means
under certain conditions (Guest migration) host TB and guest TB can differ.

When we get an HMI for TB related issues the opal HMI handler would
try fixing errors and restore the correct host TB value. With no guest
running, we don't have any issues. But with guest running on the core
we run into TB corruption issues.

If we get an HMI while in the guest, the current HMI handler invokes opal
hmi handler before forcing guest to exit. The guest exit path subtracts
the guest TB offset from the current TB value which may have already
been restored with host value by opal hmi handler. This leads to incorrect
host and guest TB values.

With split-core, things become more complex. With split-core, TB also gets
split and each subcore gets its own TB register. When a hmi handler fixes
a TB error and restores the TB value, it affects all the TB values of
sibling subcores on the same core. On TB errors all the thread in the core
gets HMI. With existing code, the individual threads call opal hmi handle
independently which can easily throw TB out of sync if we have guest
running on subcores. Hence we will need to co-ordinate with all the
threads before making opal hmi handler call followed by TB resync.

This patch introduces a sibling subcore state structure (shared by all
threads in the core) in paca which holds information about whether sibling
subcores are in Guest mode or host mode. An array in_guest[] of size
MAX_SUBCORE_PER_CORE=4 is used to maintain the state of each subcore.
The subcore id is used as index into in_guest[] array. Only primary
thread entering/exiting the guest is responsible to set/unset its
designated array element.

On TB error, we get HMI interrupt on every thread on the core. Upon HMI,
this patch will now force guest to vacate the core/subcore. Primary
thread from each subcore will then turn off its respective bit
from the above bitmap during the guest exit path just after the
guest->host partition switch is complete.

All other threads that have just exited the guest OR were already in host
will wait until all other subcores clears their respective bit.
Once all the subcores turn off their respective bit, all threads will
will make call to opal hmi handler.

It is not necessary that opal hmi handler would resync the TB value for
every HMI interrupts. It would do so only for the HMI caused due to
TB errors. For rest, it would not touch TB value. Hence to make things
simpler, primary thread would call TB resync explicitly once for each
core immediately after opal hmi handler instead of subtracting guest
offset from TB. TB resync call will restore the TB with host value.
Thus we can be sure about the TB state.

One of the primary threads exiting the guest will take up the
responsibility of calling TB resync. It will use one of the top bits
(bit 63) from subcore state flags bitmap to make the decision. The first
primary thread (among the subcores) that is able to set the bit will
have to call the TB resync. Rest all other threads will wait until TB
resync is complete.  Once TB resync is complete all threads will then
proceed.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-06-20 14:11:25 +10:00
Mahesh Salgaonkar
6dd06d15a8 powerpc/powernv: Remove the usage of PACAR1 from opal wrappers
OPAL_CALL wrapper code sticks the r1 (stack pointer) into PACAR1 purely
for debugging purpose only. The power7_wakeup* functions relies on stack
pointer saved in PACAR1. Any opal call made using opal wrapper (directly
or in-directly) before we fall through power7_wakeup*, then it ends up
replacing r1 in PACAR1(r13) leading to kernel panic. So far we don't see
any issues because we have never made any opal calls using OPAL wrapper
before power7_wakeup*. But the subsequent HMI patch would need to invoke
C calls during cpu wakeup/idle path that in-directly makes opal call using
opal wrapper. This patch facilitates the subsequent HMI patch by removing
usage of PACAR1 from opal call wrapper.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-06-20 14:11:25 +10:00
Thomas Huth
b69890d18f KVM: PPC: Book3S PR: Fix contents of SRR1 when injecting a program exception
vcpu->arch.shadow_srr1 only contains usable values for injecting
a program exception into the guest if we entered the function
kvmppc_handle_exit_pr() with exit_nr == BOOK3S_INTERRUPT_PROGRAM.
In other cases, the shadow_srr1 bits are zero. Since we want to
pass an illegal-instruction program check to the guest, set
"flags" to SRR1_PROGILL for these other cases.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-06-20 14:11:25 +10:00
Thomas Huth
708e75a3ee KVM: PPC: Book3S PR: Fix illegal opcode emulation
If kvmppc_handle_exit_pr() calls kvmppc_emulate_instruction() to emulate
one instruction (in the BOOK3S_INTERRUPT_H_EMUL_ASSIST case), it calls
kvmppc_core_queue_program() afterwards if kvmppc_emulate_instruction()
returned EMULATE_FAIL, so the guest gets an program interrupt for the
illegal opcode.
However, the kvmppc_emulate_instruction() also tried to inject a
program exception for this already, so the program interrupt gets
injected twice and the return address in srr0 gets destroyed.
All other callers of kvmppc_emulate_instruction() are also injecting
a program interrupt, and since the callers have the right knowledge
about the srr1 flags that should be used, it is the function
kvmppc_emulate_instruction() that should _not_ inject program
interrupts, so remove the kvmppc_core_queue_program() here.

This fixes the issue discovered by Laurent Vivier with kvm-unit-tests
where the logs are filled with these messages when the test tries
to execute an illegal instruction:

     Couldn't emulate instruction 0x00000000 (op 0 xop 0)
     kvmppc_handle_exit_pr: emulation at 700 failed (00000000)

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Tested-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-06-20 14:11:25 +10:00
Bjorn Helgaas
3830135857 powerpc/pci: Implement pci_resource_to_user() with pcibios_resource_to_bus()
"User" addresses are shown in /sys/devices/pci.../.../resource and
/proc/bus/pci/devices and used as mmap offsets for /proc/bus/pci/BB/DD.F
files.  For I/O port resources on powerpc, these are PCI bus addresses,
i.e., raw BAR values.

Previously pci_resource_to_user() computed the user address by subtracting
"hose->io_base_virt - _IO_BASE" from the resource start:

  pci_resource_to_user()
    if (IO)
      offset = (unsigned long)hose->io_base_virt - _IO_BASE;
    *start = rsrc->start - offset;

We've already told the PCI core about that "hose->io_base_virt - _IO_BASE"
offset:

  pcibios_setup_phb_resources()
    res = &hose->io_resource;
    offset = pcibios_io_space_offset();
    /* i.e., "offset = hose->io_base_virt - _IO_BASE" */
    pci_add_resource_offset(resources, res, offset);

so pcibios_resource_to_bus() knows how to do that translation.

No functional change intended.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
2016-06-17 14:44:44 -05:00
Bjorn Helgaas
8221a01352 PCI: Unify pci_resource_to_user() declarations
Replace the pci_resource_to_user() declarations in each arch that defines
HAVE_ARCH_PCI_RESOURCE_TO_USER with a single one in linux/pci.h.

Change the MIPS static inline implementation to a non-inline version so the
static inline doesn't conflict with the new non-static linux/pci.h
declaration.

No functional change intended.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-06-17 14:43:34 -05:00
Yinghai Lu
1e70cdd6e6 powerpc/pci: Remove __pci_mmap_set_pgprot()
The powerpc-specific __pci_mmap_set_pgprot() does two things:

  1) Disables write combining for I/O port space mappings

     This only affects procfs mappings.  The pci_mmap_resource() sysfs path
     only requests write combining for resources with IORESOURCE_PREFETCH
     set, which doesn't include I/O resources.

     The only way to request write combining for I/O port space mappings
     was via the PCIIOC_WRITE_COMBINE ioctl and the proc_bus_pci_mmap()
     path, and we recently changed that path to ignore write combining for
     I/O, so this code in powerpc is no longer needed.

  2) Automatically enables write combining for mappings of prefetchable
     resources, even if not requested by the user

     Both procfs (via PCIIOC_MMAP_IS_MEM and PCIIOC_WRITE_COMBINE ioctls)
     and sysfs (via "resourceN_wc" files, which are created for resources
     with IORESOURCE_PREFETCH) provide ways for the user to map PCI memory
     space with write combining.

     Users that desire write combining should use one of those ways instead
     of relying on powerpc-specific behavior.

Remove the powerpc-specific __pci_mmap_set_pgprot().

The user-visible effect of this change is that powerpc users mapping
prefetchable PCI memory space via procfs without PCIIOC_WRITE_COMBINE or
via sysfs "resourceN" (not "resourceN_wc") will get regular uncacheable
mappings instead of the write combining mappings they used to get.

The new behavior matches the behavior on all other arches that support
write combining mapping.

[bhelgaas: changelog]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-06-17 14:43:33 -05:00
Gavin Shan
a3aa256b72 powerpc/eeh: Fix invalid cached PE primary bus
The PE primary bus cannot be got from its child devices when having
full hotplug in error recovery. The PE primary bus is cached, which
is done in commit <05ba75f84864> ("powerpc/eeh: Fix stale cached primary
bus"). In eeh_reset_device(), the flag (EEH_PE_PRI_BUS) is cleared
before the PCI hot remove. eeh_pe_bus_get() then returns NULL as the
PE primary bus in pnv_eeh_reset() and it crashes the kernel eventually.

This fixes the issue by clearing the flag (EEH_PE_PRI_BUS) before the
PCI hot add. With it, the PowerNV EEH reset backend (pnv_eeh_reset())
can get valid PE primary bus through eeh_pe_bus_get().

Fixes: 67086e32b5 ("powerpc/eeh: powerpc/eeh: Support error recovery for VF PE")
Reported-by: Pridhiviraj Paidipeddi <ppaiddipe@in.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-17 19:51:47 +10:00
Aneesh Kumar K.V
b23d9c5b9c powerpc/mm/radix: Update Radix tree size as per ISA 3.0
ISA 3.0 updated it to be encoded as Radix tree size = 2^(RTS + 31). We
have it encoded as 2^(RTS + 28). Add a helper with the correct encoding
and use it instead of opencoding.

Fixes: 2bfd65e45e ("powerpc/mm/radix: Add radix callbacks for early init routines")
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-17 19:50:55 +10:00
Aneesh Kumar K.V
e568006b9d powerpc/mm/hash: Don't add memory coherence if cache inhibited is set
H_ENTER hcall handling in qemu had assumptions that a cache inhibited
hpte entry won't have memory conference set. Also older kernel
mentioned that some version of pHyp required this (the code removed
by the below commit says:

    /* Make pHyp happy */
    if ((rflags & _PAGE_NO_CACHE) && !(rflags & _PAGE_WRITETHRU))
            hpte_r &= ~HPTE_R_M;

But with older kernel we had some inconsistent memory conherence
mapping. We always enabled memory conherence in the page fault path and
removed memory conherence is _PAGE_NO_CACHE was set when we mapped the
page via htab_bolt_mapping. The commit mentioned below tried to
consolidate that by always enabling memory conherence. But as mentioned
above that breaks Qemu H_ENTER handling.

This patch update this such that we enable memory conherence only if
cache inhibited is not set and bring fault handling, lpar and bolt
mapping in sync.

Fixes: commit 30bda41aba4e("powerpc/mm: Drop WIMG in favour of new constant")
Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-17 19:47:51 +10:00
Russell Currey
fb36e90736 powerpc/pci: Fix SRIOV not building without EEH enabled
On Book3E CPUs (and possibly other configs), it is possible to have SRIOV
(CONFIG_PCI_IOV) set without CONFIG_EEH.  The SRIOV code does not check
for this, and if EEH is disabled, pci_dn.c fails to build.

Fix this by gating the EEH-specific code in the SRIOV implementation
behind CONFIG_EEH.

Fixes: 39218cd0 ("powerpc/eeh: EEH device for VF")
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-17 16:52:22 +10:00