Pull PCI fixes from Bjorn Helgaas:
"These are fixes for:
- a resource management problem that causes a Radeon "Fatal error
during GPU init" on machines where the BIOS programmed an invalid
Root Port window. This was a regression in v3.16.
- an Atheros AR93xx device that doesn't handle PCI bus resets
correctly. This was a regression in v3.14.
- an out-of-date email address"
* tag 'pci-v3.19-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
MAINTAINERS: Update Richard Zhu's email address
sparc/PCI: Clip bridge windows to fit in upstream windows
powerpc/PCI: Clip bridge windows to fit in upstream windows
parisc/PCI: Clip bridge windows to fit in upstream windows
mn10300/PCI: Clip bridge windows to fit in upstream windows
microblaze/PCI: Clip bridge windows to fit in upstream windows
ia64/PCI: Clip bridge windows to fit in upstream windows
frv/PCI: Clip bridge windows to fit in upstream windows
alpha/PCI: Clip bridge windows to fit in upstream windows
x86/PCI: Clip bridge windows to fit in upstream windows
PCI: Add pci_claim_bridge_resource() to clip window if necessary
PCI: Add pci_bus_clip_resource() to clip to fit upstream window
PCI: Pass bridge device, not bus, when updating bridge windows
PCI: Mark Atheros AR93xx to avoid bus reset
PCI: Add flag for devices where we can't use bus reset
The return value of kvm_arch_vcpu_postcreate is not checked in its
caller. This is okay, because only x86 provides vcpu_postcreate right
now and it could only fail if vcpu_load failed. But that is not
possible during KVM_CREATE_VCPU (kvm_arch_vcpu_load is void, too), so
just get rid of the unchecked return value.
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
I noticed ksm spending quite a lot of time in memcmp on a large
KVM box. The current memcmp loop is very unoptimised - byte at a
time compares with no loop unrolling. We can do much much better.
Optimise the loop in a few ways:
- Unroll the byte at a time loop
- For large (at least 32 byte) comparisons that are also 8 byte
aligned, use an unrolled modulo scheduled loop using 8 byte
loads. This is similar to our glibc memcmp.
A simple microbenchmark testing 10000000 iterations of an 8192 byte
memcmp was used to measure the performance:
baseline: 29.93 s
modified: 1.70 s
Just over 17x faster.
v2: Incorporated some suggestions from Segher:
- Use andi. instead of rdlicl.
- Convert bdnzt eq, to bdnz. It's just duplicating the earlier compare
and was a relic from a previous version.
- Don't use cr5, we have plans to use that CR field for fast local
atomics.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The callback (ppc_md.pci_probe_mode()) is used to determine if the
child PCI devices of the indicated PCI bus should be probed from
device-tree or hardware. On PowerNV platform, we always expect
probing PCI devices from hardware, which is PowerPC PCI core's
default behaviour. Also, the callback had some delay implemented
based on PHB's device node property "reset-clear-timestamp", which
wasn't exported from skiboot. So we don't need this function and
it's safe to remove it.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When PE's frozen count hits maximal allowed frozen times, which is
5 currently, it will be forced to be offline permanently. Once the
PE is removed permanently, rebooting machine is required to bring
the PE back. It's not convienent when testing EEH functionality.
The patch exports the maximal allowed frozen times through debugfs
entry (/sys/kernel/debug/powerpc/eeh_max_freezes).
Requested-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The conditions that one specific PE's frozen count exceeds the maximal
allowed times (EEH_MAX_ALLOWED_FREEZES) and it's in isolated or recovery
state indicate the PE was removed permanently implicitly. The patch
introduces flag EEH_PE_REMOVED to indicate that explicitly so that we
don't depend on the fixed maximal allowed times, which can be varied as
we do in subsequent patch.
Flag EEH_PE_REMOVED is expected to be marked for the PE whose frozen
count exceeds the maximal allowed times, or just failed from recovery.
Requested-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
PE#0 should be regarded as valid for P7IOC, while it's invalid for
PHB3. The patch adds flag EEH_VALID_PE_ZERO to differentiate those
two cases. Without the patch, we possibly see frozen PE#0 state is
cleared without EEH recovery taken on P7IOC as following kernel logs
indicate:
[root@ltcfbl8eb ~]# dmesg
:
pci 0000:00 : [PE# 000] Secondary bus 0 associated with PE#0
pci 0000:01 : [PE# 001] Secondary bus 1 associated with PE#1
pci 0001:00 : [PE# 000] Secondary bus 0 associated with PE#0
pci 0001:01 : [PE# 001] Secondary bus 1 associated with PE#1
pci 0002:00 : [PE# 000] Secondary bus 0 associated with PE#0
pci 0002:01 : [PE# 001] Secondary bus 1 associated with PE#1
pci 0003:00 : [PE# 000] Secondary bus 0 associated with PE#0
pci 0003:01 : [PE# 001] Secondary bus 1 associated with PE#1
pci 0003:20 : [PE# 002] Secondary bus 32..63 associated with PE#2
:
EEH: Clear non-existing PHB#3-PE#0
EEH: PHB location: U78AE.001.WZS00M9-P1-002
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When calling to early_setup(), we pick "boot_paca" up for the master CPU
and initialize that with initialise_paca(). At that point, the SLB
shadow buffer isn't populated yet. Updating the SLB shadow buffer should
corrupt what we had in physical address 0 where the trap instruction is
usually stored.
This hasn't been observed to cause any trouble in practice, but is
obviously fishy.
Fixes: 6f4441ef70 ("powerpc: Dynamically allocate slb_shadow from memblock")
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently, all non-dot symbols are being treated as function descriptors
in ABIv1. This is incorrect and is resulting in perf probe not working:
# perf probe do_fork
Added new event:
Failed to write event: Invalid argument
Error: Failed to add events.
# dmesg | tail -1
[192268.073063] Could not insert probe at _text+768432: -22
perf probe bases all kernel probes on _text and writes,
for example, "p:probe/do_fork _text+768432" to
/sys/kernel/debug/tracing/kprobe_events. In-kernel, _text is being
considered to be a function descriptor and is resulting in the above
error.
Fix this by changing how we lookup symbol addresses on ppc64. We first
check for the dot variant of a symbol and look at the non-dot variant
only if that fails. In this manner, we avoid having to look at the
function descriptor.
While at it, also separate out how this works on ABIv2 where
we don't have dot symbols, but need to use the local entry point.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Once upon a time, at least 9 years ago (< 2.6.12), _TIF_SYSCALL_T_OR_A
meant "TRACE or AUDIT". But these days it means TRACE or AUDIT or
SECCOMP or TRACEPOINT or NOHZ.
All of those are implemented via syscall_dotrace() so rename the flag to
that to try and clarify things.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We removed the last usage of CPU_FTR_IABR in commit 1ad7d70562
"powerpc/xmon: Enable HW instruction breakpoint on POWER8".
Mark it as free.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
pci_dn->phb is set to phb in update_dn_pci_info(), if succeed.
This patch removes the duplication of pci_dn->phb initialization.
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When IOMMU bypass is enabled, a PCI device can read and write memory
that was not mapped by the driver without causing an EEH. That might
cause memory corruption, for example.
When we disable bypass, DMA reads and writes to addresses not mapped by
the IOMMU will cause an EEH, allowing us to debug such issues.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The current handling of EPOW_SHUTDOWN_ON_UPS event does not shutdown the
system after logging the message. All the events of EPOW_SYSTEM_SHUTDOWN
action code (EPOW_SHUTDOWN_ON_UPS is a part of it) must initiate system
shutdown as per the SPAPR spec. If the LPAR does not shutdown after
receiving this rtas based event, it will expose itself to a forced
abrupt shutdown initiated by the platform firmware. This patch fixes the
situation.
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The M64 range information is missed in dmesg, which would be helpful in debug.
This patch prints the M64 range information in the same format as M32.
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pull module and param fixes from Rusty Russell:
"Surprising number of fixes this merge window :(
The first two are minor fallout from the param rework which went in
this merge window.
The next three are a series which fixes a longstanding (but never
previously reported and unlikely , so no CC stable) race between
kallsyms and freeing the init section.
Finally, a minor cleanup as our module refcount will now be -1 during
unload"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
module: make module_refcount() a signed integer.
module: fix race in kallsyms resolution during module load success.
module: remove mod arg from module_free, rename module_memfree().
module_arch_freeing_init(): new hook for archs before module->module_init freed.
param: fix uninitialized read with CONFIG_DEBUG_LOCK_ALLOC
param: initialize store function to NULL if not available.
Turning snoops on is the last step in CAPP recovery. Sapphire is expected to
have reinitialized the PHB and done the previous recovery steps.
Add mode argument to opal call to do this. Driver can turn snoops off although
it does not currently.
Signed-off-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Add calls to the ps3_mm_set_repository_highmem() routine when the ps3
r1 highmem region is either created or destroyed.
Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Add the new routine ps3_mm_set_repository_highmem() that saves highmem info to
the LV1 hypervisor registry so that the info will be available to second stage
OS's loaded by petitboot/kexec. FreeBSD and some Linux derivatives use
this feature.
Also, move the existing ps3_mm_get_repository_highmem() routine up in
the source file.
This implementation of ps3_mm_set_repository_highmem() assumes the repository
will have a single highmem region entry (at index 0).
Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
To avoid the need for preprocessor conditionals in C source files add a set of
empty inline repository highmem write routines to platform.h that are used when
CONFIG_PS3_REPOSITORY_WRITE is not defined.
Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
LPCR_PECE1 bit controls whether decrementer interrupts are allowed to
cause exit from power-saving mode. While waking up from winkle, restoring
LPCR with LPCR_PECE1 set (i.e Decrementer interrupts allowed) can cause
issue in the following scenario:
- All the threads in a core are offlined. The core enters deep winkle.
- Spurious interrupt wakes up a thread in the core. Here LPCR is restored
with LPCR_PECE1 bit set.
- Since it was a spurious interrupt on a offline thread, the thread clears
the interrupt and goes back to winkle.
- Here before the thread executes winkle and puts the core into deep winkle,
if a decrementer interrupt occurs on any of the sibling threads in the core
that thread wakes up.
- Since in offline loop we are flushing interrupt only in case of external
interrupt, the decrementer interrupt does not get flushed. So at this stage
the thread is stuck in this is loop of waking up at 0x100 due to decrementer
interrupt, not flushing the interrupt as only external interrupts get flushed,
entering winkle, waking up at 0x100 again.
Fix this by programming PORE to restore LPCR with LPCR_PECE1 bit
cleared when waking up from winkle.
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pull RCU updates from Paul E. McKenney:
- Documentation updates.
- Miscellaneous fixes.
- Preemptible-RCU fixes, including fixing an old bug in the
interaction of RCU priority boosting and CPU hotplug.
- SRCU updates.
- RCU CPU stall-warning updates.
- RCU torture-test updates.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Distros are enabling NUMA balancing (eg Ubuntu), so it would be good to
get some more test coverage with it.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Enable config options required by lxc and docker.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
KSM will only be used on areas marked for merging via madvise, and it
is showing nice improvements on KVM workloads, so enable it by
default.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We are starting to see ppc64 boxes with SATA AHCI adapters in it,
so enable it in our defconfigs.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This was enabled on the pseries defconfigs recently, but missed
the ppc64 one.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
It looks like it's ~4 years since we updated some of these, so do a bulk
update.
Verified that the before and after generated configs are exactly the
same.
Which begs the question why update them? The answer is that it can be
confusing when the stored defconfig drifts too far from the generated
result.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pull crypto fix from Herbert Xu:
"This fixes a regression that arose from the change to add a crypto
prefix to module names which was done to prevent the loading of
arbitrary modules through the Crypto API.
In particular, a number of modules were missing the crypto prefix
which meant that they could no longer be autoloaded"
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: add missing crypto module aliases
ACCESS_ONCE does not work reliably on non-scalar types. For
example gcc 4.6 and 4.7 might remove the volatile tag for such
accesses during the SRA (scalar replacement of aggregates) step
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145)
Change the ppc/hugetlbfs code to replace ACCESS_ONCE with READ_ONCE.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
ACCESS_ONCE does not work reliably on non-scalar types. For
example gcc 4.6 and 4.7 might remove the volatile tag for such
accesses during the SRA (scalar replacement of aggregates) step
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145)
Change the ppc/kvm code to replace ACCESS_ONCE with READ_ONCE.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Alexander Graf <agraf@suse.de>
The commit 3b8a3c0109 ("powerpc/pseries: Fix endiannes issue in RTAS
call from xmon") was fixing an endianness issue in the call made from
xmon to RTAS.
However, as Michael Ellerman noticed, this fix was not complete, the
token value was not byte swapped. This lead to call an unexpected and
most of the time unexisting RTAS function, which is silently ignored by
RTAS.
This fix addresses this hole.
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: stable@vger.kernel.org
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In order to support accesses to larger chunks of memory, pass in a
'size' parameter (counted in bytes), and return the amount available at
that address.
Add a new helper function, bdev_direct_access(), to handle common
functionality including partition handling, checking the length requested
is positive, checking for the sector being page-aligned, and checking
the length of the request does not pass the end of the partition.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Boaz Harrosh <boaz@plexistor.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Commit 5d26a105b5 ("crypto: prefix module autoloading with "crypto-"")
changed the automatic module loading when requesting crypto algorithms
to prefix all module requests with "crypto-". This requires all crypto
modules to have a crypto specific module alias even if their file name
would otherwise match the requested crypto algorithm.
Even though commit 5d26a105b5 added those aliases for a vast amount of
modules, it was missing a few. Add the required MODULE_ALIAS_CRYPTO
annotations to those files to make them get loaded automatically, again.
This fixes, e.g., requesting 'ecb(blowfish-generic)', which used to work
with kernels v3.18 and below.
Also change MODULE_ALIAS() lines to MODULE_ALIAS_CRYPTO(). The former
won't work for crypto modules any more.
Fixes: 5d26a105b5 ("crypto: prefix module autoloading with "crypto-"")
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This driver provides UIO access to memory of a peripheral connected
to the Freescale enhanced local bus controller (eLBC) interface
using the general purpose chip-select mode (GPCM).
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In commit a3e5b356b3 "powerpc: Don't use local named register variable
in current_thread_info" Anton changed the way we did current_thread_info()
to accommodate LLVM, and it was not meant to have any effect elsewhere.
Unfortunately it has exposed a gcc bug, where r1 gets copied into
another register and then gcc uses that register to restore the toc
after a function call, even when that register is volatile and has been
clobbered by the function call.
We could revert Anton's patch, but it's not clear the original code is
safe either, we may just have been lucky.
The cleanest solution is to just use the existing CURRENT_THREAD_INFO()
asm macro, and call it using inline asm.
Segher points out we don't need volatile on the asm, if the result of
the shift is unused it's fine for the compiler to elide it.
Fixes: a3e5b356b3 ("powerpc: Don't use local named register variable in current_thread_info")
Reported-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Patch c49f63530b ("powernv: Add OPAL tracepoints") has a spurious
store to the stack:
ld r12,opal_tracepoint_refcount@toc(r2); \
std r12,32(r1); \
The store was originally used to save the current tracepoint status
so the entry and the exit tracepoints were always balanced. In the
end I just created a separate path when tracepoints are enabled.
The offset on the stack used for this store is not valid for ABIv2
and it causes strange issues. I noticed it because OPAL console input
was broken.
Fixes: c49f63530b ("powernv: Add OPAL tracepoints")
Cc: <stable@vger.kernel.org> # v3.17+
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The macros cc-version, cc-fullversion and ld-version take no argument.
It is not necessary to add $(call ...) to invoke them.
Signed-off-by: Masahiro Yamada <yamada.m@jp.panasonic.com>
Acked-by: Helge Deller <deller@gmx.de> [parisc]
Signed-off-by: Michal Marek <mmarek@suse.cz>
SRCU is not necessary to be compiled by default in all cases. For tinification
efforts not compiling SRCU unless necessary is desirable.
The current patch tries to make compiling SRCU optional by introducing a new
Kconfig option CONFIG_SRCU which is selected when any of the components making
use of SRCU are selected.
If we do not select CONFIG_SRCU, srcu.o will not be compiled at all.
text data bss dec hex filename
2007 0 0 2007 7d7 kernel/rcu/srcu.o
Size of arch/powerpc/boot/zImage changes from
text data bss dec hex filename
831552 64180 23944 919676 e087c arch/powerpc/boot/zImage : before
829504 64180 23952 917636 e0084 arch/powerpc/boot/zImage : after
so the savings are about ~2000 bytes.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: resolve conflict due to removal of arch/ia64/kvm/Kconfig. ]