This patch introduces a new module parameter for the KVM module; when it
is present, KVM attempts a bit of polling on every HLT before scheduling
itself out via kvm_vcpu_block.
This parameter helps a lot for latency-bound workloads---in particular
I tested it with O_DSYNC writes with a battery-backed disk in the host.
In this case, writes are fast (because the data doesn't have to go all
the way to the platters) but they cannot be merged by either the host or
the guest. KVM's performance here is usually around 30% of bare metal,
or 50% if you use cache=directsync or cache=writethrough (these
parameters avoid that the guest sends pointless flush requests, and
at the same time they are not slow because of the battery-backed cache).
The bad performance happens because on every halt the host CPU decides
to halt itself too. When the interrupt comes, the vCPU thread is then
migrated to a new physical CPU, and in general the latency is horrible
because the vCPU thread has to be scheduled back in.
With this patch performance reaches 60-65% of bare metal and, more
important, 99% of what you get if you use idle=poll in the guest. This
means that the tunable gets rid of this particular bottleneck, and more
work can be done to improve performance in the kernel or QEMU.
Of course there is some price to pay; every time an otherwise idle vCPUs
is interrupted by an interrupt, it will poll unnecessarily and thus
impose a little load on the host. The above results were obtained with
a mostly random value of the parameter (500000), and the load was around
1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU.
The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll,
that can be used to tune the parameter. It counts how many HLT
instructions received an interrupt during the polling period; each
successful poll avoids that Linux schedules the VCPU thread out and back
in, and may also avoid a likely trip to C1 and back for the physical CPU.
While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second.
Of these halts, almost all are failed polls. During the benchmark,
instead, basically all halts end within the polling period, except a more
or less constant stream of 50 per second coming from vCPUs that are not
running the benchmark. The wasted time is thus very low. Things may
be slightly different for Windows VMs, which have a ~10 ms timer tick.
The effect is also visible on Marcelo's recently-introduced latency
test for the TSC deadline timer. Though of course a non-RT kernel has
awful latency bounds, the latency of the timer is around 8000-10000 clock
cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC
deadline timer, thus, the effect is both a smaller average latency and
a smaller variance.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Kim Phillips reported following build failure.
LD init/built-in.o
mm/built-in.o: In function `free_pages_prepare':
mm/page_alloc.c:770: undefined reference to `.kernel_map_pages'
mm/built-in.o: In function `prep_new_page':
mm/page_alloc.c:933: undefined reference to `.kernel_map_pages'
mm/built-in.o: In function `map_pages':
mm/compaction.c:61: undefined reference to `.kernel_map_pages'
make: *** [vmlinux] Error 1
Reason for this problem is that commit 031bc5743f
("mm/debug-pagealloc: make debug-pagealloc boottime configurable")
forgot to remove the old declaration of kernel_map_pages() for some
architectures. This patch removes them to fix build failure.
Reported-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Function __flush_tlb_page() must only be called for user contexts, so
put in extra hardening to warn on calling it for kernel context.
Signed-off-by: Arseny Solokha <asolokha@kb.kras.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Register a notifier for a OPAL message indicating that the machine
should prepare itself for a graceful power off.
OPAL will tell us if the power off is a reboot or shutdown, but for now
we perform the same orderly_poweroff action.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Freescale updates from Scott:
"Highlights include 8xx optimizations, some more work on datapath device
tree content, e300 machine check support, t1040 corenet error reporting,
and various cleanups and fixes."
Currently a PAMU driver patch is very likely to receive some
checkpatch complaints about the code in the context of the
patch. This patch is an attempt to fix most of that and make
the driver more readable
Also fixed a subset of the sparse and coccinelle reported
issues.
Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Add the remaining gpci requests that contain counters suitable for use
by perf. Omit those that don't contain any counters (but note their
ommision).
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This adds (in req-gen/) a framework for defining gpci counter requests.
It uses macro magic similar to ftrace.
Also convert the existing hv-gpci request structures and enum values to
use the new framework (and adjust old users of the structs and enum
values to cope with changes in naming).
In exchange for this macro disaster, we get autogenerated event listing
for GPCI in sysfs, build time field offset checking, and zero
duplication of information about GPCI requests.
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Retrieves and parses the 24x7 catalog on POWER systems that supply it
(right now, only POWER 8). Events are exposed via sysfs in the standard
fashion, and are all parameterized.
$ cd /sys/bus/event_source/devices/hv_24x7/events
$ cat HPM_CS_FROM_L4_LDATA__PHYS_CORE
domain=0x2,offset=0xd58,core=?,lpar=0x0
$ cat HPM_TLBIE__VCPU_HOME_CHIP
domain=0x4,offset=0x358,vcpu=?,lpar=?
where user is required to specify values for the fields with '?' (like
core, vcpu, lpar above), when specifying the event with the perf tool.
Catalog is (at the moment) only parsed on boot. It needs re-parsing
when a some hypervisor events occur. At that point we'll also need to
prevent old events from continuing to function (counter that is passed
in via spare space in the config values?).
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Define a lite version of the EVENT_DEFINE_RANGE_FORMAT() that avoids
defining helper functions for the bit-field ranges.
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
As commit 50ba08f3 ("of/fdt: Don't clear initial_boot_params
if fdt_check_header() fails") does, the device-tree pointer
"initial_boot_params" is initialized by early_init_dt_verify(),
which is called by early_init_devtree(). So we needn't explicitly
initialize that again in early_init_devtree().
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We have code to do syscall tracing which is disabled at compile time by
default. It's not been touched since the dawn of time (ie. v2.6.12).
There are now better ways to do syscall tracing, ie. using the
raw_syscall, or syscall tracepoints.
For the specific case of tracing syscalls at boot on a system that
doesn't get to userspace, you can boot with:
trace_event=syscalls tp_printk=on
Which will trace syscalls from boot, and echo all output to the console.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently when we back trace something that is in a syscall we see
something like this:
[c000000000000000] [c000000000000000] SyS_read+0x6c/0x110
[c000000000000000] [c000000000000000] syscall_exit+0x0/0x98
Although it's entirely correct, seeing syscall_exit at the bottom can be
confusing - we were exiting from a syscall and then called SyS_read() ?
If we instead change syscall_exit to be a local label we get something
more intuitive:
[c0000001fa46fde0] [c00000000026719c] SyS_read+0x6c/0x110
[c0000001fa46fe30] [c000000000009264] system_call+0x38/0xd0
ie. we were handling a system call, and it was SyS_read().
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When unbinding and rebinding the driver on a system with a card in PHB0, this
error condition is reached after a few attempts:
ERROR: Bad of_node_put() on /pciex@3fffe40000000
CPU: 0 PID: 3040 Comm: bash Not tainted 3.18.0-rc3-12545-g3627ffe #152
Call Trace:
[c000000721acb5c0] [c00000000086ef94] .dump_stack+0x84/0xb0 (unreliable)
[c000000721acb640] [c00000000073a0a8] .of_node_release+0xd8/0xe0
[c000000721acb6d0] [c00000000044bc44] .kobject_release+0x74/0xe0
[c000000721acb760] [c0000000007394fc] .of_node_put+0x1c/0x30
[c000000721acb7d0] [c000000000545cd8] .cxl_probe+0x1a98/0x1d50
[c000000721acb900] [c0000000004845a0] .local_pci_probe+0x40/0xc0
[c000000721acb980] [c000000000484998] .pci_device_probe+0x128/0x170
[c000000721acba30] [c00000000052400c] .driver_probe_device+0xac/0x2a0
[c000000721acbad0] [c000000000522468] .bind_store+0x108/0x160
[c000000721acbb70] [c000000000521448] .drv_attr_store+0x38/0x60
[c000000721acbbe0] [c000000000293840] .sysfs_kf_write+0x60/0xa0
[c000000721acbc50] [c000000000292500] .kernfs_fop_write+0x140/0x1d0
[c000000721acbcf0] [c000000000208648] .vfs_write+0xd8/0x260
[c000000721acbd90] [c000000000208b18] .SyS_write+0x58/0x100
[c000000721acbe30] [c000000000009258] syscall_exit+0x0/0x98
We are missing a call to of_node_get(). pnv_pci_to_phb_node() should
call of_node_get() otherwise np's reference count isn't incremented and
it might go away. Rename pnv_pci_to_phb_node() to pnv_pci_get_phb_node()
so it's clear it calls of_node_get().
Signed-off-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
MMU_NO_CONTEXT is conditionally defined as 0 or (unsigned int)-1. However,
in __flush_tlb_page() a corresponding variable is only tested for open
coded 0, which can cause NULL pointer dereference if `mm' argument was
legitimately passed as such.
Bail out early in case the first argument is NULL, thus eliminate confusion
between different values of MMU_NO_CONTEXT and avoid disabling and then
re-enabling preemption unnecessarily.
Signed-off-by: Arseny Solokha <asolokha@kb.kras.ru>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Convert the powermac PCI driver to use the generic config access functions.
This changes accesses from (in|out)_(8|le16|le32) to readX/writeX variants.
I believe these should be equivalent for PCI config space accesses, but
confirmation would be nice.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: linuxppc-dev@lists.ozlabs.org
Convert the fsl_pci driver to use the generic config access functions.
This changes accesses from (in|out)_(8|le16|le32) to readX/writeX variants.
I believe these should be equivalent for PCI config space accesses, but
confirmation would be nice.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: linuxppc-dev@lists.ozlabs.org
These defconfigs contain the CONFIG_M25P80 symbol, which is now
dependent on the MTD_SPI_NOR symbol. Add CONFIG_MTD_SPI_NOR to satisfy
the new dependency.
At the same time, drop the now-nonexistent CONFIG_MTD_CHAR symbol.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Scott Wood <scottwood@freescale.com>
When adding an event to the PMU with PERF_EF_START the STOPPED and UPTODATE
flags need to be cleared in the hw.event status variable because they are
preventing the update of the event count on overflow interrupt.
Signed-off-by: Alexandru-Cezar Sardan <alexandru.sardan@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Replace strcpy and strncpy with strlcpy to avoid strings that are too
big, or lack null termination.
Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
[scottwood@freescale.com: cleaned up commit message]
Signed-off-by: Scott Wood <scottwood@freescale.com>
The of_node_put() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Fix the GPIO address in the device tree to match the documented location.
Signed-off-by: Alessio Igor Bogani <alessio.bogani@elettra.eu>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Change-Id: I16e63db731e55a3d60d4e147573c1af8718082d3
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Geoff Thorpe <Geoff.Thorpe@freescale.com>
Signed-off-by: Hai-Ying Wang <Haiying.Wang@freescale.com>
[Emil Medve: Sync with the upstream binding]
Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Geoff Thorpe <Geoff.Thorpe@freescale.com>
Signed-off-by: Hai-Ying Wang <Haiying.Wang@freescale.com>
[Emil Medve: Sync with the upstream binding]
Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Add support for the Artesyn MVME2500 Single Board Computer.
The MVME2500 is a 6U form factor VME64 computer with:
- A single Freescale QorIQ P2010 CPU
- 1 GB of DDR3 onboard memory
- Three Gigabit Ethernets
- Five 16550 compatible UARTS
- One USB 2.0 port, one SHDC socket and one SATA connector
- One PCI/PCI eXpress Mezzanine Card (PMC/XMC) Slot
- MultiProcessor Interrupt Controller (MPIC)
- A DS1375T Real Time Clock (RTC) and 512 KB of Non-Volatile Memory
- Two 64 KB EEPROMs
- U-Boot in 16 SPI Flash
This patch is based on linux-3.18 and has been boot tested.
Signed-off-by: Alessio Igor Bogani <alessio.bogani@elettra.eu>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Commit 746c9e9f92 "of/base: Fix PowerPC address parsing hack" limited
the applicability of the workaround whereby a missing ranges is treated
as an empty ranges. This workaround was hiding a bug in the etsec2
device tree nodes, which have children with reg, but did not have
ranges.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Reported-by: Alexander Graf <agraf@suse.de>
All accessed to PGD entries are done via 0(r11).
By using lower part of swapper_pg_dir as load index to r11, we can remove the
ori instruction.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
L1 base address is now aligned so we can insert L1 index into r11 directly and
then preserve r10
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
When pages are not 4K, PGDIR table is allocated with kmalloc(). In order to
optimise TLB handlers, aligned memory is needed. kmalloc() doesn't provide
aligned memory blocks, so lets use a kmem_cache pool instead.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Kernel MMU handling code handles validity of entries via _PMD_PRESENT which
corresponds to V bit in MD_TWC and MI_TWC. When the V bit is not set, MPC8xx
triggers TLBError exception. So we don't have to check that and branch ourself
to TLBError. We can set TLB entries with non present entries, remove all those
tests and let the 8xx handle it. This reduce the number of cycle when the
entries are valid which is the case most of the time, and doesn't significantly
increase the time for handling invalid entries.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Since commit 33fb845a6f ("powerpc/8xx: Don't use MD_TWC for walk"), MD_EPN and
MD_TWC are not writen anymore in FixupDAR so saving r3 has become useless.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
For nohash powerpc, when we run out of contexts, contexts are freed by stealing
used contexts in-turn. When a victim has been selected, the associated TLB
entries are freed using _tlbil_pid(). Unfortunatly, on the PPC 8xx, _tlbil_pid()
does a tlbia, hence flushes ALL TLB entries and not only the one linked to the
stolen context. Therefore, as implented today, at each task switch requiring a
new context, all entries are flushed.
This patch modifies the implementation so that when running out of contexts, all
contexts get freed at once, hence dividing the number of calls to tlbia by 16.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
On powerpc 8xx, in TLB entries, 0x400 bit is set to 1 for read-only pages
and is set to 0 for RW pages. So we should use _PAGE_RO instead of _PAGE_RW
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Some powerpc like the 8xx don't have a RW bit in PTE bits but a RO
(Read Only) bit. This patch implements the handling of a _PAGE_RO flag
to be used in place of _PAGE_RW
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[scottwood@freescale.com: fix whitespace]
Signed-off-by: Scott Wood <scottwood@freescale.com>
PMCs on PowerPC increases towards 0x80000000 and triggers an overflow
interrupt when the msb is set to collect a sample. Therefore, to setup
for the next sample collection, pmu_start should set the pmc value to
0x80000000 - left instead of left which incorrectly delays the next
overflow interrupt. Same as commit 9a45a9407c ("powerpc/perf:
power_pmu_start restores incorrect values, breaking frequency events")
for book3s.
Signed-off-by: Tom Huynh <tom.huynh@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
They seem to be leftovers from '14cf11a powerpc: Merge enough to start
building in arch/powerpc'
Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Probably we should have not upstreamed this in the first place
Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Fix this:
CC arch/powerpc/sysdev/fsl_pci.o
arch/powerpc/sysdev/fsl_pci.c: In function 'fsl_pcie_check_link':
arch/powerpc/sysdev/fsl_pci.c:91:1: error: the frame size of 1360 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
when configuring FRAME_WARN, by refactoring indirect_read_config()
to take hose and bus number instead of the 1344-byte struct pci_bus.
Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
The core VM already knows about VM_FAULT_SIGBUS, but cannot return a
"you should SIGSEGV" error, because the SIGSEGV case was generally
handled by the caller - usually the architecture fault handler.
That results in lots of duplication - all the architecture fault
handlers end up doing very similar "look up vma, check permissions, do
retries etc" - but it generally works. However, there are cases where
the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV.
In particular, when accessing the stack guard page, libsigsegv expects a
SIGSEGV. And it usually got one, because the stack growth is handled by
that duplicated architecture fault handler.
However, when the generic VM layer started propagating the error return
from the stack expansion in commit fee7e49d45 ("mm: propagate error
from stack expansion even for guard page"), that now exposed the
existing VM_FAULT_SIGBUS result to user space. And user space really
expected SIGSEGV, not SIGBUS.
To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those
duplicate architecture fault handlers about it. They all already have
the code to handle SIGSEGV, so it's about just tying that new return
value to the existing code, but it's all a bit annoying.
This is the mindless minimal patch to do this. A more extensive patch
would be to try to gather up the mostly shared fault handling logic into
one generic helper routine, and long-term we really should do that
cleanup.
Just from this patch, you can generally see that most architectures just
copied (directly or indirectly) the old x86 way of doing things, but in
the meantime that original x86 model has been improved to hold the VM
semaphore for shorter times etc and to handle VM_FAULT_RETRY and other
"newer" things, so it would be a good idea to bring all those
improvements to the generic case and teach other architectures about
them too.
Reported-and-tested-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Jan Engelhardt <jengelh@inai.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots"
Cc: linux-arch@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On PowerNV platform, the OPAL interrupts are exported by firmware
through device-node property (/ibm,opal::opal-interrupts). Under
some extreme circumstances (e.g. simulator), we don't have this
property found from the device tree. For that case, we shouldn't
allocate the interrupt map. Otherwise, slab complains allocating
zero sized memory chunk.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The patch put the OPAL interrupt setup logic in opal_init() into
seperate function opal_irq_init() for easier code maintaining. The
patch doesn't introduce logic changes except:
* Rename variable names.
* Release virtual IRQ upon error from request_irq().
* Don't cache the virtual IRQ to opal_irqs[] upon error from
request_irq().
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In commit c8742f8512 "powerpc/powernv: Expose OPAL firmware symbol
map" I added pr_fmt() to opal.c. This left some existing pr_xxx()s with
duplicate "opal" prefixes, eg:
opal: opal: Found 0 interrupts reserved for OPAL
Fix them all up. Also make the "Not not found" message a bit more
verbose.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Remove slice_set_psize() which is not used.
It was added in 3a8247cc2c "powerpc: Only demote individual slices
rather than whole process" but was never used.
Remove vsx_assist_exception() which is not used.
It was added in ce48b21007 "powerpc: Add VSX context save/restore,
ptrace and signal support" but was never used.
Remove generic_mach_cpu_die() which is not used.
Its last caller was removed in 375f561a41 "powerpc/powernv: Always go
into nap mode when CPU is offline".
Remove mpc7448_hpc2_power_off() and mpc7448_hpc2_halt() which are
unused.
These were introduced in c5d56332fd "[POWERPC] Add general support for
mpc7448hpc2 (Taiga) platform" but were never used.
This was partially found by using a static code analysis program called
cppcheck.
Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
[mpe: Update changelog with details on when/why they are unused]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
arch/powerpc has __kernel_map_pages implementations in mm/pgtable_32.c, and
mm/hash_utils_64.c, of which the former is built for PPC32, and the latter
for PPC64 machines with PPC_STD_MMU. Fix arch/powerpc/Kconfig to not select
ARCH_SUPPORTS_DEBUG_PAGEALLOC when CONFIG_PPC_STD_MMU_64 isn't defined,
i.e., for 64-bit book3e builds to use the generic __kernel_map_pages()
in mm/debug-pagealloc.c.
LD init/built-in.o
mm/built-in.o: In function `kernel_map_pages':
include/linux/mm.h:2076: undefined reference to `.__kernel_map_pages'
include/linux/mm.h:2076: undefined reference to `.__kernel_map_pages'
include/linux/mm.h:2076: undefined reference to `.__kernel_map_pages'
Makefile:925: recipe for target 'vmlinux' failed
make: *** [vmlinux] Error 1
Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When CONFIG_PRINTK=n, log_buf_addr_get() returns NULL and log_buf_len_get()
return 0. Check for these return values and skip registering the dump buffer.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Reviewed-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
RTAS events require arguments be passed in big endian while hypercalls
have their arguments passed in registers and the values should therefore
be in CPU endian.
The "ibm,suspend_me" 'RTAS' call makes a sequence of hypercalls to setup
one true RTAS call. This means that "ibm,suspend_me" is handled
specially in the ppc_rtas() syscall.
The ppc_rtas() syscall has its arguments in big endian and can therefore
pass these arguments directly to the RTAS call. "ibm,suspend_me" is
handled specially from within ppc_rtas() (by calling rtas_ibm_suspend_me())
which has left an endian bug on little endian systems due to the
requirement of hypercalls. The return value from rtas_ibm_suspend_me()
gets returned in cpu endian, and is left unconverted, also a bug on
little endian systems.
rtas_ibm_suspend_me() does not actually make use of the rtas_args that
it is passed. This patch removes the convoluted use of the rtas_args
struct to pass params to rtas_ibm_suspend_me() in favour of passing what
it needs as actual arguments. This patch also ensures the two callers of
rtas_ibm_suspend_me() pass function parameters in cpu endian and in the
case of ppc_rtas(), converts the return value.
migrate_store() (the other caller of rtas_ibm_suspend_me()) is from a
sysfs file which deals with everything in cpu endian so this function
only underwent cleanup.
This patch has been tested with KVM both LE and BE and on PowerVM both
LE and BE. Under QEMU/KVM the migration happens without touching these
code pathes.
For PowerVM there is no obvious regression on BE and the LE code path
now provides the correct parameters to the hypervisor.
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Resource management
- Clip bridge windows to fit in upstream windows (Yinghai Lu)
Virtualization
- Mark Atheros AR93xx to avoid using bus reset (Alex Williamson)
Miscellaneous
- Update Richard Zhu's email address (Lucas Stach)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJUwqpDAAoJEFmIoMA60/r8ykIQAINgkP/iPaFMTPkTSfzTJCMY
oQVNGha4FDt6Ic1UWGyS/sYUpywSnALxlYWxVZTm5r+sGQ2yJBo6veuxvCI09YFw
lWqf6lfvkFSthWCo7pHLoNaIjKJUNCy4a2han31aAIScMCNX4YF60YorMSjQBST8
smLMG75U3U9VWaXYsV1e5gTvLa5IQh4lgaTgMAOXqd+6WcAR4WwOgD2sR06o2X43
63JF2U+ieuA789Xu2IS92TmMMESD5haEZATqdGPtxpnqyHxmBNu0Y4JkkBWD2S92
HvveOoLBT2TBfICkftvCJscBLHh7PZMIx9nLx58SnijVzX+hzVr4Zfc96MZU50MK
DuNbbZn3sO902ukOEpfih7Mg0tDxCxNytleEdAnXmZuqf+odbd/Y4AA0Hg6w7GEY
OsVGbQAT/knlTfsSZsivtmUl7l1SXzrozv+q4f4szY95v34S9pm0sWzz0IBn7oKj
h7N9Vslr3lyEudOUo1OrFq+0arDw53kwOOkIavMUH0nvTqKs4cmXBcGMfo1EfMa+
3YhjwbgpvtZ3AXi2NSBk4gIGZEmQslvgRStLhgXVDl+9DieK+sw1Vx4cKe8gu9mD
c7zPStEsJBJgd3v+8s8avwo8R0oPZb6MsCKFjjaYojTvpfFmfX0YyWE/TzYoUm6Z
+BTyA8t0+3jTArTs/Zid
=HRy7
-----END PGP SIGNATURE-----
Merge tag 'pci-v3.19-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
"These are fixes for:
- a resource management problem that causes a Radeon "Fatal error
during GPU init" on machines where the BIOS programmed an invalid
Root Port window. This was a regression in v3.16.
- an Atheros AR93xx device that doesn't handle PCI bus resets
correctly. This was a regression in v3.14.
- an out-of-date email address"
* tag 'pci-v3.19-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
MAINTAINERS: Update Richard Zhu's email address
sparc/PCI: Clip bridge windows to fit in upstream windows
powerpc/PCI: Clip bridge windows to fit in upstream windows
parisc/PCI: Clip bridge windows to fit in upstream windows
mn10300/PCI: Clip bridge windows to fit in upstream windows
microblaze/PCI: Clip bridge windows to fit in upstream windows
ia64/PCI: Clip bridge windows to fit in upstream windows
frv/PCI: Clip bridge windows to fit in upstream windows
alpha/PCI: Clip bridge windows to fit in upstream windows
x86/PCI: Clip bridge windows to fit in upstream windows
PCI: Add pci_claim_bridge_resource() to clip window if necessary
PCI: Add pci_bus_clip_resource() to clip to fit upstream window
PCI: Pass bridge device, not bus, when updating bridge windows
PCI: Mark Atheros AR93xx to avoid bus reset
PCI: Add flag for devices where we can't use bus reset
The return value of kvm_arch_vcpu_postcreate is not checked in its
caller. This is okay, because only x86 provides vcpu_postcreate right
now and it could only fail if vcpu_load failed. But that is not
possible during KVM_CREATE_VCPU (kvm_arch_vcpu_load is void, too), so
just get rid of the unchecked return value.
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
I noticed ksm spending quite a lot of time in memcmp on a large
KVM box. The current memcmp loop is very unoptimised - byte at a
time compares with no loop unrolling. We can do much much better.
Optimise the loop in a few ways:
- Unroll the byte at a time loop
- For large (at least 32 byte) comparisons that are also 8 byte
aligned, use an unrolled modulo scheduled loop using 8 byte
loads. This is similar to our glibc memcmp.
A simple microbenchmark testing 10000000 iterations of an 8192 byte
memcmp was used to measure the performance:
baseline: 29.93 s
modified: 1.70 s
Just over 17x faster.
v2: Incorporated some suggestions from Segher:
- Use andi. instead of rdlicl.
- Convert bdnzt eq, to bdnz. It's just duplicating the earlier compare
and was a relic from a previous version.
- Don't use cr5, we have plans to use that CR field for fast local
atomics.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The callback (ppc_md.pci_probe_mode()) is used to determine if the
child PCI devices of the indicated PCI bus should be probed from
device-tree or hardware. On PowerNV platform, we always expect
probing PCI devices from hardware, which is PowerPC PCI core's
default behaviour. Also, the callback had some delay implemented
based on PHB's device node property "reset-clear-timestamp", which
wasn't exported from skiboot. So we don't need this function and
it's safe to remove it.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When PE's frozen count hits maximal allowed frozen times, which is
5 currently, it will be forced to be offline permanently. Once the
PE is removed permanently, rebooting machine is required to bring
the PE back. It's not convienent when testing EEH functionality.
The patch exports the maximal allowed frozen times through debugfs
entry (/sys/kernel/debug/powerpc/eeh_max_freezes).
Requested-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The conditions that one specific PE's frozen count exceeds the maximal
allowed times (EEH_MAX_ALLOWED_FREEZES) and it's in isolated or recovery
state indicate the PE was removed permanently implicitly. The patch
introduces flag EEH_PE_REMOVED to indicate that explicitly so that we
don't depend on the fixed maximal allowed times, which can be varied as
we do in subsequent patch.
Flag EEH_PE_REMOVED is expected to be marked for the PE whose frozen
count exceeds the maximal allowed times, or just failed from recovery.
Requested-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
PE#0 should be regarded as valid for P7IOC, while it's invalid for
PHB3. The patch adds flag EEH_VALID_PE_ZERO to differentiate those
two cases. Without the patch, we possibly see frozen PE#0 state is
cleared without EEH recovery taken on P7IOC as following kernel logs
indicate:
[root@ltcfbl8eb ~]# dmesg
:
pci 0000:00 : [PE# 000] Secondary bus 0 associated with PE#0
pci 0000:01 : [PE# 001] Secondary bus 1 associated with PE#1
pci 0001:00 : [PE# 000] Secondary bus 0 associated with PE#0
pci 0001:01 : [PE# 001] Secondary bus 1 associated with PE#1
pci 0002:00 : [PE# 000] Secondary bus 0 associated with PE#0
pci 0002:01 : [PE# 001] Secondary bus 1 associated with PE#1
pci 0003:00 : [PE# 000] Secondary bus 0 associated with PE#0
pci 0003:01 : [PE# 001] Secondary bus 1 associated with PE#1
pci 0003:20 : [PE# 002] Secondary bus 32..63 associated with PE#2
:
EEH: Clear non-existing PHB#3-PE#0
EEH: PHB location: U78AE.001.WZS00M9-P1-002
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When calling to early_setup(), we pick "boot_paca" up for the master CPU
and initialize that with initialise_paca(). At that point, the SLB
shadow buffer isn't populated yet. Updating the SLB shadow buffer should
corrupt what we had in physical address 0 where the trap instruction is
usually stored.
This hasn't been observed to cause any trouble in practice, but is
obviously fishy.
Fixes: 6f4441ef70 ("powerpc: Dynamically allocate slb_shadow from memblock")
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
num_possible_cpus() is just a shorthand for it.
Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently, all non-dot symbols are being treated as function descriptors
in ABIv1. This is incorrect and is resulting in perf probe not working:
# perf probe do_fork
Added new event:
Failed to write event: Invalid argument
Error: Failed to add events.
# dmesg | tail -1
[192268.073063] Could not insert probe at _text+768432: -22
perf probe bases all kernel probes on _text and writes,
for example, "p:probe/do_fork _text+768432" to
/sys/kernel/debug/tracing/kprobe_events. In-kernel, _text is being
considered to be a function descriptor and is resulting in the above
error.
Fix this by changing how we lookup symbol addresses on ppc64. We first
check for the dot variant of a symbol and look at the non-dot variant
only if that fails. In this manner, we avoid having to look at the
function descriptor.
While at it, also separate out how this works on ABIv2 where
we don't have dot symbols, but need to use the local entry point.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Once upon a time, at least 9 years ago (< 2.6.12), _TIF_SYSCALL_T_OR_A
meant "TRACE or AUDIT". But these days it means TRACE or AUDIT or
SECCOMP or TRACEPOINT or NOHZ.
All of those are implemented via syscall_dotrace() so rename the flag to
that to try and clarify things.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We removed the last usage of CPU_FTR_IABR in commit 1ad7d70562
"powerpc/xmon: Enable HW instruction breakpoint on POWER8".
Mark it as free.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
pci_dn->phb is set to phb in update_dn_pci_info(), if succeed.
This patch removes the duplication of pci_dn->phb initialization.
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When IOMMU bypass is enabled, a PCI device can read and write memory
that was not mapped by the driver without causing an EEH. That might
cause memory corruption, for example.
When we disable bypass, DMA reads and writes to addresses not mapped by
the IOMMU will cause an EEH, allowing us to debug such issues.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The current handling of EPOW_SHUTDOWN_ON_UPS event does not shutdown the
system after logging the message. All the events of EPOW_SYSTEM_SHUTDOWN
action code (EPOW_SHUTDOWN_ON_UPS is a part of it) must initiate system
shutdown as per the SPAPR spec. If the LPAR does not shutdown after
receiving this rtas based event, it will expose itself to a forced
abrupt shutdown initiated by the platform firmware. This patch fixes the
situation.
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The M64 range information is missed in dmesg, which would be helpful in debug.
This patch prints the M64 range information in the same format as M32.
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Some instances of pci_ops initialization rely on the read/write members'
location in the struct. This is fragile and may break when adding new
members to the beginning of the struct.
No functional change.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Arnd Bergmann <arnd@arndb.de>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: linuxppc-dev@lists.ozlabs.org
CC: cbe-oss-dev@lists.ozlabs.org
First two are minor fallout from the param rework which went in this merge
window.
Next three are a series which fixes a longstanding (but never previously
reported and unlikely , so no CC stable) race between kallsyms and freeing
the init section.
Finally, a minor cleanup as our module refcount will now be -1 during
unload.
Thanks,
Rusty.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJUwEmwAAoJENkgDmzRrbjx77kP/1cNQR2eG2sBwokg3q0tvHnQ
IKqEXErW7NvxRa+RAMEmy2uQoGt6+uNklAbtyJEYM9oR1NieFbPi2yrt9Xn5SAXS
Brp1S8WYBMilA3W3o6I0trFDRWHdpdtkKIQwLWgJNSEWjbTXh8bSwp/2X1rlOPyI
ZmphCMOQMU2/uFEyJhTz1WMEV8eVXiRLN8OxSkPxToxdZoGln2U8IBCCCJC9OG+f
Cf3eMgEcNdEXNcPKqr11NIcHkAx6M6qI/eMDOqk151PslHa8lbis6di9Z87aE0ps
i8PyrkJGTmgM9cCjXwE8deNseeCmuKYlbPIF+NoxcqtvZstfaMrISwTIEuzV4JHi
p13YhDxy4XiC3H6pKHub/jo7UCl+wWtFh9SqpqGgduFX/p6FtUHQJm0S0X/DFFZt
C+2MFVSe6HRHE8B7bFz86+619Qd/rU7+806CLCE+NbYlYAKIBYKzWt/bml6VH3RJ
OjwXhQqmznWhJjsfD3BUUUpZpHijmylI9gAe2F1oErb8YjRU6gIm7P8hlkOzD7AS
TfGHPFq2raQcfAiGdVmvkbvvhvYZXnB3WVsAexrYoqrT9I8eEfRI+7SkL75MLR2E
ikzhJS3SHkAUAd7fUVMt7xMwh0jmhsPjWCCqc13m6UUFoXhTaDgKgPGftltN0bI2
g85+enZ3/eca6xh/KxvW
=Kf9b
-----END PGP SIGNATURE-----
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module and param fixes from Rusty Russell:
"Surprising number of fixes this merge window :(
The first two are minor fallout from the param rework which went in
this merge window.
The next three are a series which fixes a longstanding (but never
previously reported and unlikely , so no CC stable) race between
kallsyms and freeing the init section.
Finally, a minor cleanup as our module refcount will now be -1 during
unload"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
module: make module_refcount() a signed integer.
module: fix race in kallsyms resolution during module load success.
module: remove mod arg from module_free, rename module_memfree().
module_arch_freeing_init(): new hook for archs before module->module_init freed.
param: fix uninitialized read with CONFIG_DEBUG_LOCK_ALLOC
param: initialize store function to NULL if not available.
Turning snoops on is the last step in CAPP recovery. Sapphire is expected to
have reinitialized the PHB and done the previous recovery steps.
Add mode argument to opal call to do this. Driver can turn snoops off although
it does not currently.
Signed-off-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Add calls to the ps3_mm_set_repository_highmem() routine when the ps3
r1 highmem region is either created or destroyed.
Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Add the new routine ps3_mm_set_repository_highmem() that saves highmem info to
the LV1 hypervisor registry so that the info will be available to second stage
OS's loaded by petitboot/kexec. FreeBSD and some Linux derivatives use
this feature.
Also, move the existing ps3_mm_get_repository_highmem() routine up in
the source file.
This implementation of ps3_mm_set_repository_highmem() assumes the repository
will have a single highmem region entry (at index 0).
Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
To avoid the need for preprocessor conditionals in C source files add a set of
empty inline repository highmem write routines to platform.h that are used when
CONFIG_PS3_REPOSITORY_WRITE is not defined.
Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
LPCR_PECE1 bit controls whether decrementer interrupts are allowed to
cause exit from power-saving mode. While waking up from winkle, restoring
LPCR with LPCR_PECE1 set (i.e Decrementer interrupts allowed) can cause
issue in the following scenario:
- All the threads in a core are offlined. The core enters deep winkle.
- Spurious interrupt wakes up a thread in the core. Here LPCR is restored
with LPCR_PECE1 bit set.
- Since it was a spurious interrupt on a offline thread, the thread clears
the interrupt and goes back to winkle.
- Here before the thread executes winkle and puts the core into deep winkle,
if a decrementer interrupt occurs on any of the sibling threads in the core
that thread wakes up.
- Since in offline loop we are flushing interrupt only in case of external
interrupt, the decrementer interrupt does not get flushed. So at this stage
the thread is stuck in this is loop of waking up at 0x100 due to decrementer
interrupt, not flushing the interrupt as only external interrupts get flushed,
entering winkle, waking up at 0x100 again.
Fix this by programming PORE to restore LPCR with LPCR_PECE1 bit
cleared when waking up from winkle.
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pull RCU updates from Paul E. McKenney:
- Documentation updates.
- Miscellaneous fixes.
- Preemptible-RCU fixes, including fixing an old bug in the
interaction of RCU priority boosting and CPU hotplug.
- SRCU updates.
- RCU CPU stall-warning updates.
- RCU torture-test updates.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Distros are enabling NUMA balancing (eg Ubuntu), so it would be good to
get some more test coverage with it.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Enable config options required by lxc and docker.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
KSM will only be used on areas marked for merging via madvise, and it
is showing nice improvements on KVM workloads, so enable it by
default.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We are starting to see ppc64 boxes with SATA AHCI adapters in it,
so enable it in our defconfigs.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This was enabled on the pseries defconfigs recently, but missed
the ppc64 one.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
So the boards which has COMMON_CLK enabled don't have to
invoke this in its board specific file.
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Acked-by: Scott Wood <scottwood@freescale.com>
Acked-by: Michael Turquette <mturquette@linaro.org>
Signed-off-by: Michael Turquette <mturquette@linaro.org>
It looks like it's ~4 years since we updated some of these, so do a bulk
update.
Verified that the before and after generated configs are exactly the
same.
Which begs the question why update them? The answer is that it can be
confusing when the stored defconfig drifts too far from the generated
result.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pull crypto fix from Herbert Xu:
"This fixes a regression that arose from the change to add a crypto
prefix to module names which was done to prevent the loading of
arbitrary modules through the Crypto API.
In particular, a number of modules were missing the crypto prefix
which meant that they could no longer be autoloaded"
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: add missing crypto module aliases
Nothing needs the module pointer any more, and the next patch will
call it from RCU, where the module itself might no longer exist.
Removing the arg is the safest approach.
This just codifies the use of the module_alloc/module_free pattern
which ftrace and bpf use.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: x86@kernel.org
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: linux-cris-kernel@axis.com
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: nios2-dev@lists.rocketboards.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparclinux@vger.kernel.org
Cc: netdev@vger.kernel.org
ACCESS_ONCE does not work reliably on non-scalar types. For
example gcc 4.6 and 4.7 might remove the volatile tag for such
accesses during the SRA (scalar replacement of aggregates) step
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145)
Change the ppc/hugetlbfs code to replace ACCESS_ONCE with READ_ONCE.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
ACCESS_ONCE does not work reliably on non-scalar types. For
example gcc 4.6 and 4.7 might remove the volatile tag for such
accesses during the SRA (scalar replacement of aggregates) step
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145)
Change the ppc/kvm code to replace ACCESS_ONCE with READ_ONCE.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Alexander Graf <agraf@suse.de>
The commit 3b8a3c0109 ("powerpc/pseries: Fix endiannes issue in RTAS
call from xmon") was fixing an endianness issue in the call made from
xmon to RTAS.
However, as Michael Ellerman noticed, this fix was not complete, the
token value was not byte swapped. This lead to call an unexpected and
most of the time unexisting RTAS function, which is silently ignored by
RTAS.
This fix addresses this hole.
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: stable@vger.kernel.org
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Every PCI-PCI bridge window should fit inside an upstream bridge window
because orphaned address space is unreachable from the primary side of the
upstream bridge. If we inherit invalid bridge windows that overlap an
upstream window from firmware, clip them to fit and update the bridge
accordingly.
[bhelgaas: changelog]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=85491
Reported-by: Marek Kordik <kordikmarek@gmail.com>
Fixes: 5b28541552 ("PCI: Restrict 64-bit prefetchable bridge windows to 64-bit resources")
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Gavin Shan <gwshan@linux.vnet.ibm.com>
CC: Anton Blanchard <anton@samba.org>
CC: Sebastian Ott <sebott@linux.vnet.ibm.com>
CC: Wei Yang <weiyang@linux.vnet.ibm.com>
CC: Andrew Murray <amurray@embedded-bits.co.uk>
CC: linuxppc-dev@lists.ozlabs.org
In order to support accesses to larger chunks of memory, pass in a
'size' parameter (counted in bytes), and return the amount available at
that address.
Add a new helper function, bdev_direct_access(), to handle common
functionality including partition handling, checking the length requested
is positive, checking for the sector being page-aligned, and checking
the length of the request does not pass the end of the partition.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Boaz Harrosh <boaz@plexistor.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
The 'pfn' returned by axonram was completely bogus, and has been since
2008.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
Commit 5d26a105b5 ("crypto: prefix module autoloading with "crypto-"")
changed the automatic module loading when requesting crypto algorithms
to prefix all module requests with "crypto-". This requires all crypto
modules to have a crypto specific module alias even if their file name
would otherwise match the requested crypto algorithm.
Even though commit 5d26a105b5 added those aliases for a vast amount of
modules, it was missing a few. Add the required MODULE_ALIAS_CRYPTO
annotations to those files to make them get loaded automatically, again.
This fixes, e.g., requesting 'ecb(blowfish-generic)', which used to work
with kernels v3.18 and below.
Also change MODULE_ALIAS() lines to MODULE_ALIAS_CRYPTO(). The former
won't work for crypto modules any more.
Fixes: 5d26a105b5 ("crypto: prefix module autoloading with "crypto-"")
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This driver provides UIO access to memory of a peripheral connected
to the Freescale enhanced local bus controller (eLBC) interface
using the general purpose chip-select mode (GPCM).
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>